stash/pkg/scraper/mapped.go

package scraper

import (
"context"
"errors"
"fmt"
"math"
"reflect"
"regexp"
"strconv"
"strings"
"time"
"github.com/stashapp/stash/pkg/logger"
"github.com/stashapp/stash/pkg/models"
"github.com/stashapp/stash/pkg/sliceutil/stringslice"
"gopkg.in/yaml.v2"
)
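
// mappedQuery is the interface implemented by the concrete query
// backends (the xpath and json scrapers). runQuery evaluates a single
// selector against the loaded document, getType/setType track whether
// the query is being used for searching, and subScrape is used by the
// sub-scraper post-process step to obtain a new query over the content
// named by value.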
type mappedQuery interface {
runQuery(selector string) ([]string, error)
getType() QueryType
setType(QueryType)
subScrape(ctx context.Context, value string) mappedQuery
}
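
// commonMappedConfig is the `common` block of a mapped scraper
// configuration: fixed string substitutions that applyCommon performs
// on selectors before they run.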
type commonMappedConfig map[string]string
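// mappedConfig maps attribute names to the configuration describing
// how each attribute is scraped and post-processed.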
type mappedConfig map[string]mappedScraperAttrConfig
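// applyCommon substitutes every common key found in src with its
// value. A hypothetical YAML fragment (key and selector names are
// illustrative, not taken from any shipped scraper):
//
//   common:
//     $infoPiece: //div[@class="infoPiece"]/span
//   scene:
//     Date: $infoPiece[text() = 'Released:']
//
// would expand the Date selector to
// //div[@class="infoPiece"]/span[text() = 'Released:'] before it runs.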
func (s mappedConfig) applyCommon(c commonMappedConfig, src string) string {
if c == nil {
return src
}
ret := src
for commonKey, commonVal := range c {
ret = strings.ReplaceAll(ret, commonKey, commonVal)
}
return ret
}
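
// process runs every attribute query in s against q and collects the
// results into a mappedResults, one entry per (index, key) pair. Fixed
// values bypass the query entirely; everything else runs the selector
// and feeds the results through the post-processing chain.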
func (s mappedConfig) process(ctx context.Context, q mappedQuery, common commonMappedConfig) mappedResults {
var ret mappedResults
for k, attrConfig := range s {
if attrConfig.Fixed != "" {
// TODO - not sure if this needs to set _all_ indexes for the key
const i = 0
ret = ret.setKey(i, k, attrConfig.Fixed)
} else {
selector := attrConfig.Selector
selector = s.applyCommon(common, selector)
found, err := q.runQuery(selector)
if err != nil {
logger.Warnf("key '%v': %v", k, err)
}
if len(found) > 0 {
result := s.postProcess(ctx, q, attrConfig, found)
for i, text := range result {
ret = ret.setKey(i, k, text)
}
}
}
}
return ret
}
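
// postProcess applies the attribute's post-processing chain to the raw
// query results: concatenation first (if configured), then the
// per-value postProcess actions, then optional splitting and, except
// for search queries, result cleaning.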
func (s mappedConfig) postProcess(ctx context.Context, q mappedQuery, attrConfig mappedScraperAttrConfig, found []string) []string {
// check if we're concatenating the results into a single result
var ret []string
if attrConfig.hasConcat() {
result := attrConfig.concatenateResults(found)
result = attrConfig.postProcess(ctx, result, q)
if attrConfig.hasSplit() {
results := attrConfig.splitString(result)
// skip cleaning when the query is used for searching
if q.getType() == SearchQuery {
return results
}
results = attrConfig.cleanResults(results)
return results
}
ret = []string{result}
} else {
for _, text := range found {
text = attrConfig.postProcess(ctx, text, q)
if attrConfig.hasSplit() {
return attrConfig.splitString(text)
}
ret = append(ret, text)
}
// skip cleaning when the query is used for searching
if q.getType() == SearchQuery {
return ret
}
ret = attrConfig.cleanResults(ret)
}
return ret
}
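
// mappedSceneScraperConfig is the scene block of a mapped scraper.
// Arbitrary field names land in the embedded mappedConfig, while the
// reserved Tags/Performers/Studio/Movies keys get their own nested
// configs. A minimal illustrative YAML shape (selectors invented for
// the example):
//
//   scene:
//     Title: //h1
//     Tags:
//       Name: //span[@class="tag"]
//
// The gallery, performer and movie configs below follow the same
// pattern with their own reserved keys.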
type mappedSceneScraperConfig struct {
mappedConfig
Tags mappedConfig `yaml:"Tags"`
Performers mappedPerformerScraperConfig `yaml:"Performers"`
Studio mappedConfig `yaml:"Studio"`
Movies mappedConfig `yaml:"Movies"`
}
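// _mappedSceneScraperConfig is a plain copy of the type, used below to
// re-unmarshal the reserved sub-fields without recursing back into
// UnmarshalYAML.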
type _mappedSceneScraperConfig mappedSceneScraperConfig
const (
mappedScraperConfigSceneTags = "Tags"
mappedScraperConfigScenePerformers = "Performers"
mappedScraperConfigSceneStudio = "Studio"
mappedScraperConfigSceneMovies = "Movies"
)
func (s *mappedSceneScraperConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
// HACK - unmarshal to map first, then remove known scene sub-fields, then
// remarshal to yaml and pass that down to the base map
parentMap := make(map[string]interface{})
if err := unmarshal(parentMap); err != nil {
return err
}
// move the known sub-fields to a separate map
thisMap := make(map[string]interface{})
thisMap[mappedScraperConfigSceneTags] = parentMap[mappedScraperConfigSceneTags]
thisMap[mappedScraperConfigScenePerformers] = parentMap[mappedScraperConfigScenePerformers]
thisMap[mappedScraperConfigSceneStudio] = parentMap[mappedScraperConfigSceneStudio]
thisMap[mappedScraperConfigSceneMovies] = parentMap[mappedScraperConfigSceneMovies]
delete(parentMap, mappedScraperConfigSceneTags)
delete(parentMap, mappedScraperConfigScenePerformers)
delete(parentMap, mappedScraperConfigSceneStudio)
delete(parentMap, mappedScraperConfigSceneMovies)
// re-unmarshal the sub-fields
yml, err := yaml.Marshal(thisMap)
if err != nil {
return err
}
// needs to be a different type to prevent infinite recursion
c := _mappedSceneScraperConfig{}
if err := yaml.Unmarshal(yml, &c); err != nil {
return err
}
*s = mappedSceneScraperConfig(c)
yml, err = yaml.Marshal(parentMap)
if err != nil {
return err
}
if err := yaml.Unmarshal(yml, &s.mappedConfig); err != nil {
return err
}
return nil
}
type mappedGalleryScraperConfig struct {
mappedConfig
Tags mappedConfig `yaml:"Tags"`
Performers mappedConfig `yaml:"Performers"`
Studio mappedConfig `yaml:"Studio"`
}
type _mappedGalleryScraperConfig mappedGalleryScraperConfig
func (s *mappedGalleryScraperConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
// HACK - unmarshal to map first, then remove known gallery sub-fields, then
// remarshal to yaml and pass that down to the base map
parentMap := make(map[string]interface{})
if err := unmarshal(parentMap); err != nil {
return err
}
// move the known sub-fields to a separate map
thisMap := make(map[string]interface{})
thisMap[mappedScraperConfigSceneTags] = parentMap[mappedScraperConfigSceneTags]
thisMap[mappedScraperConfigScenePerformers] = parentMap[mappedScraperConfigScenePerformers]
thisMap[mappedScraperConfigSceneStudio] = parentMap[mappedScraperConfigSceneStudio]
delete(parentMap, mappedScraperConfigSceneTags)
delete(parentMap, mappedScraperConfigScenePerformers)
delete(parentMap, mappedScraperConfigSceneStudio)
// re-unmarshal the sub-fields
yml, err := yaml.Marshal(thisMap)
if err != nil {
return err
}
// needs to be a different type to prevent infinite recursion
c := _mappedGalleryScraperConfig{}
if err := yaml.Unmarshal(yml, &c); err != nil {
return err
}
*s = mappedGalleryScraperConfig(c)
yml, err = yaml.Marshal(parentMap)
if err != nil {
return err
}
if err := yaml.Unmarshal(yml, &s.mappedConfig); err != nil {
return err
}
return nil
}
type mappedPerformerScraperConfig struct {
mappedConfig
Tags mappedConfig `yaml:"Tags"`
}
type _mappedPerformerScraperConfig mappedPerformerScraperConfig
const (
mappedScraperConfigPerformerTags = "Tags"
)
func (s *mappedPerformerScraperConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
// HACK - unmarshal to map first, then remove known performer sub-fields, then
// remarshal to yaml and pass that down to the base map
parentMap := make(map[string]interface{})
if err := unmarshal(parentMap); err != nil {
return err
}
// move the known sub-fields to a separate map
thisMap := make(map[string]interface{})
thisMap[mappedScraperConfigPerformerTags] = parentMap[mappedScraperConfigPerformerTags]
delete(parentMap, mappedScraperConfigPerformerTags)
// re-unmarshal the sub-fields
yml, err := yaml.Marshal(thisMap)
if err != nil {
return err
}
// needs to be a different type to prevent infinite recursion
c := _mappedPerformerScraperConfig{}
if err := yaml.Unmarshal(yml, &c); err != nil {
return err
}
*s = mappedPerformerScraperConfig(c)
yml, err = yaml.Marshal(parentMap)
if err != nil {
return err
}
if err := yaml.Unmarshal(yml, &s.mappedConfig); err != nil {
return err
}
return nil
}
type mappedMovieScraperConfig struct {
mappedConfig
Studio mappedConfig `yaml:"Studio"`
}
type _mappedMovieScraperConfig mappedMovieScraperConfig
const (
mappedScraperConfigMovieStudio = "Studio"
)
func (s *mappedMovieScraperConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
// HACK - unmarshal to map first, then remove known movie sub-fields, then
// remarshal to yaml and pass that down to the base map
parentMap := make(map[string]interface{})
if err := unmarshal(parentMap); err != nil {
return err
}
// move the known sub-fields to a separate map
thisMap := make(map[string]interface{})
thisMap[mappedScraperConfigMovieStudio] = parentMap[mappedScraperConfigMovieStudio]
delete(parentMap, mappedScraperConfigMovieStudio)
// re-unmarshal the sub-fields
yml, err := yaml.Marshal(thisMap)
if err != nil {
return err
}
// needs to be a different type to prevent infinite recursion
c := _mappedMovieScraperConfig{}
if err := yaml.Unmarshal(yml, &c); err != nil {
return err
}
*s = mappedMovieScraperConfig(c)
yml, err = yaml.Marshal(parentMap)
if err != nil {
return err
}
if err := yaml.Unmarshal(yml, &s.mappedConfig); err != nil {
return err
}
return nil
}
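
// mappedRegexConfig is a single regex/replacement pair from a replace
// post-process step. An illustrative YAML sketch (not from a shipped
// scraper) that strips a " - Studio" suffix from a title:
//
//   postProcess:
//     - replace:
//         - regex: \s-\sStudio$
//           with: ""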
type mappedRegexConfig struct {
Regex string `yaml:"regex"`
With string `yaml:"with"`
}
type mappedRegexConfigs []mappedRegexConfig
func (c mappedRegexConfig) apply(value string) string {
if c.Regex != "" {
re, err := regexp.Compile(c.Regex)
if err != nil {
logger.Warnf("Error compiling regex '%s': %s", c.Regex, err.Error())
return value
}
ret := re.ReplaceAllString(value, c.With)
// trim leading and trailing whitespace
// this is done to maintain backwards compatibility with existing
// scrapers
ret = strings.TrimSpace(ret)
logger.Debugf(`Replace: '%s' with '%s'`, c.Regex, c.With)
logger.Debugf("Before: %s", value)
logger.Debugf("After: %s", ret)
return ret
}
return value
}
func (c mappedRegexConfigs) apply(value string) string {
// apply regex in order
for _, config := range c {
value = config.apply(value)
}
return value
}
type postProcessAction interface {
Apply(ctx context.Context, value string, q mappedQuery) string
}
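// postProcessParseDate converts a scraped date string into the
// internal YYYY-MM-DD format. Its string value is a Go reference
// layout: for example, parseDate: "January 2, 2006" turns
// "March 4, 2021" into "2021-03-04". The special value "unix" treats
// the input as a unix timestamp, and the inputs "today"/"yesterday"
// resolve relative to the current date.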
type postProcessParseDate string
func (p *postProcessParseDate) Apply(ctx context.Context, value string, q mappedQuery) string {
parseDate := string(*p)
const internalDateFormat = "2006-01-02"
valueLower := strings.ToLower(value)
if valueLower == "today" || valueLower == "yesterday" { // handle today, yesterday
dt := time.Now()
if valueLower == "yesterday" { // subtract 1 day from now
dt = dt.AddDate(0, 0, -1)
}
return dt.Format(internalDateFormat)
}
if parseDate == "" {
return value
}
if parseDate == "unix" {
// try to parse the date using unix timestamp format
// if it fails, then just fall back to the original value
timeAsInt, err := strconv.ParseInt(value, 10, 64)
if err != nil {
logger.Warnf("Error parsing date string '%s' using unix timestamp format : %s", value, err.Error())
return value
}
parsedValue := time.Unix(timeAsInt, 0)
return parsedValue.Format(internalDateFormat)
}
// try to parse the date using the pattern
// if it fails, then just fall back to the original value
parsedValue, err := time.Parse(parseDate, value)
if err != nil {
logger.Warnf("Error parsing date string '%s' using format '%s': %s", value, parseDate, err.Error())
return value
}
// convert it into our date format
return parsedValue.Format(internalDateFormat)
}
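
// postProcessSubtractDays interprets the scraped value as a number of
// days in the past (e.g. "3" becomes the date three days ago) and
// renders it as YYYY-MM-DD.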
type postProcessSubtractDays bool
func (p *postProcessSubtractDays) Apply(ctx context.Context, value string, q mappedQuery) string {
const internalDateFormat = "2006-01-02"
i, err := strconv.Atoi(value)
if err != nil {
logger.Warnf("Error parsing day string %s: %s", value, err)
return value
}
dt := time.Now()
dt = dt.AddDate(0, 0, -i)
return dt.Format(internalDateFormat)
}
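
// postProcessReplace applies the configured regex replacements to the
// scraped value in order. An illustrative YAML fragment (a sketch; the
// regex/with keys are assumed from the mappedRegexConfig type defined
// earlier in this file):
//
//   postProcess:
//     - replace:
//         - regex: \s+
//           with: " "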
type postProcessReplace mappedRegexConfigs
func (c *postProcessReplace) Apply(ctx context.Context, value string, q mappedQuery) string {
replace := mappedRegexConfigs(*c)
return replace.apply(value)
}
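
// postProcessSubScraper feeds the scraped value (typically a URL) into a
// nested scrape and applies the embedded attribute config to the fetched
// document. An illustrative sketch (the XPath shown is an assumption):
//
//   postProcess:
//     - subScraper:
//         selector: //h1/text()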
type postProcessSubScraper mappedScraperAttrConfig
func (p *postProcessSubScraper) Apply(ctx context.Context, value string, q mappedQuery) string {
subScrapeConfig := mappedScraperAttrConfig(*p)
logger.Debugf("Sub-scraping for: %s", value)
ss := q.subScrape(ctx, value)
if ss != nil {
found, err := ss.runQuery(subScrapeConfig.Selector)
if err != nil {
logger.Warnf("subscrape for '%v': %v", value, err)
}
if len(found) > 0 {
// check if we're concatenating the results into a single result
var result string
if subScrapeConfig.hasConcat() {
result = subScrapeConfig.concatenateResults(found)
} else {
result = found[0]
}
result = subScrapeConfig.postProcess(ctx, result, ss)
return result
}
}
return ""
}
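
// postProcessMap substitutes the scraped value with its mapped equivalent
// when one exists; unmapped values pass through unchanged. An illustrative
// sketch (the mapping shown is an assumption):
//
//   postProcess:
//     - map:
//         F: Female
//         M: Male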
type postProcessMap map[string]string
func (p *postProcessMap) Apply(ctx context.Context, value string, q mappedQuery) string {
// return the mapped value if present
m := *p
mapped, ok := m[value]
if ok {
return mapped
}
return value
}
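
// postProcessFeetToCm converts a height expressed in feet and (optionally)
// inches into whole centimeters: the first run of digits is read as feet,
// the second as inches, so "5'7\"" becomes "170". An illustrative sketch:
//
//   postProcess:
//     - feetToCm: true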
type postProcessFeetToCm bool
func (p *postProcessFeetToCm) Apply(ctx context.Context, value string, q mappedQuery) string {
const footInCm = 30.48
const inchInCm = 2.54
reg := regexp.MustCompile("[0-9]+")
filtered := reg.FindAllString(value, -1)
var feet float64
var inches float64
if len(filtered) > 0 {
feet, _ = strconv.ParseFloat(filtered[0], 64)
}
if len(filtered) > 1 {
inches, _ = strconv.ParseFloat(filtered[1], 64)
}
centimeters := feet*footInCm + inches*inchInCm
// Return rounded integer string
return strconv.Itoa(int(math.Round(centimeters)))
}
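
// postProcessLbToKg converts a weight in pounds to whole kilograms, so
// "150" becomes "68". An illustrative sketch:
//
//   postProcess:
//     - lbToKg: true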
type postProcessLbToKg bool
func (p *postProcessLbToKg) Apply(ctx context.Context, value string, q mappedQuery) string {
const lbInKg = 0.45359237
w, err := strconv.ParseFloat(value, 64)
if err == nil {
w *= lbInKg
value = strconv.Itoa(int(math.Round(w)))
}
return value
}
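
// mappedPostProcessAction is the YAML form of a single post-process step.
// Exactly one field may be set per list entry; ToPostProcessAction below
// enforces this. An illustrative chain (a sketch; the pattern and format
// shown are assumptions):
//
//   postProcess:
//     - replace:
//         - regex: (\d{2})/(\d{2})/(\d{4})
//           with: $3-$1-$2
//     - parseDate: 2006-01-02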
type mappedPostProcessAction struct {
ParseDate string `yaml:"parseDate"`
SubtractDays bool `yaml:"subtractDays"`
Replace mappedRegexConfigs `yaml:"replace"`
SubScraper *mappedScraperAttrConfig `yaml:"subScraper"`
Map map[string]string `yaml:"map"`
FeetToCm bool `yaml:"feetToCm"`
LbToKg bool `yaml:"lbToKg"`
}
func (a mappedPostProcessAction) ToPostProcessAction() (postProcessAction, error) {
var found string
var ret postProcessAction
if a.ParseDate != "" {
found = "parseDate"
action := postProcessParseDate(a.ParseDate)
ret = &action
}
if len(a.Replace) > 0 {
if found != "" {
return nil, fmt.Errorf("post-process actions must have a single field, found %s and %s", found, "replace")
}
found = "replace"
action := postProcessReplace(a.Replace)
ret = &action
}
if a.SubScraper != nil {
if found != "" {
return nil, fmt.Errorf("post-process actions must have a single field, found %s and %s", found, "subScraper")
}
found = "subScraper"
action := postProcessSubScraper(*a.SubScraper)
ret = &action
}
if a.Map != nil {
if found != "" {
return nil, fmt.Errorf("post-process actions must have a single field, found %s and %s", found, "map")
}
found = "map"
action := postProcessMap(a.Map)
ret = &action
}
if a.FeetToCm {
if found != "" {
return nil, fmt.Errorf("post-process actions must have a single field, found %s and %s", found, "feetToCm")
}
found = "feetToCm"
action := postProcessFeetToCm(a.FeetToCm)
ret = &action
}
if a.LbToKg {
if found != "" {
return nil, fmt.Errorf("post-process actions must have a single field, found %s and %s", found, "lbToKg")
}
found = "lbToKg"
action := postProcessLbToKg(a.LbToKg)
ret = &action
}
if a.SubtractDays {
if found != "" {
return nil, fmt.Errorf("post-process actions must have a single field, found %s and %s", found, "subtractDays")
}
// found = "subtractDays"
action := postProcessSubtractDays(a.SubtractDays)
ret = &action
}
if ret == nil {
return nil, errors.New("invalid post-process action")
}
return ret, nil
}
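
// mappedScraperAttrConfig describes how a single attribute is scraped:
// either a fixed value or a selector, optionally followed by a chain of
// post-process actions, with concat/split controlling how multiple matches
// are joined or divided. An illustrative sketch (selectors and the date
// format are assumptions):
//
//   Name: //h1
//   Birthdate:
//     selector: //span[@class="dob"]
//     postProcess:
//       - parseDate: January 2, 2006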
type mappedScraperAttrConfig struct {
Selector string `yaml:"selector"`
Fixed string `yaml:"fixed"`
PostProcess []mappedPostProcessAction `yaml:"postProcess"`
Concat string `yaml:"concat"`
Split string `yaml:"split"`
postProcessActions []postProcessAction
// Deprecated: use PostProcess instead
ParseDate string `yaml:"parseDate"`
Replace mappedRegexConfigs `yaml:"replace"`
SubScraper *mappedScraperAttrConfig `yaml:"subScraper"`
}
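
// _mappedScraperAttrConfig mirrors mappedScraperAttrConfig so the full
// object can be unmarshalled below without recursing back into
// UnmarshalYAML.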
type _mappedScraperAttrConfig mappedScraperAttrConfig
func (c *mappedScraperAttrConfig) UnmarshalYAML(unmarshal func(interface{}) error) error {
// try unmarshalling into a string first
if err := unmarshal(&c.Selector); err != nil {
// if it's a type error then we try to unmarshal the full object instead
var typeErr *yaml.TypeError
if !errors.As(err, &typeErr) {
return err
}
// unmarshal to the full object
// the mirror type is needed so this doesn't recurse into UnmarshalYAML
t := _mappedScraperAttrConfig{}
if err = unmarshal(&t); err != nil {
return err
}
*c = mappedScraperAttrConfig(t)
}
return c.convertPostProcessActions()
}
func (c *mappedScraperAttrConfig) convertPostProcessActions() error {
// ensure we don't have the old deprecated fields and the new post process field
if len(c.PostProcess) > 0 {
if c.ParseDate != "" || len(c.Replace) > 0 || c.SubScraper != nil {
return errors.New("cannot include postProcess and (parseDate, replace, subScraper) deprecated fields")
}
// convert mappedPostProcessAction entries to postProcessActions
for _, a := range c.PostProcess {
action, err := a.ToPostProcessAction()
if err != nil {
return err
}
c.postProcessActions = append(c.postProcessActions, action)
}
c.PostProcess = nil
} else {
// convert old deprecated fields if present
// in same order as they used to be executed
if len(c.Replace) > 0 {
action := postProcessReplace(c.Replace)
c.postProcessActions = append(c.postProcessActions, &action)
c.Replace = nil
}
if c.SubScraper != nil {
action := postProcessSubScraper(*c.SubScraper)
c.postProcessActions = append(c.postProcessActions, &action)
c.SubScraper = nil
}
if c.ParseDate != "" {
action := postProcessParseDate(c.ParseDate)
c.postProcessActions = append(c.postProcessActions, &action)
c.ParseDate = ""
}
}
return nil
}
func (c mappedScraperAttrConfig) hasConcat() bool {
return c.Concat != ""
}
func (c mappedScraperAttrConfig) hasSplit() bool {
return c.Split != ""
}
func (c mappedScraperAttrConfig) concatenateResults(nodes []string) string {
separator := c.Concat
return strings.Join(nodes, separator)
}
func (c mappedScraperAttrConfig) cleanResults(nodes []string) []string {
cleaned := stringslice.StrUnique(nodes) // remove duplicate values
cleaned = stringslice.StrDelete(cleaned, "") // remove empty values
return cleaned
}
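
// splitString splits value on the configured separator, dropping empty
// segments; when no separator is configured, the value is returned as a
// single-element slice.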
func (c mappedScraperAttrConfig) splitString(value string) []string {
separator := c.Split
var res []string
if separator == "" {
return []string{value}
}
for _, str := range strings.Split(value, separator) {
if str != "" {
res = append(res, str)
}
}
return res
}
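
// postProcess runs the value through each configured action in turn,
// feeding the output of one action into the next.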
func (c mappedScraperAttrConfig) postProcess(ctx context.Context, value string, q mappedQuery) string {
for _, action := range c.postProcessActions {
value = action.Apply(ctx, value, q)
}
return value
}
type mappedScrapers map[string]*mappedScraper
type mappedScraper struct {
Common commonMappedConfig `yaml:"common"`
Scene *mappedSceneScraperConfig `yaml:"scene"`
Gallery *mappedGalleryScraperConfig `yaml:"gallery"`
Performer *mappedPerformerScraperConfig `yaml:"performer"`
Movie *mappedMovieScraperConfig `yaml:"movie"`
}
type mappedResult map[string]string
type mappedResults []mappedResult
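
// apply copies each key/value pair onto the identically named exported
// field of dest via reflection. dest must be a pointer to a struct; both
// string and *string fields are supported.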
func (r mappedResult) apply(dest interface{}) {
destVal := reflect.ValueOf(dest)
// dest must be a pointer to a struct
destVal = destVal.Elem()
for key, value := range r {
field := destVal.FieldByName(key)
if field.IsValid() {
var reflectValue reflect.Value
if field.Kind() == reflect.Ptr {
// need to copy the value, otherwise everything is set to the
// same pointer
localValue := value
reflectValue = reflect.ValueOf(&localValue)
} else {
reflectValue = reflect.ValueOf(value)
}
field.Set(reflectValue)
} else {
logger.Errorf("Field %s does not exist in %T", key, dest)
}
}
}
func (r mappedResults) setKey(index int, key string, value string) mappedResults {
if index >= len(r) {
r = append(r, make(mappedResult))
}
logger.Debugf(`[%d][%s] = %s`, index, key, value)
r[index][key] = value
return r
}
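
// scrapePerformer runs the performer mapping against q and returns the
// first result, applying any configured tag mappings to populate Tags.
// It returns nil when no performer mapping exists or nothing is found.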
func (s mappedScraper) scrapePerformer(ctx context.Context, q mappedQuery) (*models.ScrapedPerformer, error) {
var ret *models.ScrapedPerformer
performerMap := s.Performer
if performerMap == nil {
return nil, nil
}
performerTagsMap := performerMap.Tags
results := performerMap.process(ctx, q, s.Common)
if len(results) > 0 {
ret = &models.ScrapedPerformer{}
results[0].apply(ret)
// now apply the tags
if performerTagsMap != nil {
logger.Debug(`Processing performer tags:`)
			tagResults := performerTagsMap.process(ctx, q, s.Common)

			for _, p := range tagResults {
				tag := &models.ScrapedTag{}
				p.apply(tag)
				ret.Tags = append(ret.Tags, tag)
			}
		}
	}

	return ret, nil
}
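
// scrapePerformers runs the performer mapping against q and returns one
// ScrapedPerformer per matched result.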
func (s mappedScraper) scrapePerformers(ctx context.Context, q mappedQuery) ([]*models.ScrapedPerformer, error) {
	var ret []*models.ScrapedPerformer

	performerMap := s.Performer
	if performerMap == nil {
		return nil, nil
	}

	results := performerMap.process(ctx, q, s.Common)
	for _, r := range results {
		var p models.ScrapedPerformer
		r.apply(&p)
		ret = append(ret, &p)
	}

	return ret, nil
}
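
// processScene applies a single mapped result to a ScrapedScene, then fills
// in the associated performers (with their tags), tags, studio and movies
// from the scene sub-mappings.
//
// As a rough illustration (abbreviated, not an exhaustive example), those
// sub-mappings correspond to a scene block in a scraper YAML along these
// lines:
//
//	scene:
//	  Title: //h1
//	  Tags:
//	    Name: //ul[@class="tags"]/li
//	  Studio:
//	    Name: //div[@class="studio"]/a
//	  Performers:
//	    Name: //div[@class="performers"]//a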
func (s mappedScraper) processScene(ctx context.Context, q mappedQuery, r mappedResult) *ScrapedScene {
	var ret ScrapedScene

	sceneScraperConfig := s.Scene
	scenePerformersMap := sceneScraperConfig.Performers
	sceneTagsMap := sceneScraperConfig.Tags
	sceneStudioMap := sceneScraperConfig.Studio
	sceneMoviesMap := sceneScraperConfig.Movies
	scenePerformerTagsMap := scenePerformersMap.Tags

	r.apply(&ret)

	// process performer tags once
	var performerTagResults mappedResults
	if scenePerformerTagsMap != nil {
		performerTagResults = scenePerformerTagsMap.process(ctx, q, s.Common)
	}

	// now apply the performers and tags
	if scenePerformersMap.mappedConfig != nil {
		logger.Debug(`Processing scene performers:`)
		performerResults := scenePerformersMap.process(ctx, q, s.Common)

		for _, p := range performerResults {
			performer := &models.ScrapedPerformer{}
			p.apply(performer)

			for _, pt := range performerTagResults {
				tag := &models.ScrapedTag{}
				pt.apply(tag)
				performer.Tags = append(performer.Tags, tag)
			}

			ret.Performers = append(ret.Performers, performer)
		}
	}

	if sceneTagsMap != nil {
		logger.Debug(`Processing scene tags:`)
		tagResults := sceneTagsMap.process(ctx, q, s.Common)
		for _, p := range tagResults {
			tag := &models.ScrapedTag{}
			p.apply(tag)
			ret.Tags = append(ret.Tags, tag)
		}
	}

	if sceneStudioMap != nil {
		logger.Debug(`Processing scene studio:`)
		studioResults := sceneStudioMap.process(ctx, q, s.Common)
		if len(studioResults) > 0 {
			studio := &models.ScrapedStudio{}
			studioResults[0].apply(studio)
			ret.Studio = studio
		}
	}

	if sceneMoviesMap != nil {
		logger.Debug(`Processing scene movies:`)
		movieResults := sceneMoviesMap.process(ctx, q, s.Common)
		for _, p := range movieResults {
			movie := &models.ScrapedMovie{}
			p.apply(movie)
			ret.Movies = append(ret.Movies, movie)
		}
	}

	return &ret
}
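
// scrapeScenes runs the scene mapping against q and processes every matched
// result, returning one ScrapedScene per match.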
func (s mappedScraper) scrapeScenes(ctx context.Context, q mappedQuery) ([]*ScrapedScene, error) {
	var ret []*ScrapedScene

	sceneScraperConfig := s.Scene
	sceneMap := sceneScraperConfig.mappedConfig
	if sceneMap == nil {
		return nil, nil
	}

	logger.Debug(`Processing scenes:`)
	results := sceneMap.process(ctx, q, s.Common)
	for _, r := range results {
		logger.Debug(`Processing scene:`)
		ret = append(ret, s.processScene(ctx, q, r))
	}

	return ret, nil
}
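
// scrapeScene is the single-result variant of scrapeScenes: it processes only
// the first matched result, and returns nil when the scraper has no scene
// mapping or no results.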
func (s mappedScraper) scrapeScene(ctx context.Context, q mappedQuery) (*ScrapedScene, error) {
	var ret *ScrapedScene

	sceneScraperConfig := s.Scene
	sceneMap := sceneScraperConfig.mappedConfig
	if sceneMap == nil {
		return nil, nil
	}

	logger.Debug(`Processing scene:`)
	results := sceneMap.process(ctx, q, s.Common)
	if len(results) > 0 {
		ret = s.processScene(ctx, q, results[0])
	}

	return ret, nil
}
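
// scrapeGallery runs the gallery mapping against q, then fills in the
// associated performers, tags and studio from the gallery sub-mappings.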
func (s mappedScraper) scrapeGallery(ctx context.Context, q mappedQuery) (*ScrapedGallery, error) {
	var ret *ScrapedGallery

	galleryScraperConfig := s.Gallery
	galleryMap := galleryScraperConfig.mappedConfig
	if galleryMap == nil {
		return nil, nil
	}

	galleryPerformersMap := galleryScraperConfig.Performers
	galleryTagsMap := galleryScraperConfig.Tags
	galleryStudioMap := galleryScraperConfig.Studio

	logger.Debug(`Processing gallery:`)
	results := galleryMap.process(ctx, q, s.Common)

	if len(results) > 0 {
		ret = &ScrapedGallery{}
		results[0].apply(ret)

		// now apply the performers and tags
		if galleryPerformersMap != nil {
			logger.Debug(`Processing gallery performers:`)
			performerResults := galleryPerformersMap.process(ctx, q, s.Common)

			for _, p := range performerResults {
				performer := &models.ScrapedPerformer{}
				p.apply(performer)
				ret.Performers = append(ret.Performers, performer)
			}
		}

		if galleryTagsMap != nil {
			logger.Debug(`Processing gallery tags:`)
			tagResults := galleryTagsMap.process(ctx, q, s.Common)

			for _, p := range tagResults {
				tag := &models.ScrapedTag{}
				p.apply(tag)
				ret.Tags = append(ret.Tags, tag)
			}
		}

		if galleryStudioMap != nil {
			logger.Debug(`Processing gallery studio:`)
			studioResults := galleryStudioMap.process(ctx, q, s.Common)

			if len(studioResults) > 0 {
				studio := &models.ScrapedStudio{}
				studioResults[0].apply(studio)
				ret.Studio = studio
			}
		}
	}

	return ret, nil
}
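
// scrapeMovie runs the movie mapping against q, then fills in the associated
// studio from the movie sub-mapping.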
func (s mappedScraper) scrapeMovie(ctx context.Context, q mappedQuery) (*models.ScrapedMovie, error) {
	var ret *models.ScrapedMovie

	movieScraperConfig := s.Movie
	movieMap := movieScraperConfig.mappedConfig
	if movieMap == nil {
		return nil, nil
	}

	movieStudioMap := movieScraperConfig.Studio

	results := movieMap.process(ctx, q, s.Common)
	if len(results) > 0 {
		ret = &models.ScrapedMovie{}
		results[0].apply(ret)

		if movieStudioMap != nil {
			logger.Debug(`Processing movie studio:`)
			studioResults := movieStudioMap.process(ctx, q, s.Common)
			if len(studioResults) > 0 {
				studio := &models.ScrapedStudio{}
				studioResults[0].apply(studio)
				ret.Studio = studio
			}
		}
	}

	return ret, nil
}
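
// A minimal usage sketch (hypothetical caller; in stash the real call sites
// are the scraper actions, which build the mappedQuery from a fetched
// document):
//
//	var s mappedScraper // decoded from a scraper YAML
//	scene, err := s.scrapeScene(ctx, q)
//	if err != nil {
//		return nil, err
//	}
//	if scene == nil {
//		// no scene mapping, or no results
//	}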