Rooba/stash - stash - Hypermine gitea

Author	SHA1	Message	Date
JackDawson94	65d1353f2c	Allow use of Proxy (#3284 ) * Proxy config * Disable proxy for localhost & local LAN * No_proxy is now configurable	2023-02-07 09:46:18 +11:00
bnkai	be5dc7e545	Resolve hostname for chromium RDP requests (#2174 )	2022-01-04 15:47:39 +11:00
SmallCoccinelle	4089fcf1e2	Scraper refactor middle (#2043 ) * Push scrapeByURL into scrapers Replace ScrapePerfomerByURL, ScrapeMovie..., ... with ScrapeByURL in the scraperActionImpl interface. This allows us to delete a lot of repeated code in the scrapers and replace the central part with a switch on the scraper type. * Fold name scraping into one call Follow up on scraper refactoring. Name scrapers use the same code path. This allows us to restructure some code and kill some functions, adding variance to the name scraping code. It allows us to remove some code repetition as well. * Do not export loop refs. * Simplify fragment scraping Generalize fragment scrapers into ScrapeByFragment. This simplifies fragment code flows into a simpler pathing which should be easier to handle in the future. * Eliminate more context.TODO() In a number of cases, we have a context now. Use the context rather than TODO() for those cases in order to make those operations cancellable. * Pass the context for the stashbox scraper This removes all context.TODO() in the path of the stashbox scraper, and replaces it with the context that's present on each of the paths. * Pass the context into subscrapers Mostly a mechanical update, where we pass in the context for subscraping. This removes the final context.TODO() in the scraper code. * Warn on unknown fields from scripts A common mistake for new script writers are that they return fields not known to stash. For instance the name "description" is used rather than "details". Decode disallowing unknown fields. If this fails, use a tee-reader to fall back to the old behavior, but print a warning for the user in this case. Thus, we retain the old behavior, but print warnings for scripts which fails the more strict unknown-fields detection. * Nil-check before running the postprocessing chain Fixes panics when scraping returns nil values. * Lift nil-ness in post-postprocessing If the struct we are trying to post-process is nil, we shouldn't enter the postprocessing flow at all. Pass the struct as a value rather than a pointer, eliminating nil-checks as we go. Use the top-level postProcess call to make the nil-check and then abort there if the object we are looking at is nil. * Allow conversion routines to handle values If we have a non-pointer type in the interface, we should also convert those into ScrapedContent. Otherwise we get errors on deprecated functions.	2021-11-26 11:20:06 +11:00
SmallCoccinelle	e513b6ffa5	Cache and reuse the scraper HTTP client (#1855 ) * Add Cookies directly to the request Rather than maintaining a cookie jar on a one-shot HTTP client, maintain the jar ourselves: make a new jar, then use it to select the right cookies. The cookies are set on the request rather than on the client. This will retain the current behavior as we are always throwing the client away after each use. This patch enables the lifting of the http client as well over time. * Introduce a cached scraper HTTP client The scraper cache is augmented with an http.Client. These are safe for concurrent use, so the pointer can safely be passed around. Push this into scraper configurations where applicable, next to the txnManagers. When we issue a loadUrl request, do so on the cached http.Client, which will reuse existing idle connections in the client if any are present. * Set MaxIdleConnsPerHost. Closes #1850 We allow for up to 8 idle connections to a single host. This should make concurrent operation toward the same host reuse connections, even for sizeable concurrency. The number isn't bumped excessively high. We should probably limit concurrency toward a single site anyway, since we'll be able to overrun a site with queries quite easily if we have many concurrent goroutines issuing requests at the same time. * Reinstate driverOptions / useCDP check Use DeMorgan's laws to invert the logic and exit early. Fixes tests breaking. * Documentation fixup. * Use the scraper http.Client when fetching images Fold image fetchers onto the cached scraper http.Client as well. This makes the scraper have a single http.Client cache for all its operations. Thread the client upwards to the relevant attachment points: either the cache, or a stash_box instance, which is extended to include a pointer to the client. Style roughly follows that of txnManagers. * Use the same http Client as the GraphQL client use Rather than using http.DefaultClient, use the same client as the GraphQL client use in the stash_box subsystem. This localizes the client used in the subsystem into the constructing New.. call. * Hoist HTTP client construction Create a function for initializaing the HTTP Client we use. While here hoist magic numbers into constants. Introduce a proper static redirect error and use it in the client code as well. * Reinstate printCookies This is a debugging function, and it might still come in handy in the future at some point. * Nitpick comment. * Minor tidy Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>	2021-10-20 16:12:24 +11:00
SmallCoccinelle	655d3ae969	Toward better context handling (#1835 ) * Use the request context The code uses context.Background() in a flow where there is a http.Request. Use the requests context instead. * Use a true context in the plugin example Let AddTag/RemoveTag take a context and use that context throughout the example. * Avoid the use of context.Background Prefer context.TODO over context.Background deep in the call chain. This marks the site as something which we need to context-handle later, and also makes it clear to the reader that the context is sort-of temporary in the code base. While here, be consistent in handling the `act` variable in each branch of the if .. { .. } .. check. * Prefer context.TODO over context.Background For the different scraping operations here, there is a context higher up the call chain, which we ought to use. Mark the call-sites as TODO for now, so we can come back later on a sweep of which parts can be context-lifted. * Thread context upwards Initialization requires context for transactions. Thread the context upward the call chain. At the intialization call, add a context.TODO since we can't break this yet. The singleton assumption prevents us from pulling it up into main for now. * make tasks context-aware Change the task interface to understand contexts. Pass the context down in some of the branches where it is needed. * Make QueryStashBoxScene context-aware This call naturally sits inside the request-context. Use it. * Introduce a context in the JS plugin code This allows us to use a context for HTTP calls inside the system. Mark the context with a TODO at top level for now. * Nitpick error formatting Use %v rather than %s for error interfaces. Do not begin an error strong with a capital letter. * Avoid the use of http.Get in FFMPEG download chain Since http.Get has no context, it isn't possible to break out or have policy induced. The call will block until the GET completes. Rewrite to use a http Request and provide a context. Thread the context through the call chain for now. provide context.TODO() at the top level of the initialization chain. * Make getRemoteCDPWSAddress aware of contexts Eliminate a call to http.Get and replace it with a context-aware variant. Push the context upwards in the call chain, but plug it before the scraper interface so we don't have to rewrite said interface yet. Plugged with context.TODO() * Scraper: make the getImage function context-aware Use a context, and pass it upwards. Plug it with context.TODO() up the chain before the rewrite gets too much out of hand for now. Minor tweaks along the way, remove a call to context.Background() deep in the call chain. * Make NOTIFY request context-aware The call sits inside a Request-handler. So it's natural to use the requests context as the context for the outgoing HTTP request. * Use a context in the url scraper code We are sitting in code which has a context, so utilize it for the request as well. * Use a context when checking versions When we check the version of stash on Github, use a context. Thread the context up to the initialization routine of the HTTP/GraphQL server and plug it with a context.TODO() for now. This paves the way for providing a context to the HTTP server code in a future patch. * Make utils func ReadImage context-aware In almost all of the cases, there is a context in the call chain which is a natural use. This is true for all the GraphQL mutations. The exception is in task_stash_box_tag, so plug that task with context.TODO() for now. * Make stash-box get context-aware Thread a context through the call chain until we hit the Client API. Plug it with context.TODO() there for now. * Enable the noctx linter The code is now free of any uncontexted HTTP request. This means we pass the noctx linter, and we can enable it in the code base.	2021-10-14 15:32:41 +11:00
SmallCoccinelle	a5ca8fc678	Enable safe linters (#1786 ) * Enable safe linters Enable the linters dogsled, rowserrcheck, and sqlclosecheck. These report no errors currently in the code base. Enable misspell. Misspell finds two spelling mistakes in comments, which are fixed by the patch as well. Add and sort linters which are relatively safe to add over time. Comment them out for now. * Close the response body If we can get a HTTP response, it has a body which ought to be closed. By doing so, we avoid potentially leaking connections. * Enable the exportloopref linter There are two places in the code with these warnings. Fix them while enabling the linter. * Remove redundant types in tests If a slice already determines the type, the inner type declaration is redundant. Remove the inner declarations. * Mark autotag test cases as parallel Autotag test cases is by far the outlier when it comes to test time. While go test runs test cases in parallel, it doesn't do so inside a given package, unless one marks the test cases as parallel. This change provides a significant speedup on a 8-core machine for test runs.	2021-10-03 11:48:03 +11:00
Eng Zer Jun	62af723017	refactor: move from io/ioutil to io and os package (#1772 ) The io/ioutil package has been deprecated as of Go 1.16, see https://golang.org/doc/go1.16#ioutil. This commit replaces the existing io/ioutil functions with their new definitions in io and os packages. Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2021-09-27 10:55:23 +10:00
WithoutPants	1a3a2f1f83	Scrape scene by name (#1712 ) * Support scrape scene by name in configs * Initial scene querying * Add to manual	2021-09-14 14:54:53 +10:00
SmallCoccinelle	4b00d24248	Remove unused (#1709 ) * Remove stuff which isn't being used Some fields, functions and structs aren't in use by the project. Remove them for janitorial reasons. * Remove more unused code All of these functions are currently not in use. Clean up the code by removal, since the version control has the code if need be. * Remove unused functions There's a large set of unused functions and variables in the code base. Remove these, so it clearer what code to support going forward. Dead code has been eliminated. Where applicable, comment const-sections in tests, so reserved identifiers are still known. * Fix use-def of tsURL The first def of tsURL doesn't matter because there's no use before we hit the 2nd def. * Remove dead code assignment Setting logFile = "" is effectively dead code, because there's no use of it later. * Comment out found The variable 'found' is dead in the function (because no post-process action is following it). Comment it for now. * Comment dead code in tests These might provide hints as to what isn't covered at the moment. * Dead code removal In the case of constants where iota is involved, move the iota so it matches the current key values. This avoids problems with persistently stored key IDs.	2021-09-09 14:10:08 +10:00
SmallCoccinelle	e7f6cb22b7	Error strings noncapitalized (#1704 ) * Fix error string capitalization Error strings often follow another string. Hence, they should not be capitalized, unless referencing a name. * Uncapitalize more error strings While here, use %v on the error directly, which makes it easier to wrap the error later with %w if need be. * Uncapitalize more error strings While here, rename Url to URL as a nitpick.	2021-09-08 11:23:10 +10:00
bnkai	4c05535a13	Fix potential race condintion in CDP (#1536 )	2021-06-28 10:36:51 +10:00
bnkai	cd6b6b74eb	Add http headers support to scraper (#1273 )	2021-04-16 15:42:56 +10:00
WithoutPants	f6ffda7504	Setup and migration UI refactor (#1190 ) * Make config instance-based * Remove config dependency in paths * Refactor config init * Allow startup without database * Get system status at UI initialise * Add setup wizard * Cache and Metadata optional. Database mandatory * Handle metadata not set during full import/export * Add links * Remove config check middleware * Stash not mandatory * Panic on missing mandatory config fields * Redirect setup to main page if setup not required * Add migration UI * Remove unused stuff * Move UI initialisation into App * Don't create metadata paths on RefreshConfig * Add folder selector for generated in setup * Env variable to set and create config file. Make docker images use a fixed config file. * Set config file during setup	2021-04-12 09:31:33 +10:00
bnkai	68d4a4fe42	Add User Agent to image download reqs (#1222 )	2021-03-24 08:12:11 +11:00
bnkai	144cd6e4f2	Skip insecure certificates check when scraping (#1120 ) * Ignore insecure certificates when scraping * add ScraperCertCheck to scraper config options	2021-03-01 11:47:39 +11:00
bnkai	e883e5fe27	Add Mouse Click support for the CDP scraper (#827 )	2020-12-22 09:42:31 +11:00
bnkai	a96ab9ce6f	Add support for setting cookies in the scraper (#934 )	2020-12-01 16:34:09 +11:00
WithoutPants	109e55a25a	Query url parameters (#878 )	2020-10-22 11:56:04 +11:00
SpedNSFW	147d0067f5	Add gallery scraping (#862 )	2020-10-21 09:24:32 +11:00
WithoutPants	7158e83b75	Add JSON scrape support (#717 ) * Add support for scene fragment scrape in xpath	2020-08-10 14:21:50 +10:00
WithoutPants	b166abfa7b	Fix scraping error (#704 )	2020-08-04 20:43:56 +10:00
bnkai	4373f9bf01	Add cdp support for xpath scrapers (#625 ) Co-authored-by: WithoutPants <53250216+WithoutPants@users.noreply.github.com>	2020-08-04 10:42:40 +10:00