changelog

version 400

- subscription data overhaul:
- the formerly monolithic subscription object is finally broken up into smaller pieces, reducing work, load lag, and total db read/write for all actions
- subscriptions work the same as before, and no user input is required. they just work better now™
- depending on the size and number of your subscriptions, the db update may take a minute or two this week. a backup of your old subscription objects will be created in your db directory, under a new 'legacy_subscriptions_backup' subdirectory
- the manage subscriptions dialog should now open within a second (assuming subs are not currently running). it should save just as fast, with only a little lag if you decide to make significant changes or go into many queries' logs, which are now fetched on demand inside the dialog
- when subscriptions run, they similarly only have to load the query they are currently working on. boot lag is now almost nothing, and total drive read/write data for a typical sub run is massively reduced
- the 'total files in a sub' limits no longer apply. you can have a sub with a thousand queries and half a million urls if you like
- basic subscription data is now held in memory at all times, opening up future fast access such as client api and general UI editing of subs. more work will happen here in coming weeks
- if, due to hard drive fault or other unusual situations, some subscription file/gallery log data is missing from the db, a running sub will note this, pause the sub, and provide a popup error for the user. the manage subscriptions dialog will correct it on launch by resetting the affected queries with new empty data
- similarly, if you launch the manage subs dialog and there is orphaned file/gallery log data in the db, this will be noticed, with the surplus data then backed up to the database directory and deleted from the database proper
- subscription queries can now handle domain and bandwidth tests for downloaders that host files/posts on a different domain to the gallery search step
- if subs are running when manage subs is booted, long delays while waiting for them to pause are less likely
- some subscription 'should run?' tests are improved for odd situations such as subs that have no queries or all DEAD queries
- improved some error handling in merge/separate code
- the 'show/copy quality info' buttons now work off the main thread, disabling the sub edit dialog while they work
- updated a little of the subs help
- .
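As a rough illustration of the self-healing behaviour described above (the class names, db layout, and message text here are hypothetical stand-ins, not hydrus's real code):

```python
# Illustrative sketch only: a running sub pauses on missing log data, and
# the manage dialog repairs missing logs and quarantines orphaned ones.
class MissingLogData(Exception):
    pass

def load_log(db, query_name):
    # a running sub loads a query's log on demand; missing data is an error
    if query_name not in db:
        raise MissingLogData(query_name)
    return db[query_name]

def run_subscription(db, sub):
    for query_name in sub['queries']:
        try:
            load_log(db, query_name)
        except MissingLogData:
            sub['paused'] = True
            return f'Log data for "{query_name}" is missing! Pausing sub--please open manage subscriptions to repair.'
    return 'ok'

def open_manage_subscriptions(db, sub):
    # repair: reset queries with missing logs to new empty data
    for query_name in sub['queries']:
        if query_name not in db:
            db[query_name] = {'file_log': [], 'gallery_log': []}
    # orphan check: back up and delete log data no query refers to
    orphans = {k: v for k, v in db.items() if k not in sub['queries']}
    backup = dict(orphans)
    for k in orphans:
        del db[k]
    sub['paused'] = False
    return backup
```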
- boring actual code changes for subs:
- wrote a query log container object to store bulky file and gallery log info
- wrote a query header object to store options and cache log summary info
- wrote a file cache status object to summarise important info so check timings and similar can be decided upon without needing to load a log
- the new cache is now used across the program for all file import summary presentation
- wrote a new subscription object to hold the new query headers and load logs as needed
- updated subscription management to deal with the new subscription objects. it now also keeps them in memory all the time
- wrote a fail-safe update from the old subscription objects to the new, which also saves a backup to disk, just in case of unforeseen problems in the near future
- updated the subscription ui code to deal with all the new objects
- updated the subscription ui to deal with asynchronous log fetching as needed
- cleaned up some file import status code
- moved old subscription code to a new legacy file
- refactored subscription ui code to a new file
- refactored and improved sub sync code
- misc subscription cleanup
- misc subscription ui cleanup
- added type hints to multiple subscription locations
- improved how missing serialisable object errors are handled at the db level
- .
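The object split described in these notes can be sketched roughly as follows; the class names and fields are hypothetical stand-ins for hydrus's actual serialisable objects:

```python
# Sketch of the header/container split: the small header stays in memory
# at all times, the bulky log container is only fetched from the db when
# actually needed.
from dataclasses import dataclass, field

@dataclass
class QueryLogContainer:   # bulky: file import log + gallery log
    file_log: list = field(default_factory=list)
    gallery_log: list = field(default_factory=list)

@dataclass
class FileCacheStatus:     # small summary used for check timing decisions
    num_urls: int = 0
    num_new: int = 0

@dataclass
class QueryHeader:         # always in memory: options + cached summary
    query_text: str
    cache_status: FileCacheStatus = field(default_factory=FileCacheStatus)
    log_key: str = ''      # points at the log container in the db

    def load_log(self, db) -> QueryLogContainer:
        # only hit the db when the bulky log is actually needed
        return db[self.log_key]
```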
- client api:
- the client api now delivers 'is_inbox', 'is_local', 'is_trashed' for 'GET /get_files/file_metadata'
- the client api's Access-Control-Allow-Headers CORS header is now '*', allowing all
- client api version is now 12
- .
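As a minimal sketch of consuming the new booleans, assuming the usual 'metadata' wrapper shown in the client api help (the sample body below is trimmed and illustrative, not a real server response):

```python
import json

# A response from GET /get_files/file_metadata now includes the new
# boolean fields. Sample body trimmed down for the example:
body = json.loads('''{
  "metadata": [
    {"file_id": 1, "is_inbox": true,  "is_local": true, "is_trashed": false},
    {"file_id": 2, "is_inbox": false, "is_local": true, "is_trashed": false}
  ]
}''')

# e.g. pick out which of the fetched files are still in the inbox
inbox_ids = [m['file_id'] for m in body['metadata'] if m['is_inbox']]
print(inbox_ids)  # [1]
```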
- downloaders:
- twitter retired their old api on the 1st of June, and there is unfortunately no good hydrus solution for the new one. however thanks to a user's efforts, a nice new parser for nitter, a twitter wrapper, is added in today's update. please play with it--it has three downloaders, one for a user's media, one for retweets, and one for both together--and adjust your twitter subscriptions to use the new downloader as needed. the twitter downloader is no longer included for new hydrus users
- thanks to a user's submission, fixed the md5 hash fetching for default danbooru parsers
- derpibooru gallery searching _should_ be fixed to use their current api
- .
- the rest:
- when the client exits or gets a 'modal' maintenance popup window, all currently playing media windows will now pause
- regrettably, due to some content merging issues that are too complicated to improve at the moment, the dupe filter will no longer show the files of processed pairs more than once per batch. you won't get a series of AB, AC, AD any more. this will return in future
- the weird bug where double-clicking the topmost recent tags suggestion would actually remove the top two items _should_ be fixed. general selection-setting on this column should also be improved
- middle-clicking on a parent tag in a 'write' autocomplete dropdown no longer launches a page with that invalid parent 'label' tag included--it just does the base tag. the same is true of label tags (such as 'loading...') and namespace tags
- when changing 'expand parents on autocomplete' in the cog button on manage tags, the respective autocomplete now changes whether it displays parents
- this is slightly complicated: a tag 'write' context (like manage tags) now presents its autocomplete tags (filtering, siblings, parents) according to the tag service of the parent panel, not the current tag service of the autocomplete. so, if you are on the 'my tags' panel and switch to 'all known tags' for the a/c, you will no longer get 'all known tags' siblings and parents and so on presented if 'my tags' is not set to take them. this was sometimes causing confusion when a list showed a parent but the underlying panel did not add it on tag entry
- to reduce blacklist confusion, when you launch the edit blacklist dialog from an edit tag import options panel, now only the 'blacklist' tab shows, the summary text is blacklist-specific, and the top intro message is improved. a separate 'whitelist' filter will be added in the near future to allow downloading of files only if they have certain tags
- 'hard-replace siblings and parents' in _manage tags_ should now correctly remove bad siblings when they are currently pending
- network->downloaders->manage downloader and url display now has a checkbox to make the media viewer top-right hover show unmatched urls
- the '... elide page tab names' option now applies instantly on options dialog ok to all pages
- added 'copy_bmp_or_file_if_not_bmpable' shortcut command to the 'media' shortcut set. it tries copy_bmp first, then copy_file if not a static image
- fixed some edit tag filter layout to stop long intro messages making it super wide
- fixed an issue where tag filters could accept non-whitespace-stripped entries and entries with uppercase characters
- fixed a display typo where the 'clear orphan files' maintenance job, when set to delete orphans, was accidentally reporting (total number of thumbnails)/(number of files to delete) text in the file delete step instead of the correct (num_done/num_to_do)
- clarified the 'reset repository' commands in review services
- when launching an external program, the child process's environment's PATH is reset to what it was at hydrus boot (removing the hydrus base dir)
- when launching an external program from the frozen build, if some Qt/SSL specific PATH variables have been set to hydrus subdirectories by pyinstaller or otherwise, they are now removed. (this hopefully fixes issues launching some Qt programs as external file launchers)
- added a separate requirements.txt for python 3.8, which can't handle PySide2 5.13.0
- updated help->about to deal better with missing mpv
- updated windows mpv to the 2020-05-31 build; api version is now 1.108
- updated windows sqlite to 3.32.2
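The PATH-reset behaviour for external program launches might be sketched like this; the exact environment variables hydrus scrubs are an assumption for illustration, not confirmed from its source:

```python
import os
import subprocess
import sys

# Capture the boot-time PATH once, at program start, before anything
# (e.g. the hydrus base dir) gets prepended to it.
PATH_AT_BOOT = os.environ.get('PATH', '')

def scrubbed_env(base_env, path_at_boot, frozen):
    # build a child environment with PATH restored to its boot value;
    # frozen builds also drop loader-specific overrides that can break
    # child Qt programs (variable names here are an assumption)
    env = dict(base_env)
    env['PATH'] = path_at_boot
    if frozen:
        for key in ('QT_PLUGIN_PATH', 'QML2_IMPORT_PATH', 'SSL_CERT_FILE'):
            env.pop(key, None)
    return env

def launch_external(cmd):
    env = scrubbed_env(os.environ, PATH_AT_BOOT, getattr(sys, 'frozen', False))
    return subprocess.Popen(cmd, env=env)
```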
version 399
- improvements:

diff --git a/help/client_api.html b/help/client_api.html
index b4bda4a1..91688f0f 100644
--- a/help/client_api.html
+++ b/help/client_api.html
@@ -893,6 +893,9 @@
     "has_audio" : false,
     "num_frames" : null,
     "num_words" : null,
+    "is_inbox" : true,
+    "is_local" : true,
+    "is_trashed" : false,
     "known_urls" : [],
     "service_names_to_statuses_to_tags" : {}
 },
@@ -908,6 +911,9 @@
     "has_audio" : true,
     "num_frames" : 102,
     "num_words" : null,
+    "is_inbox" : false,
+    "is_local" : true,
+    "is_trashed" : false,
     "known_urls" : [
         "https://gelbooru.com/index.php?page=post&s=view&id=4841557",
         "https://img2.gelbooru.com//images/80/c8/80c8646b4a49395fb36c805f316c49a9.jpg",
diff --git a/help/duplicates.html b/help/duplicates.html
index 65d35671..c5479d63 100644
--- a/help/duplicates.html
+++ b/help/duplicates.html
@@ -17,7 +17,7 @@
-
+
Let's go to the preparation page first:
The 'similar shape' algorithm works on distance. Two files with 0 distance are likely exact matches, such as resizes of the same file or lower/higher quality jpegs, whereas those with distance 4 tend to be hairstyle or costume changes. You will start on distance 0 and should not expect to ever go above 4 or 8 or so. Going too high increases the danger of being overwhelmed by false positives.
-If you are interested, the current version of this system uses a 64-bit phash to represent the image shape and a VPTree to search different files' phashes' relative hamming distance. I expect to extend it in future with multiple phash generation (flips, rotations, and 'interesting' image crops and video frames) and most-common colour comparisons.
+If you are interested, the current version of this system uses a 64-bit phash to represent the image shape and a VPTree to search different files' phashes' relative hamming distance. I expect to extend it in future with multiple phash generation (flips, rotations, and 'interesting' image crops and video frames) and most-common colour comparisons.
Searching for duplicates is fairly fast per file, but with a large client with hundreds of thousands of files, the total CPU time adds up. You can do a little manual searching if you like, but once you are all settled here, I recommend you hit the cog icon on the preparation page and let hydrus do this page's catch-up search work in your regular maintenance time. It'll swiftly catch up and keep you up to date without you even thinking about it.
Start searching on the 'exact match' search distance of 0. It is generally easier and more valuable to get exact duplicates out of the way first.
Once you have some files searched, you should see a potential pair count appear in the 'filtering' page.
diff --git a/help/getting_started_subscriptions.html b/help/getting_started_subscriptions.html
index b452ce1f..6e9f8f31 100644
--- a/help/getting_started_subscriptions.html
+++ b/help/getting_started_subscriptions.html
@@ -10,16 +10,17 @@
Let's say you found an artist you like. You downloaded everything of theirs from some site, but one or two pieces of new work are posted every week. You'd like to keep up with the new stuff, but you don't want to manually make a new download job every week for every single artist you like.
what are subs?
Subscriptions are a way of telling the client to regularly and quietly repeat a gallery search. You set up a number of saved queries, and the client will 'sync' with the latest files in the gallery and download anything new, just as if you were running the download yourself.
+Subscriptions only work for booru-like galleries that put the newest files first, and they only keep up with new content--once they have done their first sync, which usually gets the most recent hundred files or so, they will never reach further into the past. Getting older files, as you will see later, is a job best done with a normal download page.
Here's the dialog, which is under network->downloaders->manage subscriptions:
This is a very simple example--there is only one subscription, for safebooru. It has two 'queries' (i.e. searches to keep up with).
-It is important to note that while subscriptions can have multiple queries (even hundreds!), they generally only work on one site. Expect to create one subscription for safebooru, one for artstation, one for paheal, and so on for every site you care about. Advanced users may be able to think of ways to get subscriptions to work on multiple sites at once, but I recommend against this as it throws off some of the internal check timing calculations.
+It is important to note that while subscriptions can have multiple queries (even hundreds!), they generally only work on one site. Expect to create one subscription for safebooru, one for artstation, one for paheal, and so on for every site you care about. Advanced users may be able to think of ways to get around this, but I recommend against it as it throws off some of the internal check timing calculations.
Before we trip over the advanced buttons here, let's zoom in on the actual subscription:
This is a big and powerful panel! I recommend you open the screenshot up in a new browser tab, or in the actual client, so you can refer to it.
Despite all the controls, the basic idea is simple: Up top, I have selected the 'safebooru tag search' download source, and then I have added two artists--"hong_soon-jae" and "houtengeki". These two queries have their own panels for reviewing what URLs they have worked on and further customising their behaviour, but all they really are is little bits of search text. When the subscription runs, it will put the given search text into the given download source just as if you were running the regular downloader.
For the most part, all you need to do to set up a good subscription is give it a name, select the download source, and use the 'paste queries' button to paste what you want to search. Subscriptions have great default options for almost all query types, so you don't have to go any deeper than that to get started.
-Do not change the 'file limits' options until you know exactly what they do and have a good reason to alter them!
+Do not change the max number of new files options until you know exactly what they do and have a good reason to alter them!
how do subscriptions work?
Once you hit ok on the main subscription dialog, the subscription system should immediately come alive. If any queries are due for a 'check', they will perform their search and look for new files (i.e. URLs it has not seen before). Once that is finished, the file download queue will be worked through as normal. Typically, the sub will make a popup like this while it works:
@@ -29,16 +30,16 @@
This can often be a nice surprise!
what makes a good subscription?
-The same rules as for downloaders apply: start slow, be hesitant, and plan for the long-term. Artist queries make great subscriptions as they don't update reliably but not too often and have very stable quality. Pick the artists you like most, see where their stuff is posted, and set up your subs like that.
+The same rules as for downloaders apply: start slow, be hesitant, and plan for the long-term. Artist queries make great subscriptions as they update reliably but not too often and have very stable quality. Pick the artists you like most, see where their stuff is posted, and set up your subs like that.
Series and character subscriptions are sometimes valuable, but they can be difficult to keep up with and have highly variable quality. It is not uncommon for users to only keep 15% of what a character sub produces. I do not recommend them for anything but your waifu.
-Attribute subscriptions like 'blue_eyes' or 'smile' make for terrible subs as the quality is all over the place and you will be inundated by way too much content. The only exceptions are for specific, low-count searches that really matter to you, like 'contrapposto' or 'gothic trap thighhighs'.
+Attribute subscriptions like 'blue_eyes' or 'smile' make for terrible subs as the quality is all over the place and you will be inundated by too much content. The only exceptions are for specific, low-count searches that really matter to you, like 'contrapposto' or 'gothic trap thighhighs'.
If you end up subscribing to eight hundred things and get ten thousand new files a week, you made a mistake. Subscriptions are for keeping up with things you like. If you let them overwhelm you, you'll resent them.
It is a good idea to run a 'full' download for a search before you set up a subscription. As well as making sure you have the exact right query text and that you have everything ever posted (beyond the 100 files deep a sub will typically look), it saves the bulk of the work (and waiting on bandwidth) for the manual downloader, where it belongs. When a new subscription picks up off a freshly completed download queue, its initial subscription sync only takes thirty seconds since its initial URLs are those that were already processed by the manual downloader. I recommend you stack artist searches up in the manual downloader using 'no limit' file limit, and when they are all finished, select them in the list and right-click->copy queries, which will put the search texts in your clipboard, newline-separated. This list can be pasted into the subscription dialog in one go with the 'paste queries' button again!
The entire subscription system assumes the source is a typical 'newest first' booru-style search. If you dick around with some order_by:rating/random metatag, it won't work.
how often do subscriptions check?
Hydrus subscriptions use the same variable-rate checking system as its thread watchers, just on a larger timescale. If you subscribe to a busy feed, it might check for new files once a day, but if you enter an artist who rarely posts, it might only check once every month. You don't have to do anything. The fine details of this are governed by the 'checker options' button. This is one of the things you should not mess with as you start out.
If a query goes too 'slow' (typically, this means no new files for 180 days), it will be marked DEAD in the same way a thread will, and it will not be checked again. You will get a little popup when this happens. This is all editable as you get a better feel for the system--if you wish, it is completely possible to set up a sub that never dies and only checks once a year.
-I do not recommend setting up a sub that needs to check more than once a day. The system tends only to wake up a few times per day anyway, and any search that is producing that many files is probably a bad fit for a subscription. Subscriptions are for lightweight searches that are updated every now and then.
+I do not recommend setting up a sub that needs to check more than once a day. Any search that is producing that many files is probably a bad fit for a subscription. Subscriptions are for lightweight searches that are updated every now and then.
(you might like to come back to this point once you have tried subs for a week or so and want to refine your workflow)
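If you are curious, the variable-rate idea can be sketched very roughly; this is illustrative only, with made-up constants, and is not hydrus's actual checker code:

```python
# Illustrative sketch of variable-rate checking: aim the next check so the
# query produces roughly a handful of new files per check, clamped between
# a day and a month. All numbers here are made up for the example.
def next_check_period(new_files, seconds_covered,
                      target_files_per_check=5,
                      min_period=24 * 3600,
                      max_period=30 * 24 * 3600):
    if new_files == 0:
        return max_period  # quiet query: wait the maximum
    seconds_per_file = seconds_covered / new_files
    period = int(seconds_per_file * target_files_per_check)
    return max(min_period, min(period, max_period))
```

So a busy feed (many files per day) clamps to daily checks, while a rarely updated artist drifts out towards monthly.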
@@ -46,7 +47,7 @@
On the edit subscription panel, the 'presentation' options let you publish files to a page. The page will have the subscription's name, just like the button makes, but it cuts out the middle-man and 'locks it in' more than the button, which will be forgotten if you restart the client. Also, if a page with that name already exists, the new files will be appended to it, just like a normal import page! I strongly recommend moving to this once you have several subs going. Make a 'page of pages' called 'subs' and put all your subscription landing pages in there, and then you can check it whenever is convenient.
If you discover your subscription workflow tends to be the same for each sub, you can also customise the publication 'label' used. If multiple subs all publish to the 'nsfw subs' label, they will all end up on the same 'nsfw subs' popup button or landing page. Sending multiple subscriptions' import streams into just one or two locations like this can be great.
You can also hide the main working popup. I don't recommend this unless you are really having a problem with it, since it is useful to have that 'active' feedback if something goes wrong.
-Note that subscription file import options will, by default, only present 'new' files. Anything already in the db will still be recorded in the internal import cache and used to calculate next check times and so on, but it won't clutter your import stream. This is different to the default for all the other importers, but when you are ready to enter the ranks of the patricians, you will know to edit your 'loud' default file import options under options->importing to behave this way as well. Efficient workflows only care about new files.
+Note that subscription file import options will, by default, only present 'new' files. Anything already in the db will still be recorded in the internal import cache and used to calculate next check times and so on, but it won't clutter your import stream. This is different to the default for all the other importers, but when you are ready to enter the ranks of the Patricians, you will know to edit your 'loud' default file import options under options->importing to behave this way as well. Efficient workflows only care about new files.
how exactly does the sync work?
Figuring out when a repeating search has 'caught up' can be a tricky problem to solve. It sounds simple, but unusual situations like 'a file got tagged late, so it inserted deeper than it ideally should in the gallery search' or 'the website changed its URL format completely, help' can cause problems. Subscriptions are automatic systems, so they tend to be a bit more careful and paranoid about problems, lest they burn 10GB on 10,000 unexpected diaperfur images.
The initial sync is simple. It does a regular search, stopping if it reaches the 'initial file limit' or the last file in the gallery, whichever comes first. The default initial file sync is 100, which is a great number for almost all situations.
@@ -54,8 +55,8 @@
If the sub keeps finding apparently new URLs on a regular sync, it will stop upon hitting its 'periodic file limit', which is also usually 100. This is a safety stopgap, and usually happens when the site's URL format itself has changed, which may or may not require attention from you to figure out. If a user just went nuts and uploaded 500 new files to that tag in one day, you'll have a 'gap' in your sub sync, which you'll want to fill in with a manual download. If a sub hits its periodic file limit and thinks something like this happened, it will give you a popup explaining the situation.
Please note that subscriptions only keep up with new content. They cannot search backwards in time in order to 'fill out' a search, nor can they fill in gaps. Do not change the file limits or check times to try to make this happen. If you want to ensure complete sync with all existing content for a particular search, please use the manual downloader.
In practice, most subs only need to check the first page of a gallery since only the first two or three urls are new.
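A very rough sketch of what a periodic sync pass does, under that 'newest first' assumption (illustrative only, not hydrus's actual code):

```python
# Walk the newest-first gallery, stopping when we hit a URL we have seen
# on a previous sync (we are caught up) or the periodic file limit (a
# possible gap or URL-format change, worth a popup).
def sync(fetch_page, seen_urls, periodic_file_limit=100):
    new_urls = []
    page = 0
    hit_limit = True  # becomes False if we find our place in the gallery
    while len(new_urls) < periodic_file_limit:
        urls = fetch_page(page)
        if not urls:
            hit_limit = False  # ran out of gallery entirely
            break
        for url in urls:
            if url in seen_urls:
                hit_limit = False  # caught up with the last sync
                break
            new_urls.append(url)
            if len(new_urls) >= periodic_file_limit:
                break
        else:
            page += 1
            continue
        break
    return new_urls, hit_limit
```

With only two or three new URLs per check, the inner loop finds a seen URL on the first page and stops there.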
-I put character queries in my artist sub, and now things are all mixed up
-On the main subscription dialog, there are 'merge' and 'split' buttons. These are powerful, but they will walk you through the process of pulling queries out of a sub and merging them back into a different one. Only subs that use the same download source can be merged. Give them a go, and if it all goes wrong, just hit the cancel button.
+I put character queries in my artist sub, and now things are all mixed up
+On the main subscription dialog, there are 'merge' and 'separate' buttons. These are powerful, but they will walk you through the process of pulling queries out of a sub and merging them back into a different one. Only subs that use the same download source can be merged. Give them a go, and if it all goes wrong, just hit the cancel button on the dialog.