From 192ffc1e1bbe3503999b7128168e82e961c1f55e Mon Sep 17 00:00:00 2001 From: Paul Friederichsen Date: Thu, 23 Dec 2021 06:45:18 -0600 Subject: [PATCH] Add more help pages --- docs/adding_new_downloaders.md | 21 +++ docs/advanced.md | 98 +++++++++++ docs/advanced_parents.md | 76 +++++++++ docs/advanced_siblings.md | 79 +++++++++ docs/database_migration.md | 114 +++++++++++++ docs/duplicates.md | 230 ++++++++++++++++++++++++++ docs/getting_started_more_files.md | 96 +++++++++++ docs/getting_started_subscriptions.md | 128 ++++++++++++++ docs/launch_arguments.md | 73 ++++++++ {help => docs}/profile_example.txt | 0 docs/reducing_lag.md | 48 ++++++ mkdocs.yml | 30 +++- 12 files changed, 984 insertions(+), 9 deletions(-) create mode 100644 docs/adding_new_downloaders.md create mode 100644 docs/advanced.md create mode 100644 docs/advanced_parents.md create mode 100644 docs/advanced_siblings.md create mode 100644 docs/database_migration.md create mode 100644 docs/duplicates.md create mode 100644 docs/getting_started_more_files.md create mode 100644 docs/getting_started_subscriptions.md create mode 100644 docs/launch_arguments.md rename {help => docs}/profile_example.txt (100%) create mode 100644 docs/reducing_lag.md diff --git a/docs/adding_new_downloaders.md b/docs/adding_new_downloaders.md new file mode 100644 index 00000000..8ae6c123 --- /dev/null +++ b/docs/adding_new_downloaders.md @@ -0,0 +1,21 @@ +--- +title: adding new downloaders +--- + +# adding new downloaders + +## all downloaders are user-creatable and -shareable { id="anonymous" } + +Since the big downloader overhaul, all downloaders can be created, edited, and shared by any user. Creating one from scratch is not simple, and it takes a little technical knowledge, but importing what someone else has created is easy. 
+ +Hydrus objects like downloaders can sometimes be shared as data encoded into png files, like this: + +![](images/easy-import-realbooru.com-search-2018.09.21.png) + +This contains all the information needed for a client to add a realbooru tag search entry to the list you select from when you start a new download or subscription. + +You can get these pngs from anyone who has experience in the downloader system. An archive is maintained [here](https://github.com/CuddleBear92/Hydrus-Presets-and-Scripts/tree/master/Downloaders). + +To 'add' the easy-import pngs to your client, hit _network->downloaders->import downloaders_. A little image-panel will appear onto which you can drag-and-drop these png files. The client will then decode and go through the png, looking for interesting new objects and automatically import and link them up without you having to do any more. Your only further input on your end is a 'does this look correct?' check right before the actual import, just to make sure there isn't some mistake or other glaring problem. + +Objects imported this way will take precedence over existing functionality, so if one of your downloaders breaks due to a site change, importing a fixed png here will overwrite the broken entries and become the new default. \ No newline at end of file diff --git a/docs/advanced.md b/docs/advanced.md new file mode 100644 index 00000000..b5d6b6ab --- /dev/null +++ b/docs/advanced.md @@ -0,0 +1,98 @@ +--- +title: general clever tricks +--- + + +!!! note "this is non-comprehensive" + I am always changing and adding little things. The best way to learn is just to look around. If you think a shortcut should probably do something, try it out! If you can't find something, let me know and I'll try to add it! + +## advanced mode { id="advanced_mode" } + +To avoid confusing clutter, several advanced menu items and buttons are hidden by default. When you are comfortable with the program, hit _help->advanced mode_ to reveal them! 
+ +## exclude deleted files { id="exclude_deleted_files" } + +In the client's options is a checkbox to exclude deleted files. It appears almost anywhere you can import, under 'import file options'. If you select this, any file you have ever deleted will be excluded from all future remote searches and import operations. This can stop you from importing/downloading and filtering out the same bad files several times over. The default is off. You may wish to have it set one way most of the time, but switch it the other way for one specific import or search. + +## inputting non-English languages { id="ime" } + +If you typically use an IME to input Japanese or another non-English language, you may have had trouble typing into the autocomplete tag entry control: your IME needs Up/Down/Enter to navigate its candidates, but the autocomplete steals those key presses to navigate its list of results. To fix this, press Insert to temporarily disable the autocomplete's key event capture. The autocomplete text box will change colour to let you know it has released its normal key capture. Use your IME to get the text you want, then hit Insert again to restore the autocomplete to normal behaviour. + +## tag display { id="tag_display" } + +If you do not like a particular tag or namespace, you can easily hide it with _services->manage tag display_: + +_This image is out of date, sorry!_ + +![](images/tag_censorship.png) + +You can exclude single tags, as shown above, or entire namespaces (enter the colon, like 'species:'), or all namespaced tags (use ':'), or all unnamespaced tags (''). Rules set on 'all known tags' apply to everything, in addition to any repository-specific rules you set. + +A blacklist excludes whatever is listed; a whitelist excludes whatever is _not_ listed. + +This censorship is local to your client. No one else will experience your changes or know what you have censored.
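As a rough illustration of how these rules combine, here is a minimal Python sketch. It assumes the rule syntax described above (a bare tag, a 'namespace:' prefix, ':' for all namespaced tags, '' for all unnamespaced tags) and treats any tag containing a colon as namespaced. The function names are hypothetical, and this is not hydrus's own filtering code.

```python
# Hypothetical sketch of tag display rules; not hydrus's actual code.
# Rule syntax (from the help text): 'tag' = one exact tag,
# 'namespace:' = a whole namespace, ':' = all namespaced, '' = all unnamespaced.


def rule_matches(rule, tag):
    """Return True if a single display rule covers the given tag."""
    namespaced = ":" in tag
    if rule == "":           # all unnamespaced tags
        return not namespaced
    if rule == ":":          # all namespaced tags
        return namespaced
    if rule.endswith(":"):   # an entire namespace, e.g. 'species:'
        return tag.startswith(rule)
    return tag == rule       # a single exact tag


def is_displayed(tag, rules, mode):
    """A blacklist hides listed tags; a whitelist hides everything not listed."""
    listed = any(rule_matches(rule, tag) for rule in rules)
    if mode == "blacklist":
        return not listed
    if mode == "whitelist":
        return listed
    raise ValueError(f"unknown mode: {mode!r}")


tags = ["species:cat", "blue eyes", "creator:someone"]
print([t for t in tags if is_displayed(t, ["species:"], "blacklist")])
# ['blue eyes', 'creator:someone']
```

Swapping the mode to 'whitelist' with the same rule list would show only `species:cat`, which is the inversion the blacklist/whitelist sentence describes.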
+ +## importing and adding tags at the same time { id="importing_with_tags" } + +_Add tags before importing_ on _file->import files_ lets you give tags to the files you import _en masse_, and intelligently, using regexes that parse filenames: + +![](images/gunnerkrigg_import.png) + +This should be somewhat self-explanatory to anyone familiar with regexes. I hate them, personally, but I recognise they are powerful and exactly the right tool to use in this case. [This](http://www.aivosto.com/vbtips/regex.html) is a good introduction. + +Once you are done, you'll get something neat like this: + +![](images/gunnerkrigg_page.png) + +Which you can more easily manage by collecting: + +![](images/gunnerkrigg_chapter.png) + +Collections have a small icon in the bottom left corner. Selecting them actually selects many files (see the status bar), and performing an action on them (like archiving or uploading) will do so to every file in the collection. Viewing a collection in the fullscreen viewer pages through its contents just like an uncollected search. + +Here is a particularly zoomed out view, after importing volume 2: + +![](images/gunnerkrigg_volume.png) + +Importing with tags is great for long-running series with well-formatted filenames, and will save you literally hours of finicky tagging. + +## tag migration { id="tag_migration" } + +!!! danger + At _some_ point I will write some better help for this system, which is powerful. Be careful with it! + +Sometimes, you may wish to move thousands or millions of tags from one place to another. These actions are now collected in one place: _services->tag migration_. + +![](images/tag_migration.png) + +It proceeds from left to right, reading data from the source and applying it to the destination with the chosen action. Multiple filters let you choose which sorts of tag mappings, siblings, or parents are read from the source.
The source and destination can be the same. For instance, if you wanted to delete all 'clothing:' tags from a service, you would pull all those tags and then apply the 'delete' action to that same service. + +You can import from and export to Hydrus Tag Archives (HTAs), which are external, portable .db files. In this way, you can move millions of tags between two hydrus clients, share them with a friend, or import from an HTA put together from a website scrape. + +Tag Migration is a powerful system. Be very careful with it. Do small experiments before starting large jobs, and if you intend to migrate millions of tags, make a backup of your db beforehand, just in case it goes wrong. + +This system was once much simpler, but even then it had HTA support. If you wish to play around with some HTAs, there are some old user-created ones [here](https://www.mediafire.com/folder/yoy1dx6or0tnr/tag_archives). + +## custom shortcuts { id="shortcuts" } + +Once you are comfortable with manually setting tags and ratings, you may be interested in setting some shortcuts to do it quicker. Try hitting _file->shortcuts_ or clicking the keyboard icon on any media viewer window's top hover window. + +There are two kinds of shortcuts in the program--_reserved_, which have fixed names, are undeletable, and are always active in certain contexts (related to their name), and _custom_, which you create, name, and edit, and which are only active in a media viewer when you want them to be. You can redefine some simple shortcut commands, but most importantly, you can create shortcuts for adding/removing a tag or setting/unsetting a rating. + +Use the same 'keyboard' icon to set the current and default custom shortcuts. + +## finding duplicates { id="finding_duplicates" } + +_system:similar_to_ lets you run the duplicates processing page's searches manually.
You can either insert the hash and hamming distance manually, or you can launch these searches automatically from the thumbnail _right-click->find similar files_ menu. For example: + +![](images/similar_gununu.png) + +## truncated/malformed file import errors { id="file_import_errors" } + +Some files, even though they seem ok in another program, will not import to hydrus. This is usually because the file has some 'truncated' or broken data, probably due to a bad upload or storage at some point in its internet history. While sophisticated external programs can usually patch the error (often rendering the bottom lines of a jpeg as grey, for instance), hydrus is not so clever. Please feel free to send me, the hydrus developer, these files or links to them, so I can check them out on my end and try to add support. + +If the file is one you particularly care about, the easiest solution is to open it in Photoshop or GIMP and save it again. Those programs should be clever enough to parse the file's weirdness, and then make a nice clean saved file when it exports. That new file should be importable to hydrus. + +## setting a password { id="password" } + +The client offers a very simple password system, enough to keep out noobs. You can set it at _database->set a password_. It will thereafter ask for the password every time you start the program, and will not open without it. However, none of the database is encrypted, and someone with enough enthusiasm or a tool and access to your computer can still very easily see what files you have. The password is mainly to stop idle snoops checking your images if you are away from your machine. \ No newline at end of file diff --git a/docs/advanced_parents.md b/docs/advanced_parents.md new file mode 100644 index 00000000..5b28c81a --- /dev/null +++ b/docs/advanced_parents.md @@ -0,0 +1,76 @@ +--- +title: tag parents +--- + +Tag parents let you automatically add a particular tag every time another tag is added.
The relationship will also apply retroactively. + +## what's the problem? { id="the_problem" } + +Tags often fall into certain hierarchies. Certain tags _always_ imply certain other tags, and it is annoying and time-consuming to add them all individually every time. + +For example, whenever you tag a file with _ak-47_, you probably also want to tag it _assault rifle_, and maybe even _firearm_ as well. + +![](images/tag_parents_venn.png) + +Another time, you might tag a file _character:eddard stark_, and then also have to type in _house stark_ and then _series:game of thrones_. (You might also think _series:game of thrones_ should actually be _series:a song of ice and fire_, but that is an issue for [siblings](advanced_siblings.html).) + +Drawing more relationships would make a significantly more complicated Venn diagram, so let's draw a family tree instead: + +![](images/tag_parents_got.png) + +## tag parents { id="tag_parents" } + +Let's define the child-parent relationship 'C->P' as saying that tag P is the semantic superset/superclass of tag C. **All files that have C should also have P, without exception.** When the user tries to add tag C to a file, tag P is added automatically. + +Let's expand our weapon example: + +![](images/tag_parents_firearms.png) + +In that graph, adding _ar-15_ to a file would also add _semi-automatic rifle_, _rifle_, and _firearm_. Searching for _handgun_ would return everything tagged _m1911_ or _smith and wesson model 10_. + +This can obviously get as complicated and obsessive as you like, but be careful not to be too confident--this is just a fun example, but is an AK-47 truly _always_ an assault rifle? Some people would say no, and beyond its own intellectual neatness, what is the purpose of attempting to create such a complicated and 'perfect' tree? Of course you can create any sort of parent tags on your local tags or your own tag repositories, but this sort of thing can easily lead to arguments between reasonable people.
I only mean to say, as someone who does a lot of tag work: try not to create anything 'perfect', as it usually ends up wasting time. Act from need, not toward purpose. + +## how you do it { id="how_to_do_it" } + +Go to _services->manage tag parents_: + +![](images/tag_parents_dialog.png) + +This dialog looks and works just like the manage tag siblings dialog. + +Note that when you hit ok, the client will look up all the files with all your added tag Cs and retroactively apply/pend the respective tag Ps if needed. This could mean thousands of tags! + +Once you have some relationships added, the parents and grandparents will show indented anywhere you 'write' tags, such as the manage tags dialog: + +![](images/tag_parents_ac_1.png) + +Hitting enter on cersei will try to add _house lannister_ and _series:game of thrones_ as well. + +![](images/tag_parents_ac_2.png) + +## remote parents { id="remote_parents" } + +Whenever you add a tag parent pair to or remove one from a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it. If it is approved, all users who synchronise with that tag repository will gain that parent pair. If it is denied, only you will see it. + +## parent 'favourites' { id="parent_favourites" } + +As you use the client, you will likely make several processing workflows to archive/delete your different sorts of imports. You don't always want to go through things randomly--you might want to do some big videos for a bit, or focus on a particular character. A common search page is something like `[system:inbox, creator:blah, limit:256]`, which will show a sample of a creator in your inbox, so you can process just that creator. This is easy to set up and save in your favourite searches and quick to run, so you can load it up, do some archive/delete, and then dismiss it without too much hassle. + +But what happens if you want to search for multiple creators?
You might be tempted to make a large OR search predicate, like `creator:aaa OR creator:bbb OR creator:ccc OR creator:ddd`, of all your favourite creators so you can process them together as a 'premium' group. But if you want to add or remove a creator from that long OR, it can be cumbersome. And OR searches can sometimes just run slowly. One answer is to use the new tag parents tools to apply a 'favourite' parent to all the artists and then search for that favourite. + +Let's assume you want to search a bunch of 'creator' tags on the PTR. What you will do is: + +* Create a new 'local tag service' in _manage services_ called 'my parent favourites'. This will hold our subjective parents without uploading anything to the PTR. +* Go to _tags->manage where tag siblings and parents apply_ and add 'my parent favourites' as the top priority for parents, leaving 'PTR' as second priority. +* Under _tags->manage tag parents_, on your 'my parent favourites' service, add: + + * `creator:aaa->favourite:aesthetic art` + * `creator:bbb->favourite:aesthetic art` + * `creator:ccc->favourite:aesthetic art` + * `creator:ddd->favourite:aesthetic art` + + Watch/wait a few seconds for the parents to apply across the PTR for those creator tags. + +* Then save a new favourite search of `[system:inbox, favourite:aesthetic art, limit:256]`. This search will deliver results with any of the child 'creator' tags, just like a big OR search, and real fast! + +If you want to add creators to or remove them from the 'aesthetic art' group, you can simply go back to _tags->manage tag parents_, and the change will apply everywhere. You can create more umbrella/group tags if you like (and not just creators--think about clothing, or certain characters), and also use them in regular searches when you just want to browse some cool files.
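The parent logic above, including the 'favourite' umbrella-tag trick, can be sketched in a few lines of Python. This is a hypothetical model, not hydrus's implementation: it assumes parents are plain (child, parent) pairs applied transitively, and it assumes a straight chain ar-15 -> semi-automatic rifle -> rifle -> firearm for the weapon example.

```python
# Hypothetical model of C->P tag parents; not hydrus's actual implementation.
from collections import defaultdict


def build_parent_map(pairs):
    """Turn (child, parent) pairs into child -> set of direct parents."""
    parents = defaultdict(set)
    for child, parent in pairs:
        parents[child].add(parent)
    return parents


def with_implied_parents(tags, parents):
    """Return the tags plus every parent, grandparent, etc. they imply."""
    result = set(tags)
    stack = list(tags)
    while stack:
        tag = stack.pop()
        for parent in parents.get(tag, ()):
            if parent not in result:  # skip already-added tags (also stops cycles)
                result.add(parent)
                stack.append(parent)
    return result


def parent_search(files, tag, parents):
    """Names of files whose tags, once parents are applied, include `tag`."""
    return sorted(name for name, tags in files.items()
                  if tag in with_implied_parents(tags, parents))


parents = build_parent_map([
    ("ar-15", "semi-automatic rifle"),
    ("semi-automatic rifle", "rifle"),
    ("rifle", "firearm"),
    ("creator:aaa", "favourite:aesthetic art"),
    ("creator:bbb", "favourite:aesthetic art"),
])
print(sorted(with_implied_parents({"ar-15"}, parents)))
# ['ar-15', 'firearm', 'rifle', 'semi-automatic rifle']

files = {"f1": {"creator:aaa"}, "f2": {"creator:bbb"}, "f3": {"creator:zzz"}}
print(parent_search(files, "favourite:aesthetic art", parents))
# ['f1', 'f2']
```

Searching the single umbrella tag returns the same files a `creator:aaa OR creator:bbb` search would, and editing the group only means editing the pair list.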
\ No newline at end of file diff --git a/docs/advanced_siblings.md b/docs/advanced_siblings.md new file mode 100644 index 00000000..f9d52999 --- /dev/null +++ b/docs/advanced_siblings.md @@ -0,0 +1,79 @@ +--- +title: tag siblings +--- + +Tag siblings let you replace a bad tag with a better tag. + +## what's the problem? { id="the_problem" } + +Reasonable people often use different words for the same things. + +A great example is in Japanese names, which are natively written surname first. `character:ayanami rei` and `character:rei ayanami` have the same meaning, but different users will use one, or the other, or even both. + +Other examples are tiny syntactic changes, common misspellings, and unique acronyms: + +* _smiling_ and _smile_ +* _staring at camera_ and _looking at viewer_ +* _pokemon_ and _pokémon_ +* _jersualem_ and _jerusalem_ +* _lotr_ and _series:the lord of the rings_ +* _marimite_ and _series:maria-sama ga miteru_ +* _ishygddt_ and _i sure hope you guys don't do that_ + +A particular repository may have a preferred standard, but it is not easy to guarantee that all the users will know exactly which tag to upload or search for. + +After some time, you get this: + +![](images/tag_siblings_venn_1.png) + +Without continual intervention by janitors or other experienced users to make sure y⊇x (i.e. making the yellow circle entirely overlap the blue by manually giving y to everything with x), searches can only return x (blue circle) or y (yellow circle) or x∩y (the lens-shaped overlap). What we really want is x∪y (both circles). + +So, how do we fix this problem? + +## tag siblings { id="tag_siblings" } + +Let's define a relationship, **A->B**, that means that any time we would normally see or use tag A or tag B, we will instead only get tag B: + +![](images/tag_siblings_venn_2.png) + +Note that this relationship implies that B is in some way 'better' than A. 
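A minimal Python sketch of the A->B rule, assuming siblings are stored as a plain dict from the 'bad' tag to the 'better' tag (a dict naturally allows each left-hand side only one target). It also follows chains of siblings and refuses loops. The direction chosen for the Rei example is an assumption for illustration, and the function names are hypothetical, not hydrus's API.

```python
# Hypothetical model of A->B tag siblings; not hydrus's actual implementation.
# Example direction assumed here: 'rei ayanami' collapses to 'ayanami rei'.


def ideal_tag(tag, siblings):
    """Follow A->B links until reaching a tag with no further sibling."""
    seen = {tag}
    while tag in siblings:
        tag = siblings[tag]
        if tag in seen:
            raise ValueError(f"sibling loop involving {tag!r}")
        seen.add(tag)
    return tag


def sibling_search(files, query, siblings):
    """Files with any tag that collapses to the same ideal tag as the query."""
    target = ideal_tag(query, siblings)
    return sorted(name for name, tags in files.items()
                  if any(ideal_tag(t, siblings) == target for t in tags))


siblings = {"character:rei ayanami": "character:ayanami rei"}
files = {
    "x_only": {"character:rei ayanami"},
    "y_only": {"character:ayanami rei"},
    "both": {"character:rei ayanami", "character:ayanami rei"},
    "neither": {"smile"},
}
print(sibling_search(files, "character:rei ayanami", siblings))
# ['both', 'x_only', 'y_only']  (x∪y rather than just x, y, or x∩y)
```

Because the stored tags are never rewritten, deleting the dict entry restores the original behaviour, which matches the 'no information is lost' point.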
+ +## ok, I understand; now confuse me { id="more_complicated" } + +This relationship is transitive, which means as well as saying A->B, you can also say B->C, and together these imply A->C. + +![](images/tag_siblings_usa.png) + +You can also have A->C and B->C without A->B. + +![](images/tag_siblings_what_is_a_man.png) + +The outcome of these two arrangements is the same (everything ends up as C), but the underlying semantics are a little different if you ever want to edit them. + +Many complicated arrangements are possible: + +![](images/tag_siblings_yo_dawg.png) + +Note that if you say A->B, you cannot say A->C; each left-hand side can only go to one tag, while a right-hand side can receive many. The client will stop you from constructing loops. + +## how you do it { id="how_to_do_it" } + +Just open _services->manage tag siblings_, and add a few. + +![](images/tag_siblings_dialog.png) + +The client will automatically collapse the tagspace to whatever you set. It'll even work with autocomplete, like so: + +![](images/tag_siblings_rei.png) + +Please note that siblings' autocomplete counts may be slightly inaccurate, as the count of a union is difficult to estimate quickly. + +The client will not collapse siblings anywhere you 'write' tags, such as the manage tags dialog. You will be able to add or remove A as normal, but it will be written in some form of "A (B)" to let you know that, ultimately, the tag will end up displaying in the main gui as B: + +![](images/tag_siblings_ac_write.png) + +Although the client may present A as B, it will secretly remember A! You can remove the association A->B, and everything will return to how it was. **No information is lost at any point.** + +## remote siblings { id="remote_siblings" } + +Whenever you add a tag sibling pair to or remove one from a tag repository, you will have to supply a reason (like when you petition a tag). A janitor will review this petition, and will approve or deny it.
If it is approved, all users who synchronise with that tag repository will gain that sibling pair. If it is denied, only you will see it. \ No newline at end of file diff --git a/docs/database_migration.md b/docs/database_migration.md new file mode 100644 index 00000000..6d7e7a81 --- /dev/null +++ b/docs/database_migration.md @@ -0,0 +1,114 @@ +--- +title: database migration +--- + +## the hydrus database { id="intro" } + +A hydrus client consists of three components: + +1. **the software installation** + + This is the part that comes with the installer or extract release, with the executable and dlls and a handful of resource folders. It doesn't store any of your settings--it just knows how to present a database as a nice application. If you just run the client executable straight, it looks in its 'db' subdirectory for a database, and if one is not found, it creates a new one. If it sees a database running at a lower version than itself, it will update the database before booting it. + + It doesn't really matter where you put this. An SSD will load it marginally quicker the first time, but you probably won't notice. If you run it without command-line parameters, it will try to write to its own directory (to create the initial database), so if you mean to run it like that, it should not be in a protected place like _Program Files_. + +2. **the actual database** + + The client stores all its preferences and current state and knowledge _about_ files--like file size and resolution, tags, ratings, inbox status, and so on and so on--in a handful of SQLite database files, defaulting to _install_dir/db_. Depending on the size of your client, these might total 1MB in size or be as much as 10GB. + + In order to perform a search or to fetch or process tags, the client has to interact with these files in many small bursts, which means it is best if these files are on a drive with low latency. 
An SSD is ideal, but a regularly-defragged HDD with a reasonable amount of free space also works well. + +3. **your media files** + + All of your jpegs and webms and so on (and their thumbnails) are stored in a single complicated directory that is by default at _install\_dir/db/client\_files_. All the files are named by their hash and stored in efficient hash-based subdirectories. In general, it is not navigable by humans, but it works very well for the fast access from a giant pool of files the client needs to do to manage your media. + + Thumbnails tend to be fetched dozens at a time, so it is, again, ideal if they are stored on an SSD. Your regular media files--which on many clients total hundreds of GB--are usually fetched one at a time for human consumption and do not benefit from the expensive low-latency of an SSD. They are best stored on a cheap HDD, and, if desired, also work well across a network file system. + + +## these components can be put on different drives { id="different_drives" } + +Although an initial install will keep these parts together, it is possible to, say, run the database on a fast drive but keep your media in cheap slow storage. This is an excellent arrangement that works for many users. And if you have a very large collection, you can even spread your files across multiple drives. It is not very technically difficult, but I do not recommend it for new users. + +Backing such an arrangement up is obviously more complicated, and the internal client backup is not sophisticated enough to capture everything, so I recommend you figure out a broader solution with a third-party backup program like FreeFileSync. + +## pulling your media apart { id="pulling_media_apart" } + +!!! 
danger + **As always, I recommend creating a backup before you try any of this, just in case it goes wrong.** + +If you would like to move your files and thumbnails to new locations, I generally recommend you not move their folders around yourself--the database has an internal knowledge of where it thinks its file and thumbnail folders are, and if you move them while it is closed, it will become confused and you will have to manually relocate what is missing on the next boot via a repair dialog. This is not impossible to figure out, but if the program's 'client files' folder confuses you at all, I'd recommend you stay away. Instead, you can simply do it through the gui: + +Go _database->migrate database_, giving you this dialog: + +![](images/db_migration.png) + +This is an image from my old laptop's client. At that time, I had moved the main database and its files out of the install directory but otherwise kept everything together. Your situation may be simpler or more complicated. + +To move your files somewhere else, add the new location, empty/remove the old location, and then click 'move files now'. + +**Portable** means that the path is beneath the main db dir and so is stored as a relative path. Portable paths will still function if the database changes location between boots (for instance, if you run the client from a USB drive and it mounts under a different location). + +**Weight** means the relative amount of media you would like to store in that location. It only matters if you are spreading your files across multiple locations. If location A has a weight of 1 and B has a weight of 2, A will get approximately one third of your files and B will get approximately two thirds. + +The operations on this dialog are simple and atomic--at no point is your db ever invalid. Once you have the locations and ideal usage set how you like, hit the 'move files now' button to actually shuffle your files around. 
It will take some time to finish, but you can pause and resume it later if the job is large or you want to undo or alter something. + +If you decide to move your actual database, the program will have to shut down first. Before you boot up again, you will have to create a new program shortcut. + +## informing the software that the database is not in the default location { id="launch_parameter" } + +A straight call to the client executable will look for a database in _install_dir/db_. If one is not found, it will create one. So, if you move your database and then try to run the client again, it will try to create a new empty database in the previous location! + +So, pass it a `-d` or `--db_dir` command line argument, like so: + +* `client -d="D:\media\my_hydrus_database"` +* _--or--_ +* `client --db_dir="G:\misc documents\New Folder (3)\DO NOT ENTER"` +* _--or, for macOS--_ +* `open -n -a "Hydrus Network.app" --args -d="/path/to/db"` + +The client will then use the given path instead. If no database is found there, it will similarly create a new empty one at that location. You can use any path that is valid on your system, but I would not advise using network locations and so on, as the database works best with some clever device locking calls these interfaces may not provide. + +Rather than typing the path out in a terminal every time you want to launch your external database, create a new shortcut with the argument included. Something like this, which is from my main development computer and tests that a fresh default install will run an existing database ok: + +![](images/db_migration_shortcut.png) + +Note that an install with an 'external' database no longer needs write access to its own path, so you can store it anywhere you like, including protected read-only locations (e.g. in 'Program Files'). If you do move it, just double-check your shortcuts are still good and you are done.
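If you launch the client from a script rather than a shortcut, the same arguments apply. This hypothetical Python helper only builds the argument lists from the examples above. The executable name `client` and the bundle name 'Hydrus Network.app' are taken from those examples; depending on your install you may need a full path, or `client.exe` on Windows.

```python
# Hypothetical helper: builds the launch commands from the examples above.
import sys


def client_command(db_dir, platform=sys.platform):
    """Return an argv list that starts the client with an external database."""
    if platform == "darwin":
        # macOS: open the app bundle and forward -d to the client via --args.
        return ["open", "-n", "-a", "Hydrus Network.app", "--args", f"-d={db_dir}"]
    # Windows/Linux: call the client executable directly with --db_dir.
    return ["client", f"--db_dir={db_dir}"]


print(client_command(r"D:\media\my_hydrus_database", platform="win32"))
# ['client', '--db_dir=D:\\media\\my_hydrus_database']
print(client_command("/path/to/db", platform="darwin"))
# ['open', '-n', '-a', 'Hydrus Network.app', '--args', '-d=/path/to/db']
```

Passing the resulting list to `subprocess.Popen(cmd)` starts the process without going through a shell, so a path with spaces, like 'New Folder (3)', needs no extra quoting.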
+ +## finally { id="finally" } + +If your database now lives in one or more new locations, make sure to update your backup routine to follow them! + +## moving to an SSD { id="to_an_ssd" } + +As an example, let's say you started using the hydrus client on your HDD, and now you have an SSD available and would like to move your thumbnails and main install to that SSD to speed up the client. Your database will be valid and functional at every stage of this, and it can all be undone. The basic steps are: + +1. Move your 'fast' files to the fast location. +2. Move your 'slow' files out of the main install directory. +3. Move the install and db itself to the fast location and update shortcuts. + +Specifically: + +* Update your backup if you maintain one. +* Create an empty folder on your HDD that is outside of your current install folder. Call it 'hydrus_files' or similar. +* Create two empty folders on your SSD with names like 'hydrus\_db' and 'hydrus\_thumbnails'. + +* Set the 'thumbnail location override' to 'hydrus_thumbnails'. You should get that new location in the list, currently empty but prepared to take all your thumbs. +* Hit 'move files now' to actually move the thumbnails. Since this involves moving a lot of individual files from a high-latency source, it will take a long time to finish. The hydrus client may hang periodically as it works, but you can just leave it to work on its own--it will get there in the end. You can also watch it do its disk work under Task Manager. + +* Now hit 'add location' and select your new 'hydrus\_files'. 'hydrus\_files' should be added and willing to take 50% of the files. +* Select the old location (probably 'install\_dir/db/client\_files') and hit 'decrease weight' until it has weight 0 and you are prompted to remove it completely. 'hydrus_files' should now be willing to take all the files from the old location. +* Hit 'move files now' again to make this happen. 
This should be fast since it is just moving a bunch of folders across the same partition. + +* With everything now 'non-portable' and hence decoupled from the db, you can now easily migrate the install and db to 'hydrus_db' simply by shutting the client down and moving the install folder in a file explorer. +* Update your shortcut to the new client.exe location and try to boot. + +* Update your backup scheme to match your new locations. +* Enjoy a much faster client. + +You should now have _something_ like this: + +![](images/db_migration_example.png) + +## p.s. running multiple clients { id="multiple_clients" } + +Since you now know how to tell the software about an external database, you can, if you like, run multiple clients from the same install (and if you previously had multiple install folders, you can now just use the one). Just make multiple shortcuts to the same client executable but with different database directories. They can run at the same time. You'll save yourself a little memory and update hassle. I do this on my laptop client to run a regular client for my media and a separate 'admin' client to do PTR petitions and so on. \ No newline at end of file diff --git a/docs/duplicates.md b/docs/duplicates.md new file mode 100644 index 00000000..c7ed2d71 --- /dev/null +++ b/docs/duplicates.md @@ -0,0 +1,230 @@ +--- +title: duplicates +--- + +# duplicates { id="intro" } + +As files are shared on the internet, they are often resized, cropped, converted to a different format, altered by the original or a new artist, or turned into a template and reinterpreted over and over and over. Even if you have a very restrictive importing workflow, your client is almost certainly going to get some **duplicates**. Some will be interesting alternate versions that you want to keep, and others will be thumbnails and other low-quality garbage you accidentally imported and would rather delete.
Along the way, it would be nice to merge your ratings and tags to the better files so you don't lose any work. + +Finding and processing duplicates within a large collection is impossible to do by hand, so I have written a system to do the heavy lifting for you. It currently works on still images, but an extension for gifs and video is planned. + +Hydrus finds _potential_ duplicates using a search algorithm that compares images by their shape. Once these pairs of potentials are found, they are presented to you through a filter like the archive/delete filter to determine their exact relationship and if you want to make a further action, such as deleting the 'worse' file of a pair. All of your decisions build up in the database to form logically consistent groups of duplicates and 'alternate' relationships that can be used to infer future information. For instance, if you say that file A is a duplicate of B and B is a duplicate of C, A and C are automatically recognised as duplicates as well. + +This all starts on-- + +## the duplicates processing page { id="duplicates_page" } + +On the normal 'new page' selection window, hit _special->duplicates processing_. This will open this page: + +![](images/dupe_management.png) + +Let's go to the preparation page first: + +![](images/dupe_preparation.png) + +The 'similar shape' algorithm works on _distance_. Two files with 0 distance are likely exact matches, such as resizes of the same file or lower/higher quality jpegs, whereas those with distance 4 tend to be to be hairstyle or costume changes. You will be starting on distance 0 and not expect to ever go above 4 or 8 or so. Going too high increases the danger of being overwhelmed by false positives. 
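The distance mentioned above is, per the next paragraph, the hamming distance between two files' 64-bit phashes: the number of bit positions in which they differ. The sketch below is a minimal illustration assuming each phash is held as a plain Python integer; the example hash values are made up, and a simple linear scan stands in for the VPTree hydrus actually uses to search.

```python
# Illustrative sketch only: each phash is assumed to be a plain 64-bit Python int.
# Hydrus searches with a VPTree; a linear scan is used here purely for clarity.


def hamming_distance(phash_a: int, phash_b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(phash_a ^ phash_b).count("1")


def potential_duplicates(target: int, phashes: dict, max_distance: int) -> list:
    """Return the names of files whose phash is within max_distance of target."""
    return sorted(
        name
        for name, phash in phashes.items()
        if hamming_distance(target, phash) <= max_distance
    )


if __name__ == "__main__":
    # Made-up hashes: 'resize' differs from 'original' by 0 bits,
    # 'recolour' by 3 bits, and 'unrelated' by 32 bits.
    original = 0xF0F0_A5A5_0F0F_5A5A
    library = {
        "resize": original,
        "recolour": original ^ 0b111,
        "unrelated": 0x0123_4567_89AB_CDEF,
    }
    print(potential_duplicates(original, library, max_distance=0))  # ['resize']
    print(potential_duplicates(original, library, max_distance=4))  # ['recolour', 'resize']
```

Raising `max_distance` widens the net, which is why higher search distances bring in more false positives.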
+
+If you are interested, the current version of this system uses a 64-bit [phash](https://jenssegers.com/perceptual-image-hashes) to represent the image shape and a [VPTree](https://en.wikipedia.org/wiki/VP-tree) to search for files whose phashes are within a small [hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) of each other. I expect to extend it in future with multiple phash generation (flips, rotations, and 'interesting' image crops and video frames) and most-common colour comparisons.
+
+Searching for duplicates is fairly fast per file, but with a large client with hundreds of thousands of files, the total CPU time adds up. You can do a little manual searching if you like, but once you are all settled here, I recommend you hit the cog icon on the preparation page and let hydrus do this page's catch-up search work in your regular maintenance time. It'll swiftly catch up and keep you up to date without you even thinking about it.
+
+Start searching on the 'exact match' search distance of 0. It is generally easier and more valuable to get exact duplicates out of the way first.
+
+Once you have some files searched, you should see a potential pair count appear in the 'filtering' page.
+
+## the filtering page { id="duplicate_filtering_page" }
+
+_Processing duplicates can be real trudge-work if you do not set up a workflow you enjoy. It is a little slower than the archive/delete filter, and sometimes takes a bit more cognitive work. For many users, it is a good task to do while listening to a podcast or having a video going on another screen._
+
+If you have a client with tens of thousands of files, you will likely have thousands of potential pairs. This can be intimidating, but do not worry--due to the A, B, C logical inferences as above, you will not have to go through every single one. The more information you put into the system, the faster the number will drop.
+
+The filter has a regular file search interface attached. As you can see, it defaults to _system:everything_, but you can limit what files you will be working on simply by adding new search predicates. You might like to only work on files in your archive (i.e. files you already know you care about), for instance. You can choose whether both files of the pair should match the search, or just one. 'creator:' tags work very well at cutting the search domain to something more manageable and consistent--try your favourite creator!
+
+If you would like an example from the current search domain, hit the 'show some random potential pairs' button, and it will show two or more files that seem related. It is often interesting and surprising to see what it finds! The action buttons below allow for quick processing of these pairs and groups when convenient (particularly for large cg sets with 100+ alternates), but I recommend you leave these alone until you know the system better.
+
+When you are ready, launch the filter.
+
+## the duplicates filter { id="duplicates_filter" }
+
+_We have not set up your duplicate 'merge' options yet, so do not get too into this. For this first time, just poke around, make some pretend choices, and then cancel out and choose to forget them._
+
+![](images/dupe_filter.png)
+
+Like the archive/delete filter, this uses quick mouse-clicks, keyboard shortcuts, or button clicks to action pairs. It presents two files at a time, labelled A and B, which you can quickly switch between just as in the normal media viewer. As soon as you action them, the next pair is shown. The two files will have their current zoom-size locked so they stay the same size (and in the same position) as you switch between them. Scroll your mouse wheel a couple of times and see if any obvious differences stand out.
+
+Please note the hydrus media viewer does not currently work well with large resolutions at high zoom (it gets laggy and may have memory issues). Don't zoom in to 1600% and try to look at jpeg artifact differences on very large files, as this is simply not well supported yet.
+
+The hover window on the right also presents a number of 'comparison statements' to help you make your decision. Green statements mean this current file is probably 'better', and red the opposite. Larger, older, higher-quality, more-tagged files are generally considered better. These statements have scores associated with them (which you can edit in _file->options->duplicates_), and the file of the pair with the highest score is presented first. If the files are duplicates, you can _generally_ assume the first file you see, the 'A', is the better one, particularly if there are several green statements.
+
+The filter will occasionally need to checkpoint, saving the decisions so far to the database, before it can fetch the next batch. This allows it to apply inferred information from your current batch and reduce your pending count faster before serving up the next set. It will present you with a quick interstitial 'confirm/back' dialog just to let you know. This happens more often as the potential count decreases.
+
+## the decisions to make { id="duplicates_decisions" }
+
+There are three ways a file can be related to another in the current duplicates system: duplicates, alternates, or false positive (not related).
+
+**False positive (not related)** is the easiest. You will not see completely unrelated pairs presented very often in the filter, particularly at low search distances, but if the shape of face and hair and clothing happen to line up (which often happens with geometric shapes), the search system may make a false positive match. In this case, just click 'they are not related'.
+
+**Alternate** relations are files that are not duplicates but are obviously related in some way. Perhaps a costume change or a recolour. Hydrus does not have rich alternate support yet (but it is planned, and highly requested), so this relationship is mostly a 'holding area' for files that we will revisit for further processing in the future.
+
+**Duplicate** files are of **the exact same thing**. They may be different resolutions, file formats, encoding quality, or one might even have a watermark, but they are fundamentally different views on the exact same art. As the buttons show, you can select one file as the 'better' or say they are about the same. If the files are basically the same, there is no point stressing about which is 0.2% better--just click 'they are the same'. For better/worse pairs, you might have reason to keep both, but most of the time I recommend you delete the worse.
+
+You can customise the shortcuts under _file->shortcuts->duplicate_filter_. The defaults are:
+
+* Left-click or space: **this is better, delete the other**.
+
+* Right-click: **they are related alternates**.
+
+* Middle-click: **Go back one decision.**
+
+* Enter/Escape: **Stop filtering.**
+
+
+## merging metadata { id="duplicates_merging" }
+
+If two duplicates have different metadata like tags or archive status, you probably want to merge them. Cancel out of the filter and click the 'edit default duplicate metadata merge options' button:
+
+![](images/dupe_merge_options.png)
+
+By default, these options are fairly empty. You will have to set up what you want based on your services and preferences. Setting a simple 'copy all tags' is generally a good idea, and like/dislike ratings also often make sense. The settings for better and same quality should probably be similar, but it depends on your situation.
+
+If you choose the 'custom action' in the duplicate filter, you will be presented with a fresh 'edit duplicate merge options' panel for the action you select and can customise the merge specifically for that choice. ('favourite' options will come here in the future!)
+
+Once you are all set up here, you can dive into the duplicate filter. Please let me know how you get on with it!
+
+## what now? { id="future" }
+
+The duplicate system is still incomplete. Now that the db side is solid, the UI needs to catch up. Future versions will show duplicate information on thumbnails and the media viewer and allow quick-navigation to a file's duplicates and alternates.
+
+For now, if you wish to see a file's duplicates, right-click it and select _file relationships_. You can review all its current duplicates, open them in a new page, appoint the new 'best file' of a duplicate group, and even mass-action selections of thumbnails.
+
+You can also search for files based on the number of file relations they have (including when setting the search domain of the duplicate filter!) using _system:file relationships_. You can also search for best/not best files of groups, which makes it easy, for instance, to find all the spare duplicate files if you decide you no longer want to keep them.
+
+I expect future versions of the system to also auto-resolve easy duplicate pairs, such as clearing out pixel-for-pixel png versions of jpgs.
+
+## game cgs { id="game_cgs" }
+
+If you import a lot of game CGs, which frequently have dozens or hundreds of alternates, I recommend you set them as alternates by selecting them all and setting the status through the thumbnail right-click menu. The duplicate filter, being limited to pairs, needs to compare all new members of an alternate group to all other members once to verify they are not duplicates. This is not a big deal for alternates with three or four members, but game CGs provide an overwhelming edge case. Setting a group of thumbnails as alternate 'fixes' their alternate status immediately, discounting the possibility of any internal duplicates, and provides an easy way out of this situation.
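The cost of those pair checks grows quickly with group size. As a rough worst case, if every member of a group of n alternates has to be compared with every other member once, the filter has up to n-choose-2 pairs to show you (the real count may be lower when earlier decisions let hydrus infer some pairs):

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad
n = 4:\ \frac{4 \cdot 3}{2} = 6, \qquad
n = 100:\ \frac{100 \cdot 99}{2} = 4950
```

So a group of three or four means only a handful of decisions, while a hundred-image CG set could mean thousands, which is why setting the whole selection as alternates in one go is the easier route.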
+
+## more information and examples { id="duplicates_examples" }
+
+### better/worse { id="duplicates_examples_better_worse" }
+
+Which of two files is better? Here are some common reasons:
+
+* higher resolution
+* better image quality
+* png over jpg for screenshots
+* jpg over png for busy images
+* jpg over png for pixel-for-pixel duplicates
+* a better crop
+* no watermark or site-frame or undesired blemish
+* has been tagged by other people, so is likely to be the more 'popular'
+
+However, these are not hard rules--sometimes a file has a larger resolution or filesize due to a bad upscaling or encoding decision by the person who 'reinterpreted' it. You really have to look at it and decide for yourself.
+
+Here is a good example of a better/worse pair:
+
+[![](images/dupe_better_1.png)](images/dupe_better_1.png) [![](images/dupe_better_2.jpg)](images/dupe_better_2.jpg)
+
+The first image is better because it is a png (pixel-perfect pngs are always better than jpgs for screenshots of applications--note how obvious the jpg's encoding artifacts are on the flat colour background) and it has a slightly higher (original) resolution, making it less blurry. I presume the second went through some FunnyJunk-tier trash meme site to get automatically cropped to 960px height and converted to the significantly smaller jpeg. Whatever happened, let's drop the second and keep the first.
+
+When both files are jpgs, differences in quality are very common and often significant:
+
+[![](images/dupes_better_sg_a.jpg)](images/dupes_better_sg_a.jpg) [![](images/dupes_better_sg_b.jpg)](images/dupes_better_sg_b.jpg)
+
+Again, this is mostly due to some online service resizing and lowering quality to ease their bandwidth costs. There is usually no reason to keep the lower quality version.
+
+### same quality duplicates { id="duplicates_examples_same" }
+
+When are two files the same quality? A good rule of thumb: if you scroll between them and see no obvious differences, and the comparison statements do not suggest anything significant, just set them as same quality.
+
+Here are two same quality duplicates:
+
+[![](images/dupe_exact_match_1.png)](images/dupe_exact_match_1.png) [![](images/dupe_exact_match_2.png)](images/dupe_exact_match_2.png)
+
+There is no obvious difference between those two. The filesize is significantly different, so I suspect the smaller is a lossless png optimisation, but in the grand scheme of things, that doesn't matter so much. Many of the big content providers--Facebook, Google, Cloudflare--automatically 'optimise' the data that goes through their networks in order to save bandwidth. Although this is often a slaughterhouse for jpegs, it is usually harmless for pngs.
+
+Given the filesize, you might decide that these are actually a better/worse pair--but if the larger image had tags and was the 'canonical' version on most boorus, the decision might not be so clear. You can choose better/worse and delete one randomly, but sometimes you may just want to keep both without a firm decision on which is best, so just set 'same quality' and move on. Your time is more valuable than a few dozen KB.
+
+Sometimes, you will see pixel-for-pixel duplicate jpegs of very slightly different size, such as 787KB vs 779KB. The smaller of these is usually an exact duplicate that has had its internal metadata (e.g. EXIF tags) stripped by a program or website CDN. They are same quality unless you have a strong opinion on whether having internal metadata in a file is useful.
+
+### alternates { id="duplicates_examples_alternates" }
+
+As I wrote above, hydrus's alternates system is not yet properly ready. It is important to have a basic 'alternates' relationship for now, but it is a holding area until we have a workflow to apply 'WIP'- or 'recolour'-type labels and present that information nicely in the media viewer.
+
+Alternates are not of exactly the same thing, but one is a variant of the other or they are both descended from a common original. The precise definition is up to you, but it generally means something like:
+
+* the files are recolours
+* the files are alternate versions of the same image produced by the same or different artists (e.g. clean/messy or with/without hair ribbon)
+* iterations on a close template
+* different versions of a file's progress, such as the steps from the initial draft sketch to a final shaded version
+
+Here are some recolours of the same image:
+
+[![](images/dupe_alternates_recolours.png)](images/dupe_alternates_recolours.png)
+
+And some WIP:
+
+[![](images/dupe_alternates_progress.png)](images/dupe_alternates_progress.png)
+
+And a costume change:
+
+[![](images/dupe_alternates_costume.png)](images/dupe_alternates_costume.png)
+
+None of these are duplicates, but they are obviously related. The duplicate search will notice they are similar, so we should let the client know they are 'alternate'.
+
+Here's a subtler case:
+
+[![](images/dupe_alternate_boxer_a.jpg)](images/dupe_alternate_boxer_a.jpg) [![](images/dupe_alternate_boxer_b.jpg)](images/dupe_alternate_boxer_b.jpg)
+
+These two files are very similar, but try opening both in separate tabs and then flicking back and forth: the second's glove-string is further into the mouth and has improved chin shading, a more refined eye shape, and shaved pubic hair. It is simple to spot these differences in the client's duplicate filter when you scroll back and forth.
+
+I believe the second is an improvement on the first by the same artist, so it is a WIP alternate. You might also consider it a 'better' improvement.
+
+Here are three files you might or might not consider to be alternates:
+
+[![](images/dupe_alternate_1.jpg)](images/dupe_alternate_1.jpg)
+
+[![](images/dupe_alternate_2.jpg)](images/dupe_alternate_2.jpg)
+
+[![](images/dupe_alternate_3.jpg)](images/dupe_alternate_3.jpg)
+
+These are all based on the same template--which is why the dupe filter found them--but they are not as closely related as those above, and the last one is joking about a different ideology entirely and might deserve to be in its own group. Ultimately, you might prefer just to give them some shared tag and consider them not alternates _per se_.
+
+### not related/false positive { id="duplicates_examples_false_positive" }
+
+Here are two files that are a false positive match:
+
+[![](images/dupe_not_dupes_1.png)](images/dupe_not_dupes_1.png)
+
+[![](images/dupe_not_dupes_2.jpg)](images/dupe_not_dupes_2.jpg)
+
+Despite their similar shape, they are neither duplicates nor even of the same topic. The only commonality is the medium. I would not consider them close enough to be alternates--just adding something like 'screenshot' and 'imageboard' as tags to both is probably the closest connection they have.
+
+Recording the 'false positive' relationship is important to make sure the comparison does not come up again in the duplicate filter.
+
+The incidence of false positives increases as you broaden the search distance--the less precise your search, the less likely it is to be correct. At distance 14, these files all match, but uselessly:
+
+[![](images/dupe_garbage.png)](images/dupe_garbage.png)
+
+
+## the duplicates system { id="duplicates_advanced" }
+
+_(advanced nonsense, you can skip this section. tl;dr: duplicate file groups keep track of their best quality file, sometimes called the King)_
+
+Hydrus achieves duplicate transitivity by treating duplicate files as groups. Although you action pairs, if you set (A duplicate B), that creates a group (A,B). Subsequently setting (B duplicate C) extends the group to be (A,B,C), and so (A duplicate C) is transitively implied.
+
+The first version of the duplicate system attempted to record better/worse/same information for all files in a virtual duplicate group, but this proved very complicated, workflow-heavy, and not particularly useful. The new system instead appoints a single _King_ as the best file of a group. All other files in the group are beneath the King and have no other relationship data retained.
+
+![](images/dupe_dupe_group_diagram.png)
+
+This King represents the group in the duplicate filter (and in potential pairs, which are actually recorded between duplicate media groups--even if most of them at the outset only have one member). If the other file in a pair is considered better, it becomes the new King, but if it is worse or equal, it [merges into the other members](images/dupe_simple_merge.png). When two Kings are compared, [whole groups can merge](images/dupe_group_merge.png)!
+
+Alternates are stored in a similar way, except [the members are duplicate groups](images/dupe_alternate_group_diagram.png) rather than individual files and they have no significant internal relationship metadata yet. If α, β, and γ are duplicate groups that each have one or more files, then setting (α alt β) and (β alt γ) creates an alternate group (α,β,γ), with the caveat that α and γ will still be sent to the duplicate filter once just to check they are not duplicates by chance. The specific file members of these groups, A, B, C and so on, inherit the relationships of their parent groups when you right-click on their thumbnails.
+
+False positive relationships are stored between pairs of alternate groups, so they apply transitively between all the files of either side's alternate group. If (α alt β) and (ψ alt ω) and you apply (α fp ψ), then (α fp ω), (β fp ψ), and (β fp ω) are all transitively implied.
+
+??? example "More examples"
+    [![](images/dupe_whole_system_diagram.png "Some fun.")](images/dupe_whole_system_diagram.png)
+    [![](images/dupe_whole_system_simple_diagram.png "And simpler.")](images/dupe_whole_system_simple_diagram.png)
+
diff --git a/docs/getting_started_more_files.md b/docs/getting_started_more_files.md
new file mode 100644
index 00000000..c31311e5
--- /dev/null
+++ b/docs/getting_started_more_files.md
@@ -0,0 +1,96 @@
+---
+title: more files
+---
+
+# more getting started with files
+
+## searching with wildcards { id="wildcards" }
+
+The autocomplete tag dropdown supports wildcard searching with `*`.
+
+![](images/wildcard_gelion.png)
+
+The `*` will match any number of characters. Every normal autocomplete search has a secret `*` on the end that you don't see, which is how full words get matched when you only type in a few letters.
+
+This is useful when you can only remember part of a word, or can't spell part of it. You can put `*` characters anywhere, but you should experiment to get used to the exact way these searches work. Some results can be surprising!
+
+![](images/wildcard_vage.png)
+
+You can select the special predicate inserted at the top of your autocomplete results (the highlighted `*gelion` and `*va*ge*` above). **It will return all files that match that wildcard,** i.e. every file for every other tag in the dropdown list.
+
+This is particularly useful if you have a number of files with similarly structured, overly detailed tags, like this:
+
+![](images/wildcard_cool_pic.png)
+
+In this case, selecting the `title:cool pic*` predicate will return all three images in the same search, where you can conveniently give them some more-easily searched tags like `series:cool pic` and `page:1`, `page:2`, `page:3`.
+
+## more searching
+
+Let's look at the tag autocomplete dropdown again:
+
+![](images/ac_dropdown.png)
+
+* **favourite searches star**
+
+    Once you get experience with the client, have a play with this. Rather than leaving common search pages open, save them in here and load them up as needed. You will keep your client lightweight and save time.
+
+* **include current/pending tags**
+
+    Turn these on and off to control whether tag _search predicates_ apply to tags that exist, or are limited to just those pending upload to a tag repository. Just searching 'pending' tags is useful if you want to scan what you have pending to go up to the PTR--just turn off 'current' tags and search `system:num tags > 0`.
+
+* **searching immediately**
+
+    This controls whether a change to the search tags will instantly run the new search and get new results. Turning this off is helpful if you want to add, remove, or replace several heavy search terms in a row without getting UI lag.
+
+* **OR**
+
+    You only see this if you have 'advanced mode' on. It is an experimental module. Have a play with it--it lets you enter some pretty complicated tags!
+
+* **file/tag domains**
+
+    By default, you will search in the 'my files' and 'all known tags' domains. This is the intersection of your local media files (on your hard disk) and the union of all known tag services. If you search for `character:samus aran`, then you will get file results from your 'my files' domain that have `character:samus aran` in any tag service. For most purposes, this search domain is fine, but as you use the client more, you may want to access different search domains.
+
+    For instance, if you change the file domain to 'trash', then you will instead get files that are in your trash. Setting the tag domain to 'my tags' will ignore other tag services (e.g. the PTR) for all tag search predicates, so a `system:num_tags` or a `character:samus aran` will only look at 'my tags'.
+
+    Turning on 'advanced mode' gives access to more search domains. Some of them are subtly complicated and only useful for clever jobs--most of the time, you still want 'my files' and 'all known tags'.
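The default domain described in the last item can be pictured with ordinary sets: a tag predicate first collects the files that carry that tag on any tag service (the union), and the result is then limited to the files in 'my files' (the intersection). The sketch below is a minimal illustration with hypothetical file ids and tag data, not hydrus's actual database query.

```python
# Hypothetical data: file ids in the 'my files' domain, and per tag service a map of tag -> file ids.
MY_FILES = {1, 2, 3, 4}
TAG_SERVICES = {
    "my tags": {"character:samus aran": {1}},
    "PTR": {"character:samus aran": {2, 9}},  # file 9 is not in 'my files'
}


def search(tag: str, file_domain: set, tag_services: list) -> set:
    """Files in file_domain that have `tag` on at least one of the chosen tag services."""
    # 'all known tags' behaves like the union of the tag across every chosen service...
    tagged = set().union(*(TAG_SERVICES[s].get(tag, set()) for s in tag_services))
    # ...which is then intersected with the chosen file domain.
    return tagged & file_domain


print(search("character:samus aran", MY_FILES, list(TAG_SERVICES)))  # {1, 2}
print(search("character:samus aran", MY_FILES, ["my tags"]))  # {1}
```

Narrowing the tag domain to 'my tags' shrinks the union, which is why the second search misses file 2.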
+
+
+## sorting with system limit
+
+If you add system:limit to a search, the client will consider what that page's file sort currently is. If it is simple enough--something like file size or import time--then it will sort your results before they come back and clip the limit according to that sort, getting the n 'largest file size' or 'newest imports' and so on. This can be a great way to set up a lightweight filtering page for 'the 256 biggest videos in my inbox'.
+
+If you change the sort, hydrus will not refresh the search; it will just re-sort the n files you have. Hit F5 to refresh the search with a new sort.
+
+Not all sorts are supported. Anything complicated like tag sort will result in a random sample instead.
+
+## exporting and uploading { id="intro" }
+
+There are many ways to export files from the client:
+
+* **drag and drop**
+
+    Just dragging from the thumbnail view will export (copy) all the selected files to wherever you drop them.
+
+    The files will be named by their ugly hexadecimal [hash](faq.html#hashes), which is how they are stored inside the database.
+
+    If you use this to open a file inside an image editing program, remember to go 'save as' and give it a new filename! The client does not expect files inside its db directory to change.
+
+* **export dialog**
+
+    Right-clicking some files and selecting _share->export->files_ will open this dialog:
+
+    ![](images/export.png)
+
+    This lets you export the selected files with custom filenames. It initially names the files by their hashes, but once you are comfortable with tags, you'll be able to generate much cleverer and prettier filenames.
+
+* **share->copy->files**
+
+    This will copy the files themselves to your clipboard. You can then paste them wherever you like, just as with normal files. They will have their hashes for filenames.
+
+    This is a very quick operation. It can also be triggered by hitting Ctrl+C.
+
+* **share->copy->hashes**
+
+    This will copy the files' unique identifiers to your clipboard, in hexadecimal.
+
+    You will not have to do this often. It is best used when you want to identify a number of files to someone else without having to send them the actual files.
diff --git a/docs/getting_started_subscriptions.md b/docs/getting_started_subscriptions.md
new file mode 100644
index 00000000..388ccfe7
--- /dev/null
+++ b/docs/getting_started_subscriptions.md
@@ -0,0 +1,128 @@
+---
+title: subscriptions
+---
+
+# getting started with subscriptions
+
+Do not try to create a subscription until you are comfortable with a normal gallery download page! Go [here](getting_started_downloading.html).
+
+Let's say you found an artist you like. You downloaded everything of theirs from some site, but one or two pieces of new work are posted every week. You'd like to keep up with the new stuff, but you don't want to manually make a new download job every week for every single artist you like.
+
+## what are subs? { id="intro" }
+
+Subscriptions are a way of telling the client to regularly and quietly repeat a gallery search. You set up a number of saved queries, and the client will 'sync' with the latest files in the gallery and download anything new, just as if you were running the download yourself.
+
+Subscriptions only work for booru-like galleries that put the newest files first, and they only keep up with new content--once they have done their first sync, which usually gets the most recent hundred files or so, they will never reach further into the past. Getting older files, as you will see later, is a job best done with a normal download page.
+
+Here's the dialog, which is under _network->downloaders->manage subscriptions_:
+
+![](images/subscriptions_edit_subscriptions.png)
+
+This is a very simple example--there is only one subscription, for safebooru. It has two 'queries' (i.e. searches to keep up with).
+ +It is important to note that while subscriptions can have multiple queries (even hundreds!), they _generally_ only work on one site. Expect to create one subscription for safebooru, one for artstation, one for paheal, and so on for every site you care about. Advanced users may be able to think of ways to get around this, but I recommend against it as it throws off some of the internal check timing calculations. + +Before we trip over the advanced buttons here, let's zoom in on the actual subscription: + +[![](images/subscriptions_edit_subscription.png)](images/subscriptions_edit_subscription.png) + +This is a big and powerful panel! I recommend you open the screenshot up in a new browser tab, or in the actual client, so you can refer to it. + +Despite all the controls, the basic idea is simple: Up top, I have selected the 'safebooru tag search' download source, and then I have added two artists--"hong_soon-jae" and "houtengeki". These two queries have their own panels for reviewing what URLs they have worked on and further customising their behaviour, but all they _really_ are is little bits of search text. When the subscription runs, it will put the given search text into the given download source just as if you were running the regular downloader. + +**For the most part, all you need to do to set up a good subscription is give it a name, select the download source, and use the 'paste queries' button to paste what you want to search. Subscriptions have great default options for almost all query types, so you don't have to go any deeper than that to get started.** + +!!! danger + **Do not change the max number of new files options until you know _exactly_ what they do and have a good reason to alter them!** + +## how do subscriptions work? { id="description" } + +Once you hit ok on the main subscription dialog, the subscription system should immediately come alive. If any queries are due for a 'check', they will perform their search and look for new files (i.e. 
URLs it has not seen before). Once that is finished, the file download queue will be worked through as normal. Typically, the sub will make a popup like this while it works: + +![](images/subscriptions_popup.png) + +The initial sync can sometimes take a few minutes, but after that, each query usually only needs thirty seconds' work every few days. If you leave your client on in the background, you'll rarely see them. If they ever get in your way, don't be afraid to click their little cancel button or call a global halt with _network->pause->subscriptions_--the next time they run, they will resume from where they were before. + +Similarly, the initial sync may produce a hundred files, but subsequent runs are likely to only produce one to ten. If a subscription comes across a lot of big files at once, it may not download them all in one go--but give it time, and it will catch back up before you know it. + +When it is done, it leaves a little popup button that will open a new page for you: + +![](images/subscriptions_thumbnails.png) + +This can often be a nice surprise! + +## what makes a good subscription? { id="good_subs" } + +The same rules as for downloaders apply: **start slow, be hesitant, and plan for the long-term.** Artist queries make great subscriptions as they update reliably but not too often and have very stable quality. Pick the artists you like most, see where their stuff is posted, and set up your subs like that. + +Series and character subscriptions are sometimes valuable, but they can be difficult to keep up with and have highly variable quality. It is not uncommon for users to only keep 15% of what a character sub produces. I do not recommend them for anything but your waifu. + +Attribute subscriptions like 'blue_eyes' or 'smile' make for terrible subs as the quality is all over the place and you will be inundated by too much content. 
The only exceptions are for specific, low-count searches that really matter to you, like 'contrapposto' or 'gothic trap thighhighs'. + +If you end up subscribing to eight hundred things and get ten thousand new files a week, you made a mistake. Subscriptions are for _keeping up_ with things you like. If you let them overwhelm you, you'll resent them. + +!!! warning + Subscription syncs are somewhat fragile. Do not try to play with the limits or checker options to download a whole 5,000 file query in one go--if you want everything for a query, run it in the manual downloader and get everything, then set up a normal sub for new stuff. There is no benefit to having a 'large' subscription, and it will trim itself down in time anyway. + +It is a good idea to run a 'full' download for a search before you set up a subscription. As well as making sure you have the exact right query text and that you have everything ever posted (beyond the 100 files deep a sub will typically look), it saves the bulk of the work (and waiting on bandwidth) for the manual downloader, where it belongs. When a new subscription picks up off a freshly completed download queue, its initial subscription sync only takes thirty seconds since its initial URLs are those that were already processed by the manual downloader. I recommend you stack artist searches up in the manual downloader using the 'no limit' file limit, and when they are all finished, select them in the list and _right-click->copy queries_, which will put the search texts in your clipboard, newline-separated. This list can be pasted into the subscription dialog in one go with the 'paste queries' button again! + +!!! note + The entire subscription system assumes the source is a typical 'newest first' booru-style search. If you dick around with some order_by:rating/random metatag, it will not work reliably. + +## how often do subscriptions check?
{ id="checking" } + +Hydrus subscriptions use the same variable-rate checking system as its thread watchers, just on a larger timescale. If you subscribe to a busy feed, it might check for new files once a day, but if you enter an artist who rarely posts, it might only check once every month. You don't have to do anything. The fine details of this are governed by the 'checker options' button. **This is one of the things you should not mess with as you start out.** + +If a query goes too 'slow' (typically, this means no new files for 180 days), it will be marked DEAD in the same way a thread will, and it will not be checked again. You will get a little popup when this happens. This is all editable as you get a better feel for the system--if you wish, it is completely possible to set up a sub that never dies and only checks once a year. + +I do not recommend setting up a sub that needs to check more than once a day. Any search that is producing that many files is probably a bad fit for a subscription. **Subscriptions are for lightweight searches that are updated every now and then.** + +* * * + +_(you might like to come back to this point once you have tried subs for a week or so and want to refine your workflow)_ + +* * * + +## ok, I set up three hundred queries, and now these popup buttons are a hassle { id="presentation" } + +On the edit subscription panel, the 'presentation' options let you publish files to a page. The page will have the subscription's name, just like the one the button makes, but it cuts out the middle-man and 'locks it in' more than the button, which will be forgotten if you restart the client. **Also, if a page with that name already exists, the new files will be appended to it, just like a normal import page!** I strongly recommend moving to this once you have several subs going. Make a 'page of pages' called 'subs' and put all your subscription landing pages in there, and then you can check it whenever is convenient.
+ +If you discover your subscription workflow tends to be the same for each sub, you can also customise the publication 'label' used. If multiple subs all publish to the 'nsfw subs' label, they will all end up on the same 'nsfw subs' popup button or landing page. Sending multiple subscriptions' import streams into just one or two locations like this can be great. + +You can also hide the main working popup. I don't recommend this unless you are really having a problem with it, since it is useful to have that 'active' feedback if something goes wrong. + +Note that subscription file import options will, by default, only present 'new' files. Anything already in the db will still be recorded in the internal import cache and used to calculate next check times and so on, but it won't clutter your import stream. This is different to the default for all the other importers, but when you are ready to enter the ranks of the Patricians, you will know to edit your 'loud' default file import options under _options->importing_ to behave this way as well. Efficient workflows only care about new files. + +## how exactly does the sync work? { id="syncing_explanation" } + +Figuring out when a repeating search has 'caught up' can be a tricky problem to solve. It sounds simple, but unusual situations like 'a file got tagged late, so it inserted deeper than it ideally should in the gallery search' or 'the website changed its URL format completely, help' can cause problems. Subscriptions are automatic systems, so they tend to be a bit more careful and paranoid about problems, lest they burn 10GB on 10,000 unexpected diaperfur images. + +The initial sync is simple. It does a regular search, stopping if it reaches the 'initial file limit' or the last file in the gallery, whichever comes first. The default initial file limit is 100, which is a great number for almost all situations. + +Subsequent syncs are more complicated.
It ideally 'stops' searching when it reaches files it saw in a previous sync, but if it comes across new files mixed in with the old, it will search a bit deeper. It is not foolproof, and if a file gets tagged very late and ends up a hundred deep in the search, it will probably be missed. There is no good and computationally cheap way at present to resolve this problem, but thankfully it is rare. + +Remember that an important 'staying sane' philosophy of downloading and subscriptions is to focus on dealing with the 99.5% you have before worrying about the 0.5% you do not. + +The amount of time between syncs is calculated by the checker options. Based on the timestamps attached to existing urls in the subscription cache (either added time, or the post time as parsed from the url), the sub estimates how long it will be before n new files appear, and the next check is scheduled for then. Unless you know what you are doing, checker options, like file limits, are best left alone. A subscription will naturally adapt its checking speed to the file 'velocity' of the source, and there is usually very little benefit to trying to force a sub to check at a radically different speed. + +!!! tip + If you want to force your subs to run at the same time, say every evening, it is easier to just use _network->pause->subscriptions_ as a manual master on/off control. The ones that are due will catch up together, and the ones that aren't won't waste your time. + +Remember that subscriptions only keep up with new content. They cannot search backwards in time in order to 'fill out' a search, nor can they fill in gaps. **Do not change the file limits or check times to try to make this happen.** If you want to ensure complete sync with all existing content for a particular search, use the manual downloader. + +In practice, most subs only need to check the first page of a gallery since only the first two or three urls are new.
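To make the sync rules above a little more concrete, here is a minimal Python sketch of the two decisions a subscription makes: when to stop walking a newest-first gallery, and when to check again. It is an illustration under stated assumptions, not hydrus's actual code. The function names `sync_query` and `next_check_delay`, the 'five consecutive already-seen URLs' stop rule, the 'five new files' target, and the one-day/thirty-day clamps are all hypothetical. Only the 100-file limits and the 180-day DEAD rule come from this page.

```python
from __future__ import annotations

from typing import Iterable, Optional

DAY = 86400  # seconds in a day


def sync_query(gallery: Iterable[str], seen: set[str], file_limit: int = 100,
               seen_run_to_stop: int = 5) -> tuple[list[str], bool]:
    """Walk a newest-first gallery and collect URLs not seen in earlier syncs.

    Returns (new_urls, hit_file_limit). On the initial sync `seen` is empty,
    so only the file limit or the end of the gallery can stop it.
    """
    new_urls: list[str] = []
    seen_run = 0  # consecutive already-seen URLs (hypothetical stop rule)
    for url in gallery:
        if url in seen:
            seen_run += 1
            if seen_run >= seen_run_to_stop:
                break  # a solid block of old URLs: we have caught up
            continue
        seen_run = 0  # new file mixed in with old ones: keep searching deeper
        new_urls.append(url)
        if len(new_urls) >= file_limit:
            return new_urls, True  # initial/periodic file limit reached
    return new_urls, False


def next_check_delay(post_times: list[float], now: float, files_wanted: int = 5,
                     min_delay: float = DAY, max_delay: float = 30 * DAY,
                     death_period: float = 180 * DAY) -> Optional[float]:
    """Estimate seconds until `files_wanted` new files appear, or None if DEAD."""
    recent = [t for t in post_times if now - t <= death_period]
    if not recent:
        return None  # no new files for death_period: the query is DEAD
    velocity = len(recent) / death_period  # files per second over the window
    delay = files_wanted / velocity  # time until roughly files_wanted more files
    return max(min_delay, min(max_delay, delay))  # clamp to sensible bounds


if __name__ == "__main__":
    seen = {"post/3", "post/2", "post/1"}
    gallery = ["post/5", "post/4", "post/3", "post/2", "post/1"]
    print(sync_query(gallery, seen, seen_run_to_stop=2))  # (['post/5', 'post/4'], False)

    now = 1_700_000_000.0
    busy = [now - i * 8640 for i in range(1800)]    # about ten posts a day
    rare = [now - i * 18 * DAY for i in range(10)]  # one post every 18 days
    print(next_check_delay(busy, now) / DAY)         # 1.0
    print(next_check_delay(rare, now) / DAY)         # 30.0
    print(next_check_delay([now - 200 * DAY], now))  # None (DEAD)
```

In this sketch a single new URL resets the seen counter, which models 'search a bit deeper'; a file that turns up after a long run of old URLs is still missed, just as described above. The real checker options are set in the client's dialogs, not in code.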
+ +## periodic file limit exceeded { id="periodic_file_limit" } + +If, during a regular sync, the sub keeps finding new URLs, never hitting a block of already-seen URLs, it will stop upon hitting its 'periodic file limit', which is also usually 100. When this happens, you will get a popup message notification. There are two typical reasons for this: + +* A user suddenly posted a large number of files to the site for that query. This sometimes happens with CG gallery spam. +* The website changed its URL format. + +The first case is a natural accident of statistics. The subscription now has a 'gap' in its sync. If you want to get what you missed, you can try to fill in the gap with a manual downloader page. Just download to 200 files or so, and the downloader will quickly work through the URLs in the gap in one go. + +The second case is a safety stopgap for hydrus. If a site decides to have `/post/123456` style URLs instead of `post.php?id=123456` style, hydrus will suddenly see those as entirely 'new' URLs. It could also be because of an updated downloader, which pulls URLs in API format or similar. This is again thankfully quite rare, but it triggers several problems--the associated downloader usually breaks, as it does not yet recognise those new URLs, and all your subs for that site will parse through and hit the periodic limit for every query. When this happens, you'll usually get several periodic limit popups at once, and you may need to update your downloader. If you know the person who wrote the original downloader, they'll likely want to know about the problem, or may already have a fix sorted. It is often a good idea to pause the affected subs until you have it figured out and working in a normal gallery downloader page. + +## I put character queries in my artist sub, and now things are all mixed up { id="merging_and_separating" } + +On the main subscription dialog, there are 'merge' and 'separate' buttons.
These are powerful, but they will walk you through the process of pulling queries out of a sub and merging them back into a different one. Only subs that use the same download source can be merged. Give them a go, and if it all goes wrong, just hit the cancel button on the dialog. \ No newline at end of file diff --git a/docs/launch_arguments.md b/docs/launch_arguments.md new file mode 100644 index 00000000..e480aaf6 --- /dev/null +++ b/docs/launch_arguments.md @@ -0,0 +1,73 @@ +--- +title: launch arguments +--- + +# launch arguments + +You can launch the program with several different arguments to alter core behaviour. If you are not familiar with this, you are essentially putting additional text after the launch command that runs the program. You can run this straight from a terminal console (usually good to test with), or you can bundle it into an easy shortcut that you only have to double-click. An example of a launch command with arguments: + +``` +"C:\Hydrus Network\client.exe" -d="E:\hydrus db" --no_db_temp_files +``` + +You can also add `--help` after your program path, like this: + +- `client.py --help` +- `server.exe --help` +- `./server --help` + +This gives you a full listing of all the arguments below. However, it will not work with the built client executables, which are bundled as non-console programs and will not print any text to the console they are launched from. As client.exe is the most commonly run version of the program, here is the list, with some more help about each command: + +## **`-d DB_DIR, --db_dir DB_DIR`** + +Lets you customise which directory hydrus uses as its base database directory. This is install_dir/db by default, but many advanced deployments will move this around, as described [here](database_migration.html).
When an argument takes a complicated value like a path that could itself include whitespace, you should wrap it in quote marks, like this: + +``` +-d="E:\my hydrus\hydrus db" +``` + +## **`--temp_dir TEMP_DIR`** + +This tells all aspects of the client, including the SQLite database, to use a different path for temp operations. By default this is your system temp path, such as: + +``` +C:\Users\You\AppData\Local\Temp +``` + +You can also check it under _help->about_. A handful of database operations (PTR tag processing, vacuums) require a lot of free space, so if your system drive is very full, or you have unusual ramdisk-based temp storage limits, you may want to relocate it to another location or drive. + +## **`--db_journal_mode {WAL,TRUNCATE,PERSIST,MEMORY}`** + +Change the _journal_ mode of the SQLite database. The default is WAL, which works great for SSD drives, but if you have a very old or slow drive, a different mode _may_ work better. Full docs are [here](https://sqlite.org/pragma.html#pragma_journal_mode). + +Briefly: + +* WAL - Clever write flushing that takes advantage of new drive synchronisation tools to maintain integrity and reduce total writes. +* TRUNCATE - Compatibility mode. Use this if your drive cannot run WAL. +* PERSIST - This is newly added to hydrus. The ideal is that if you have a high latency HDD and want to sync with the PTR, this will work more efficiently than WAL journals, which will be regularly wiped and recreated and be fraggy. Unfortunately, with hydrus's multiple database file system, SQLite ultimately treats this as DELETE, which in our situation is basically the same as TRUNCATE, so it does not increase performance. Hopefully this will change in future. +* MEMORY - Danger mode. Extremely fast, but you had better guarantee a lot of free RAM. + +## **`--db_transaction_commit_period DB_TRANSACTION_COMMIT_PERIOD`** + +Change the regular duration at which any database changes are committed to disk.
By default this is 30 (seconds) for the client, and 120 for the server. The minimum value is 10. Typically, if hydrus crashes, on the next boot it may have 'forgotten' up to this many seconds of its most recent changes. Increasing the duration will result in fewer overall 'commit' writes during very heavy work that makes several changes to the same database pages (read up on [WAL](https://sqlite.org/wal.html) mode for more details here), but it will increase commit time and memory/storage needs. Note that changes can only be committed after a job is complete, so if a single job takes longer than this period, changes will not be saved until it is done. + +## **`--db_cache_size DB_CACHE_SIZE`** + +Change the size of the cache SQLite will use for each db file, in MB. By default this is 256, for 256MB, which for the four main client db files could mean an absolute 1GB peak use if you run a very heavy client and perform a long period of PTR sync. This does not matter so much (nor should it be fully used) if you have a smaller client. + +## **`--db_synchronous_override {0,1,2,3}`** + +Change the rules governing how SQLite writes committed changes to your disk. The hydrus default is 1 with WAL, 2 otherwise. + +A user has written a full guide on this value [here](Understanding_Database_Synchronization.md)! The SQLite docs are [here](https://sqlite.org/pragma.html#pragma_synchronous). + +## **`--no_db_temp_files`** + +When SQLite performs very large queries, it may spool temporary table results to disk. These go in your temp directory. If your temp dir is slow but you have a _ton_ of memory, set this to never spool to disk, as described [here](https://sqlite.org/pragma.html#pragma_temp_store). + +## **`--boot_debug`** + +Prints additional debug information to the log during the bootup phase of the application. + + +The server supports the same arguments.
It also takes a _positional_ argument of 'start' (start the server, the default), 'stop' (stop any existing server), or 'restart' (do a stop, then a start), which should go before any of the above arguments. \ No newline at end of file diff --git a/help/profile_example.txt b/docs/profile_example.txt similarity index 100% rename from help/profile_example.txt rename to docs/profile_example.txt diff --git a/docs/reducing_lag.md b/docs/reducing_lag.md new file mode 100644 index 00000000..26023b73 --- /dev/null +++ b/docs/reducing_lag.md @@ -0,0 +1,48 @@ +--- +title: reducing lag +--- + +## hydrus is cpu and hdd hungry { id="intro" } + +The hydrus client manages a lot of complicated data and gives you a lot of power over it. To add millions of files and tags to its database, and then to perform difficult searches over that information, it needs to use a lot of CPU time and hard drive time--sometimes in small laggy blips, and occasionally in big 100% CPU chunks. I don't put training wheels or limiters on the software either, so if you search for 300,000 files, the client will try to fetch that many. + +Furthermore, I am just one unprofessional guy dealing with a lot of legacy code from when I was even worse at programming. I am always working to reduce lag and other inconveniences, and improve UI feedback when many things are going on, but there is still a lot for me to do. + +In general, the client works best on snappy computers with low-latency hard drives where it does not have to constantly compete with other CPU- or HDD-heavy programs. Running hydrus on your games computer is no problem at all, but if you leave the client on all the time, then make sure under the options it is set not to do idle work while your CPU is busy, so your games can run freely.
Similarly, if you run two clients on the same computer, you should have them set to work at different times, because if they both try to process 500,000 tags at once on the same hard drive, they will each slow to a crawl. + +If you run on an HDD, keeping it defragged is very important, and good practice for all your programs anyway. Make sure you know what this is and that you do it. + +## maintenance and processing { id="maintenance_and_processing" } + +I have attempted to offload most of the background maintenance of the client (which typically means repository processing and internal database defragging) to time when you are not using the client. This can either be 'idle time' or 'shutdown time'. The calculations for what these exactly mean are customisable in _file->options->maintenance and processing_. + +If you run a quick computer, you likely don't have to change any of these options. Repositories will synchronise and the database will stay fairly optimal without you even noticing the work that is going on. This is especially true if you leave your client on all the time. + +If you have an old, slower computer though, or if your hard drive is high latency, make sure these options are set for whatever is best for your situation. Turning off idle time completely is often helpful as some older computers are slow to even recognise--mid task--that you want to use the client again, or take too long to abandon a big task half way through. If you set your client to only do work on shutdown, then you can control exactly when that happens. + +## reducing search and general gui lag { id="reducing_lag" } + +Searching for tags via the autocomplete dropdown and searching for files in general can sometimes take a very long time. It depends on many things. In general, the more predicates (tags and system:something) you have active for a search, and the more specific they are, the faster it will be. + +You can also look at _file->options->speed and memory_. 
Increasing the autocomplete thresholds under _tags->manage tag display and search_ is also often helpful. You can even force autocompletes to only fetch results when you manually ask for them. + +Having lots of thumbnails open or downloads running can slow many things down. Check the 'pages' menu to see your current session weight. If it is about 50,000 or higher, or you have individual pages with more than 10,000 files or download URLs, try cutting down a bit. + +## finally - profiles { id="profiles" } + +Programming is all about re-editing your first, second, third drafts of an idea. You are always going back to old code and adding new features or making it work better. If something is running slow for you, I can almost always speed it up or at least improve the way it schedules that chunk of work. + +However, figuring out exactly why something is running slow or holding up the UI is tricky and often gives an unexpected result. I can guess what might be running inefficiently from reports, but what I really need to be sure is a _profile_, which drills down into every function of a job, counting how many times each is called and timing how long each takes. A profile for a single call looks like [this](profile_example.txt). + +So, please let me know: + +* The general steps to reproduce the problem (e.g. "Running system:numtags>4 is ridiculously slow on its own on 'all known tags'.") +* Your client's approximate overall size (e.g. "500k files, and it syncs to the PTR.") +* The type of hard drive you are running hydrus from (e.g. "A 2TB 7200rpm drive that is 20% full. I regularly defrag it.") +* Any _profiles_ you have collected. + +You can generate a profile by hitting _help->debug->profile mode_, which tells the client to generate profile information for almost all of its behind-the-scenes jobs. This can be spammy, so don't leave it on for a very long time (you can turn it off by hitting the help menu entry again).
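If you are curious what a profile actually measures, the sketch below produces the same kind of table (call counts and timings per function) with Python's standard-library `cProfile` and `pstats` modules; the client is a Python program, hence `client.py`. This is only a generic illustration. It is an assumption here that the client's profile mode records something comparable, and `slow_job` and `count_tags` are hypothetical stand-ins for a real job. For bug reports, please use _help->debug->profile mode_ as described above, not this script.

```python
from __future__ import annotations

import cProfile
import io
import pstats


def count_tags(tags: list[str]) -> dict[str, int]:
    # hypothetical stand-in for real work: count how often each tag appears
    counts: dict[str, int] = {}
    for tag in tags:
        counts[tag] = counts.get(tag, 0) + 1
    return counts


def slow_job() -> dict[str, int]:
    # hypothetical 'job' to profile: build 100,000 tags and count them
    tags = [f"tag{i % 50}" for i in range(100_000)]
    return count_tags(tags)


def profile_job(job) -> str:
    """Run `job` under cProfile and return a text table of calls and timings."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        job()
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # ncalls = calls per function; tottime/cumtime = time excluding/including subcalls
    stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time
    return out.getvalue()


if __name__ == "__main__":
    print(profile_job(slow_job))
```

In the printed table, the `dict.get` row should show a very high `ncalls` with a small per-call time. That pattern is exactly what makes a profile more useful than a guess: it points at what is called too often, not just at what is slow.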
+ +Turn on profile mode, do the thing that runs slow for you (importing a file, fetching some tags, whatever), and then check your database folder (most likely _install_dir/db_) for a new 'client profile - DATE.log' file. This file will be filled with several sets of tables with timing information. Please send that whole file to me, or if it is too large, cut what seems important. It should not contain any personal information, but feel free to look through it. + +There are several ways to [contact me](contact.html). \ No newline at end of file diff --git a/mkdocs.yml b/mkdocs.yml index 955f91fe..278132b5 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -4,18 +4,30 @@ repo_name: hydrusnetwork/hydrus use_directory_urls: false nav: - index.md - - Getting Started: - - introduction.md - - getting_started_installing.md - - getting_started_files.md - - getting_started_tags.md - - getting_started_downloading.md - - getting_started_ratings.md + - Help: + - Getting Started: + - introduction.md + - getting_started_installing.md + - getting_started_files.md + - getting_started_tags.md + - getting_started_downloading.md + - getting_started_ratings.md - PTR: - access_keys.md - PTR.md - - Advanced: - - docker.md + - Next Steps: + - getting_started_more_files.md + - adding_new_downloaders.md + - getting_started_subscriptions.md + - filtering duplicates: duplicates.md + - Advanced: + - advanced_siblings.md + - advanced_parents.md + - advanced.md + - reducing_lag.md + - database_migration.md + - launch_arguments.md + - docker.md - API: client_api.md - changelog.md