AI Waifus and You 101

akari-chan · Jun 22, 2023

PleaseCheckYourReceipts said:
So, this hit my feed. It's more AI training for voice conversion models. Guy apparently also has videos on setting this stuff up, as well.

Troons and catfishers are gonna have a field day with this.

Watamate · Jun 28, 2023

I've been messing around with roop and it's pretty fun even though there's definitely some jank. But for creating a deepfaked video with only a single picture I have to say it's pretty damn impressive.

Github:

GitHub - s0md3v/roop: one-click face swap

one-click face swap. Contribute to s0md3v/roop development by creating an account on GitHub.

github.com

Colab:

Google Colab

colab.research.google.com

Examples with a local celebrity:

Figure I'd also add some SD pictures I generated that I like. Mind you mildly NSFW and I was on a Cutesexyrobutts style binge.

00450-4041690478-(beautiful, aesthetic, masterpiece_1.2, best quality_1.2, top quality), perfe...jpg

00451-1788788595-(beautiful, aesthetic, masterpiece_1.2, best quality_1.2, top quality), perfe...jpg

00430-2771563599-cinematic lighting, highres, best quality, masterpiece, ultra detailed, intri...jpg

Clem the Gem · Jun 28, 2023

Tatsunoko said:
I've been messing around with roop and it's pretty fun even though there's definitely some jank. But for creating a deepfaked video with only a single picture I have to say it's pretty damn impressive.

Couldn't decide between "informative" or "horrifying"

God's Strongest Mozumite · Jun 28, 2023

Wow, that's some crazy stuff Tatsunoko, those sure are some "yabai" images, haha! Please do Sana next

Watamate · Jun 28, 2023

God's Strongest Sanalite said:
Wow, that's some crazy stuff Tatsunoko, those sure are some "yabai" images, haha! Please do Sana next

Wasn't sure if you also wanted her in the same style/clothes or not! I should mention though that the AI has trouble getting her hair ornaments correct; wrong amount, size, color, etc. Feel free to request stuff though.

God's Strongest Mozumite · Jun 28, 2023

Tatsunoko said:
Wasn't sure if you also wanted her in the same style/clothes or not! I should mention though that the AI has trouble getting her hair ornaments correct; wrong amount, size, color, etc. Feel free to request stuff though.

View attachment 33109 View attachment 33114 View attachment 33130
View attachment 33132 View attachment 33133 View attachment 33138
View attachment 33139

Holy shit I didn't expect you to actually deliver, thank you. I'm surprised how much detail the ai included, even trying to replicate the little critters on her baubles, bless its heart. Thank you for feeding a starving Sanalite, if I have any other requests I'll PM you :sanasmile:

PleaseCheckYourReceipts · Jun 28, 2023

God's Strongest Sanalite said:
Holy shit I didn't expect you to actually deliver, thank you. I'm surprised how much detail the ai included, even trying to replicate the little critters on her baubles, bless its heart. Thank you for feeding a starving Sanalite, if I have any other requests I'll PM you

There's a lot of AI art out there already. And you can generate a lot more, if you spend time training a model. Of course, what it means is nerds figured out how to make their own porn. You can see exactly where this is going, really fast.

Clem the Gem · Jul 1, 2023

Posting this now even though it's full of imperfections, since I've been working on this one image for days now and am getting sick of looking at it.
Composable LORA is now working with Latent Couple (technically, the Extensions list changed to a different, working branch of the extension) which means I could try making something I wanted to for a while.

Full version 5184 x 1728:

Working on high res images has been an exercise in frustration, causing the UI to grind to a halt, image generation either failing or taking a stupid long time, having to restart the UI and start all over, not to mention not being able to use the PC for anything else while it's working...

I ended up with more than 200 images that went into making this. Around 350 if you count the first version before I started over again.

I started by creating a scene in https://app.posemy.art/ to get a depth map and openpose to use in ControlNet to generate the first image

Since the models are all bald, I initially had quite a bit of trouble with getting the correct hairstyles in my images. I also completely forgot to adjust their height. I think the better way would have been to depth map the letters only, and rely on just the pose for the characters.

I used Latent Couple with Composable LORA enabled to mark out the areas and prompt one character in each area:

This is the first image that I settled on. I started at 1536 x 512 and is ugly as shit, but that's to be expected. I don't know what's going on with Polka's E, but it gave me that more than once. I did think about keeping it:

Hires Fix at 1.5 scale (2304 x 768) improved it a bit:

Sent to img2img and upscaled to 3456 x 1152:

Now this is where it feels like I was cheating. I spent a long time trying to upscale this image while maintaing the characterss' poses and clothing, but I was either running into memory errors, having the process stall, or just getting bad results. So what I did is use inpainting to just re-draw every character one at a time, many times until I got a good result. ControlNet took care of keeping them in the same position.

Inpainting was set to inpaint only the masked character, "Masked content: Original", "Inpaint area: Only masked", and the width x height set to 768 x 768. This means that it is still looking at the surrounding picture to redraw in a way that makes sense, but it's only working on a small area which is much more manageable and doesn't just error out.

Inpainting introduced lots of little imperfections around the letters and the horizon especially, and the faces were still not very detailed, so a lot of time was spent fixing them up, one at a time.

I had to compromise here and just accept the results it gave me, since I could not get the exact clothing I wanted some of the time. Same goes for getting their tails. But overall, I'm pretty pleased with it. Hag Love.

PleaseCheckYourReceipts · Jul 1, 2023

@Clem the Gem the important thing is you made sure to count fingers properly. Which is normally the downfall of AI models. A good use of LORA as well.

Clem the Gem · Jul 1, 2023

PleaseCheckYourReceipts said:
@Clem the Gem the important thing is you made sure to count fingers properly. Which is normally the downfall of AI models. A good use of LORA as well.

You know what? I got lucky. I only now realise I never even checked the fingers or made changes to them. You can see some of the hands are a bit blurry, but no obvious extra fingers!

Clem the Gem · Jul 9, 2023

Learn how to get this high-spec artist to draw for you!

PART 1

INTRO

A lot has changed in the AI art world since this thread was first made 9 months ago, so I though I'd write up an updated guide.
I won't go into how awesome AI art is, since if you're here you probably already have a good idea. I'll just say that this technology just continues to impress, and in just the past few months we've seen some amazing new features and improvements that make it even more accessible on even lower end hardware.

Some things that come to mind that you can do currently:

Generate images of anything you can think of, in any style you can think of using text prompts, just like that one I made above.

Take a small image (existing, or generated by yourself) and enlarge it beyond what you thought possible. Let the AI add detail and fill in the gaps.

Add or remove objects from an image.

Create new images from a reference image in a different style or add new details.

Use tools to add more precision to your prompts: Let the AI draw a character in the exact pose you want from a reference image.

Turn a scene into a depth map and use that to reimagine the scene as something else.

Take a child's scribble drawing and have the AI interpret it into a realistic image.

Train your own LORAs - add your favourite characters or even your own face to whatever kind of scene you can think of.

Create animation and video.

There are websites where you can use servers and other people's hardware set up to run Stable Diffusion for you, but these will be quite limited in features, and may require a paid subscription to use. The best way to experience Stable Diffusion is to run a local version yourself on your own hardware for free. You need only a decent PC and a couple of essential programs installed. We'll be running the popular AUTOMATIC1111 WebUI which has become the standard for running Stable Diffusion.

In this guide, I'll begin by telling you all the things you need to download and install to get the best experience. I'll explain what they are briefly, but will go into more detail later in the guide where we go through using each feature and make some cool pictures.

I am only going to go over the features I have used personally. There are some inbuilt features I am not familiar with, and even more available as extensions. I have linked some very good YouTube channels at the end where you can find in-depth videos on all kinds of features.

All images in this guide were knocked out quickly while writing this guide, with the most basic of prompts, to show you how easy it is to get good looking pictures very quickly. With a little time you can get even more amazing results.

TERMINOLOGY

Stable Diffusion: Stable Diffusion is a latent diffusion model. Other models you may have heard of are MidJourney and ChatGPT.
In simple terms, It's an AI; a neural network. By itself it's just a bunch of code, so to use it we need some other resources and a way of running it on our PC
(inb4 :Fauna-AKSHUALLY:

.. this explanation is good enough OK).

Automatic1111 / WebUI: The de-facto UI (user interface) for Stable Diffusion. This is what lets you interact with SD from your web browser.

Prompt: At its core, Stable Diffusion takes text "prompts" that you type in, and turns them into images. you describe the scene, and the AI does the rest. There is an art to tricking the AI into giving you the best images.

Seed: A 10-digit number that is used to ensure no two images are the same, even if all other settings are identical. A seed of -1 means a random number will be used each time.

Models: Confusingly, in addition to the Stable Diffusion model itself, you will be downloading and using various "models" or "checkpoint" files. These have been trained on thousands of images in a certain style, be it anime, photorealistic, artistic, or just general all-round. There are thousandsto choose from. At least one checkpoint file is required to run Stable Diffusion. These files are in the .ckpt or .safetensors format and are generally around 2-4GB in size.

LORA files: These have been trained on pictures of one particular character, an art style or concept/action. If you want to make pictures of a popular character, chances are someone has already made a LORA for it. Typically, 144MB or less.

HARDWARE REQUIREMENTS

OS: Windows 10/11 *
GPU: Nvidia graphics card with at least 8GB of VRAM (RTX 2060-2080, 3060-3090, 4080-4090) *
RAM: 16GB System RAM
HDD: ~10GB Storage space for initial install, additional models 2-4GB each
Time investment: Minutes for initial install, depending on download speeds. Then be prepared to spend every waking hour generating images. This stuff is addictive. A gambling game where you roll the dice to get the perfect image, and every roll is something new.

* Cards with less VRAM can be used with some tweaks. The requirements are higher if you plan on training your own models (not covered in this guide).
There is support for (some?) AMD cards, as well as other operating systems, again with tweaking required. You'll need to check that yourself as I'm unfamiliar with it. It would also be a good idea to install the latest video drivers.

INSTALLATION

(These instructions are for Windows only. Linux users are probably already used to dicking around with github and python anyway. Just install the repo.
If you're on a Mac, I dunno. Does any of this even work?)

New super-easy way

A1111 WebUI Easy Installer and Launcher
This all-in-one installer will download and install everything you need including Python and Git and let you set launch options from a handy launcher. If you don't know what Python and Git are, don't worry, it's just boring nerd stuff needed to run Stable Diffusion

Download from here. Follow the link in the Installing section.

Run the installer and install to a sensible folder somewhere. You will be asked if you want to download the base SD model. You can save some GBs by saying no, since we will be downloading better models anyway.

Let it do its thing and launch once it's finished downloading. It will be about 10GB. You may want to skip ahead to the Preparation / Models section below to download models while you wait.

There are some launch options to take a look at. You may wish to untick "Auto-update WebUI" and just download updates yourself, as the latest updates have been known to break things before they get fixed.

The first launch will likely take some time as it downloads more files. After launching, you should automatically be taken to the WebUI in your browser.

The original way

Download and install Python 3.10.6 - do not install any other version, even if newer. It won't work. During installation, ensure you click the box "Add Python to PATH".

Download and install Git Everything can be left as default here.

Once both are installed, create a new Stable Diffusion folder somewhere sensible. From that folder, enter "cmd" into the address bar to open a command prompt window. In there, enter the command "git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git". You will find lots of files and folders have been downloaded into that folder. It should only take a moment as they are small in size.

There are a couple of changes we should make before proceeding: Edit (not run) webui-user.bat (in Notepad or similar) and look for the line "set COMMANDLINE_ARGS=". Modify it so it reads "set COMMANDLINE_ARGS= --xformers". You may wish to also add " --autolaunch" to the same line to automatically open a browser tab to the UI when you run it.

Run webui-user.bat. This is what you will use from now on to run the UI, so feel free to make a shortcut to it. You will see files being downloaded. This will take some time to download around 10GB of files, and it may look like nothing is happening at some point. Just let it run. You may now want to continue onto the Preparation section and download more things that we'll need while Stable Diffusion does its thing. Eventually it should stop, with the message "running on local URL: http://127.0.0.1:7860" (or similar) and that the model has been loaded. If you did not opt to use autolaunch, type that URL into your browser to open the UI.

Running on lower spec cards

There are a couple of options you can check to run on cards with less than 8GB of VRAM. If you're not sure how much VRAM you have, run "dxdiag" from the start menu. You will see your card on the Display tab, and Display Memory (VRAM) just below. This will have a performance impact, so only use if needed.

If you have 4GB-6GB VRAM use --medvram.
If you have 2GB VRAM, use --lowvram

To set this option, edit webui-user.bat with Notepad and edit the line "set COMMANDLINE_ARGS=" to include one of the above options, adding after any other options that might be there. For example:
set COMMANDLINE_ARGS= --xformers --medvram

At this stage, you are ready to start generating, but there are some more things we should do first to get the best experience, as out of the box, you might get some pretty disappointing results:

On the left, the base SD 1.5 model. On the right, the same exact prompt, with a community-made model.

Prompt: "girl standing in city street, rain, city lights, neon lights, looking at viewer"

PREPARATION

Models

First things first, we're going to need to download a good model, since the basic SD1.5 just isn't going to cut it these days.

https://civitai.com/ will be your one-stop shop for models, LORAs, and older lesser used formats like Textual Inversions and Hypernetworks.

You will need to be signed in with a free account to see the spicy ones.

There are thousands of them, and it's easy to get caught up downloading and trying every one that looks good. For now, just try a couple of the highly rated ones - maybe one realistic, and one anime style.
If you want to follow along with the guide and generate the exact same images as I have, you'll need to download the same model. I'll be using the "Koji" model, as of today version 2.1, hash CADAE35D32.

A selection of different models using the same text prompt

I find it useful to also download one of the thumbnail images at the same time and save it with the same filename as the model, but with a .png or .jpg (but not .jpeg) extension. It will then appear in your Additional Networks list with a nice thumbnail. I also like to give the model name a prefix like [A] for anime, [R] for realistic, which helps to sort and find them easier in the dropdown list.

The model will be either in either .safetensors or .ckpt format. You should use the .safetensors if you have the choice.
Save the model and thumbnail to your \stable-diffusion-webui\models\Stable-diffusion folder.

Viewing models in the UI using the extra networks button

When looking through the available models, you might notice they list a Base Model, usually SD 1.5. This is the version of Stable Diffusion they were built for. While SD 2.0 and 2.1 are newer (and are used a bit differently), they have been generally poorly received, and 1.5 remains the most popular version, so we'll be sticking with that for now.

You should read through the description of the models you're interested in, as they often give tips on what sort of prompts to use to get the most out of them. You can also open the example images to see exactly what prompts and settings were used to make them.

Textual Inversions

While you are still on Civitai, you should grab some textual inversions.
Textual inversions were a thing before LORAs made them redundant except in one way - they are now used in your negative prompt to tell the AI what kind of images you don't want. They are used by typing the textual inversion's name into the negative prompt.

Just search for "negative" and you will see them. Popular ones that I use are "Easy Negative" and "Bad Artist Negative". These .pt files are only a few Kb. Save them to your \stable-diffusion-webui\embeddings folder. To use them, you just have to enter their name or trigger word in your prompt.

4x-Ultrasharp Upscaler

This is the upscaler you'll generally want to use all the time when upscaling to add more fine detail to your image.
Download the file 4x-Ultrasharp.pth from here and save it to your \stable-diffusion-webui\models\ESRGAN folder.
You can set it as the default upscaler in Settings - Upscaling - Upscaler for img2img

VAE

Supposedly VAE files improved small details like eyes on faces and text. The main reason you will want to get one is because if certain models are used without a VAE, your image will turn out desaturated and with small artifacts. vae-ft-ema-560000 is a good all-round one suitable for all models.

Download the file vae-ft-ema-560000-ema-pruned.safetensors from here and save to your \stable-diffusion-webui\models\VAE folder.

Some models will come with a VAE "baked in", others do not and it's not always clear whether they do or not when the author doesn't say. You're not going to have any trouble using a VAE on top of a model that already has one included, so best to just grab one now.

In order to change the VAE in the UI, you need to go into Settings - Stable Diffusion - SD VAE. Change it and hit Apply.
Below in the Settings section of this guide, I will explain a way to change this from the main window without needing to go into the settings.

Clip Skip

While changing Clip Skip won't affect the visual quality of your images, it can help in getting Stable Diffusion to follow your prompt more accurately.
Here's a good GitHub post explaining exactly what Clip Skip is and how it works.

For now, you just need to know it's one more setting that can completely change how your image looks, and is important if you want to be able to reproduce the images in this guide. Most people have this set to 2, and we will too.

You will find it in the Settings tab, under Stable Diffusion \ Clip Skip.

Extensions

Extensions add all sorts of features, from QOL improvements to the UI, to absolute game changers in image generation. From the Extensions tab, click Available and then Load from. This will load a list of extensions that you can install one by one. Once you've finished, go back to the Installed tab and click Apply and Restart UI. You should also restart Stable Diffusion as they do not always load correctly. Just close the command prompt window and run the batch shortcut again.

Essential

sd-webui-controlnet - Very powerful suite of tools that essentially take an input image, decipher something from it, and feed it into the image generation to guide the AI with your prompt. It can be a character's pose, a depth map, an outline or just a reference image to name a few. There is a whole section on ControlNet later in this guide. You can also find more info here.
You will need to download a number of models (no, not those models, or those other models) to go with it. There are currently 14 of them, each 1.45GB, so you may not want to download them all right away. I would start with Depth, Canny, OpenPose, Scribble and Inpaint. You can find them here.
It is the .pth files you want. Put them in your \stable-diffusion-webui\extensions\sd-webui-controlnet\models folder. You should already have the .yaml files in the same folder from installing ControlNet.
We won't be needing these models until near the end of the guide, so you can download later if you want to jump right in.

ultimate-upscale-for-automatic1111
Allows you to easily upscale and add detail beyond what your hardware might usually be able t handle, by splitting the image up into smaller chunks and processing them one at a time

After Installing these, you will also need to go back to the Extensions tab, and Check for Updates, and install any required updates. Apply and restart again.

Useful / QOL

a1111-sd-webui-tagcomplete
Most models you download will be designed to use Booru tags for prompting. This extension looks up and auto-predicts tags as you type. Press the tab key to auto-complete. Not only that, but it also helps auto-complete LORAs and Textual Inversions as soon as you start typing anything beginning with <

stable-diffusion-webui-images-browser - Image browser where you can filter and search your generated images, view the settings used to create them, and copy those settings into a new prompt.

To look at later (not covered in this guide)

stable-diffusion-webui-two-shot - Latent Couple allows you to specify regions in the image, and describe exactly what should appear where; something that is all but impossible through prompting alone.

stable-diffusion-webui-composable-lora - Used in conjunction with Latent Couple above, lets you apply LORAs to specific parts of the image.

NAVIGATING THE UI

Here I'll quickly go over the most important things you'll be using on the UI.

(You may notice some extra tabs and buttons you don't have - these are optional extensions I have installed)

1) Model (checkpoint) selection: If you have downloaded multiple models, this is where you will select them.
2) Positive prompt: Describe what you want to see.
3) Positive prompt: Describe what you don't want to see.
4) Generate: Press the big button when you are ready to go. It changes to an Interrupt button when an image is being processed.
5) Sampling method, Sampling steps: Basically, the various samplers use different methods to generate your image. Steps is how long the image will be worked on. Higher steps can give more detail, but will take more time to process. It's a bit of an advanced topic, so here's a picture showing the effect of different samplers and steps:

The two most popular sampling methods at the moment are Euler a, and DPM++ 2M Karras. I'll be using the second one.

6) Width & Height: Size of your image. You should start with 512 x 512 or 512 x 768. Your images can be enlarged in later steps. Remember that the canvas shape you choose (portrait or landscape) will influence the composition of the image that gets generated.
7) Batch count & Batch size: Used together with a random seed to generate a large batch of images at once. you might wonder what the difference is between 1 batch of 4 images, or 4 batches of 1 image. Basically, a high batch size will produce multiple images at the same time, but use more VRAM and you will be unable to see the results until the operation has completed.
A high batch count however will produce each image one at a time, and will take longer. More info here.
8) CFG Scale: CFG scale tells Stable Diffusion how closely it should follow your prompt. If you feel your prompt is not being followed, you might try increasing it. Generally, it should be left around 7-9.
9) Seed: A 10-digit number that is used to ensure no two images are the same, even all other settings are identical. If you want to reproduce an image, it will need to have the same seed that was used to create it.
Clicking the dice icon or entering -1 will give a random seed. Click the recycle icon to recall the seed that was used for the last generated image. If you are testing the effect of different settings, it is important to keep the seed the same, otherwise a random seed will introduce random changes.
10) Highres. fix: A quick way to get an improved, upscaled image. Explained in more detail below.
11) Your generated image(s): There's nothing here yet! Note the Save button just below. You should get into the habit of saving images you like and deleting everything else, lest you end up like me with a folder full of *checks* ..tens of thousands of images that you don't know what to do with.

TYPICAL WORKFLOW

This is the typical process for creating good looking, high-res, detailed images.
Each step will be explained in more depth later on.

1. Basic composition - txt2img tab
Use txt2img to generate one or many images at a low resolution (typically 512 x 512 for a square image). Optionally, use ControlNet to guide the prompt. Roll the dice again and again until you get something that looks nice. It's OK if the face or small details look ugly - we will fix that soon. You can generate in batches to get a nice grid of images and pick the one you like most. The idea here is to get an image quickly. Generating at this low resolution will take only a few seconds. Starting at a higher resolution will not only take longer, but you will not get the same amount of detail as if you were to start low and use tools to improve it later.

2. Hires Fix - txt2img tab (optional)
A quick and easy way to upscale your image and get some extra detail

3. Upscaling, Refining - img2img tab,
Send the Hires Fix'd image to img2img. Here we have tools to upscale the image to very large sizes and with better detail much faster than if we were to try it from the very start with txt2img.
Here you may also try inpainting to fix any problem areas or add even more detail.

LET'S START USING THIS THING

Alright, it's time to make our very first image.
I'm going to start with a very basic prompt, leaving it deliberately vague for now. Enter the following in the positive and negative prompt boxes:

[Positive prompt]
(masterpiece:1.2, best quality:1.2, highly detailed, cinematic lighting, sharp focus, perfect face, absurdres), *
1girl, sitting, river, trees, flowers

[Negative prompt]
(worst quality, low quality:1.4), easynegative * **

Model is Koji 2.1
Sampling method is DPM++ 2M Karras
Size is 512 x 512
Sampling steps 30, and CFG scale 7.
* I will be using these same lines in all examples, so from this point on I'll only be mentioning the main prompt.
** easynegative is the name of a Textual Inversion file you should have downloaded. See Preparation section above

I used a seed of -1 to get a random result. If you would like to generate the exact same image and follow along, then you can enter the seed 4036314891.

At this point it's worth noting that there's a very possible chance you could end up with NSFW images. Most models are perfectly capable of rendering your most degenerate fantasies in great detail. Some are made especially for it. So if you want to keep it safe, you may want to add words like "nsfw", "nude" or any other body parts that might pop up into the negative prompt.

Now hit the Generate button and let's see what we get!

Well, we've got our girl and she's sitting. There are trees, a river and flowers. A pretty good result.
But maybe we were expecting to get a girl sitting near a river and not half in it. What about her clothes? The way she's facing? If we see her whole body or a close-up? If it's not specified, then we will get something entirely random.

Let's keep all the settings the same, but alter the prompt a bit, keeping the first line but changing the second:

1girl, sitting, on ground, near river, trees, flowers, kimono, smile, looking at viewer, full body

Again, it matches the prompt technically. She is sitting on the ground in all of the pictures, but still with her legs in the water. One thing we could try is specifying the way she is sitting:

1girl, sitting, on ground, near river, trees, flowers, kimono, smile, looking at viewer, full body, seiza

That's more what I was going for.
You can find more useful tags related to posture here. You will probably find yourself coming back to the full tag list often, so stick it in your bookmarks.

Tag Group:Posture Wiki | Danbooru

[See tag groups.]

safebooru.donmai.us

Now, you may be keen to start making pictures of your favourite characters. You might have already tried and got some mixed results. We will go into getting recognisable characters a bit later.

Have some fun making some cool pictures. Make use of the batch tool to make a bunch at once and pick the ones you like. In the next section we'll take one and make it look even better. Unless you're making only close-up portraits, you will most likely get some messed up faces. Don't worry, at this stage we are working on small, low-quality images at first before we improve them.

MORE ON PROMPTING

Making good prompts is a whole topic in itself and there are many articles and videos out there for you to look at. I tend to reuse one particular set of words and phrases in my prompts (both positive and negative) every time which is supposed to give an all-round good image and filter out undesirable elements. I then append the actual prompt I am going for onto this. You can set these general prompts to appear by default on launch (see Settings section below).

If you are familiar with Booru style image galleries, you will recognise image tags. Most of the models you will be using, especially those focused on anime will know how to interpret these tags. What this means is instead of writing a lengthy phrase to describe your image, you can use tags separated by commas. Note that spaces are used instead of underscores. Check this Danbooru tag guide for a comprehensive list of tags.

It's all about giving a detailed description and narrowing it down to get exactly the scene you want. If there is something you have not specified, Stable Diffusion will fill in the gaps itself, and it may not be what you expected at all. On that note, it can be fun to use a deliberately vague prompt like "1girl, city" and see what the AI comes up with. You can spend a long time rolling the dice and seeing the endless possibilities.

You should think about the composition of the scene - who is in it and what are they doing? What kind of pose are they in? Describe the location, the weather, time of day, time of year. Lighting is very important too. You should decide where the character is looking, and where the "camera" is focused. Is it a close-up portrait or a wide angle shot? You can also specify the medium (digital painting, oil painting, etc) and art style.
You can get away with being less descriptive if you are making anime style pictures - especially when using a model tuned for them. If you're going to photorealism however, you'll need to put more work into your prompt.

There are a couple of special characters used for syntax:

(Round brackets) are used to give emphasis to words. a single pair of brackets will increase its strength by 1.1. You can use multiple sets of brackets to add more emphasis.
You can achieve the same effect by specifying the weight like so: (word:1.2). This will give it a strength of 1.2.

[Square brackets] are used in the same way, except to decrease the strength by 0.9.

\ Backslashes are used when you want to use round brackets as part of a tag. For example, you might want to use the tag "CharacterName (AnimeName)". You would add the slash to both brackets, like so: "CharacterName \(AnimeName\)"

You can save your prompts as "styles", so you can easily load them up when needed. This is especially useful if you want to reuse an all-purpose prompt to get good details, like mentioned above.

Here you can find a guide for installing a nice .csv file containing many premade styles to quickly get good results (a bit of an emphasis on realistic style images)

HIRES FIX

Hires fix can be used to upscale and improve your initial image and, depending on your needs, produce a final image, or a basis for further upscaling in img2img.
I usually upscale my 512 x 512 starting image by 1.5x or to 2.0 as anything more than that starts taking too long on my hardware. With larger images, you might start running into memory errors and be unable to get an image.

First, you'll need an image. As Hires fix is a feature of txt2img, there is no way to load up an existing image (that's what img2img is for) - you can only turn it on and it will be used on the next generated image. If you have just created an image and it's still on the screen, you just need to click the recycle seed button and enable Hires fix before generating it again.

If you want to use an image you made previously, you can go to the PNG Info tab. Here you can drag your image in and then click "Send to txt2img".

Of course, you may also just want to make a brand-new image.
Whichever way you choose, there are some important options we need to change before hitting Generate:

Upscaler: If you followed the Preparation section, you should have the 4x-UltrSharp upscaler available to select.
Denoising strength: From 0 to 1, the higher it is, the more detail you will get, but the more the image will change. A value around 0.4 - 0.5 usually works well.
Upscale by: Scale your image up this many times. Stick to a maximum of 2. You can see the scaled-up size next to the Hires fix tick box.

Remember to reset your batch count/size back to 1 if you had previously set it higher

Let's see the results:

On the left is an image I have chosen to fix. I like the composition of this picture, but the face is very distorted. In the middle is the fixed version at 0.4 denoising, and upscaled 1.5x. On the right it has been upscaled 2x, also at 0.4 denoising strength.

You can clearly see the image has been enlarged, while adding definition, but keeping the original look.

For your information, here is a comparison of some of the different upscalers and denoising strengths from 0 to 1.

You should see that we get the best results around 0.4 to 0.5. After that, the picture starts changing. We lose the buildings in the background, then the bushes. The clothes around the legs also change, until we end up with a completely different picture entirely. The upscaler does not seem to make much of a different to the end result in this case, but you will see the Latent upscaler requires a much higher denoising strength before it starts working.

We will look at some more ways to upscale even further later on, but for now let's try to reproduce some characters.

DRAWING EXISTING CHARACTERS, USING LORAs

Let's first try recreating a known Hololive character using a regular prompt alone.
With the help of a Booru gallery, we'll use as many tags as we can to describe a well-known character:

1girl, tokoyami towa, standing, outdoors, cowboy shot,
purple hair, pink hair, multicolored hair, twintails, green eyes, white jacket, cropped jacket, off shoulder, black tanktop, black shorts, short shorts, black choker, belt, o-ring, piercing, midriff, black headwear, baseball cap, bibi \(tokoyami towa\), tokoyami towa \(1st costume\), fake horns

While the face and hair look pretty spot-on, the clothes are not really right at all, even though they were described pretty well. Once again, they technically follow the prompt. The jacket is white, the shorts are black and so on, but it's not able to replicate the exact style. This is where LORAs come in.

Head on over to Civitai and search for tokoyami towa and look for a LORA file.

Nice, this one even contains multiple official costumes in one file. Take a look at the notes. The description says it is best used with a strength of 0.6 to 0.7. You can see there are trigger words. These are used to get the different costumes. The default costume is just the name, "tokoyami towa".

Download it and save it to your \stable-diffusion-webui\models\Lora folder as hololive-towa.safetensors.
Like with the checkpoint models, I find it useful to download one of the images and save it in the same place with the same name, but .jpg.

Now let's try the same prompt again, but this time with <lora:Hololive-Towa:0.7> * added in.
You can also select it from the red Networks button under the Generate button (you will need to refresh the list).
* When you insert the LORA, it will be with a strength of 1 by default. Modify the prompt to read 0.7 as recommended by the LORA author, or whatever strength you want. You will need a lower strength if you want to make the subject look different, such as changing their clothes.

With the LORA in place, you're now free to remove much of the description, and you'll still get something that looks accurate:

1girl, tokoyami towa, standing, outdoors, cowboy shot,
<lora:Hololive-Towa:0.7>, twintails, cropped jacket

If part of the clothing isn't quite right, you can just add more description as needed to fix it.

Now that we have the LORA taking care of the character, we can give them any clothes we want:

You will often find the clothes get a colour scheme that fits with the character, even without specifying.

PNG INFO

Something I should mention is the PNG Info tab.
From here, you can drop in an image that you or someone else has created, and retrieve all the info that was used to generate it. You then have buttons to send the details directly to txt2img or img2img for you to use. It will overwrite your current prompt with the one that was used to make the image.
Try it with some of the examples in this guide.

Note that you will need to check which model was used and change to that model yourself. Of course, you will need to have the model in the first place if you want to recreate the same picture. Also be aware that if the image was manipulated in any way in an image editor it will lose its generation data. If the image has gone through multiple steps such as upscaling and inpainting, you will only be able to see the information that was used in the last step, and not the base image. Obviously, .jpg files or anything other than .png will not work either. I resized some of the larger images in this guide, but should have provided the prompt to go with them.

ALTERING IMAGES WITH IMG2IMG

Now that we've covered the basics of txt2img, it's time to move on to the img2img tab.
I have created a new image in txt2img and pressed the Send to img2img button, which copies over both the image and prompt. You may also import any existing image saved on your PC.

Prompt: girl sitting at table in kitchen, long black hair, ponytail, brown eyes, looking at viewer, smile, white hoodie, upper body

Upscaling & Redrawing

The first thing we can do with img2img is pretty much just the same thing we did before with Hires fix: Make the image bigger while adding detail.

Looking at the UI, we have much of the same things from txt2img - We can change the prompt, use a different model and sampling method, change the steps and CFG scale. What we're interested in is the lower section:

We'll be upscaling this image 2x from 512 x 512, so we can either enter the new width & height of 1024 x 1024 on the "Resize to" tab, or enter a scale of 2 on the "Resize by" tab.
It's worth noting it can take a very long time to upscale large images, and at some point, you will hit a limit where your GPU can't handle it. If you have trouble upscaling to 2x, you can try 1.5x (768 x 768). Later on, we will look at some easy ways to upscale huge images which are light on your GPU.

Just like with Hires fix, we have a denoising strength. As expected, using a value around 0.4 will introduce detail without changing the image much. A higher strength will change it more.

Upscaled 1.5x, with denoise strength 0.40

Upscaled 1.5x, with denoise strength 0.75

Once we have a result we like, we can click Send to img2img so any changes we make from now will be on the improved version. Img2img is all about making small changes on one image, copying that image over and then making another change to that one to improve it even more.
By the way, we did not need to even enter a positive or negative prompt at all when we upscaled the original 512 x 512 image. The result would be a near identical image, since Stable Diffusion is looking at the image itself and not being influence by a prompt at all in this case.

There is more we can do than just make things bigger. How about changing the prompt?
I'm going to add the words "open mouth" to the prompt and keep the denoising strength at 0.4. Also, double check the dimensions and that you are not upscaling again. We'll stick to working at 1024 x 1024.

A good result - we got the change we wanted, without the rest of the image being affected too much.
If we wanted to make sure nothing changed whatsoever except for the mouth, we might look at Inpainting instead - covered later.

What about if we try changing "white hoodie" to "red hoodie"?

Unfortunately, at low denoising we only get red highlights, and we have to go all the way up to 0.75 before we see a red hoodie, by which point the image composition has changed too much. Of course, this might not necessarily be a bad thing if you just want a nice image and don't care if it matches the original.

Sketch

As we all know, glasses are very versatile, so let's try adding some. Now we could do this with a prompt alone, and Stable Diffusion should not have much trouble figuring it out, even at a lower denoising strength. But unless you're specific in your prompt, you're going to get a random style, shape, colour etc, and you might spend a long time rolling the dice until you get what you want. So let's try drawing some ourself.

Click the Sketch button, next to Copy image to:, and our image will be copied over to the Sketch tab.
Use the brush tool to draw in the glasses.

At a denoising strength of 0.5, Stable Diffusion has figured out my scribble is supposed to be a pair of red glasses, even without saying so in the text prompt:

With a low strength, it will more closely match the original misshapen scribble. Going higher tends to disregard the sketch entirely. To fix this, we will have to use a bit of a text prompt to help guide the AI:

With a denoise strength of 0.6 and the text "red-framed eyewear" added to the prompt, we still retain the basic sketch, but the glasses better fit with the rest of the image. However, the composition of the image has now changed quite a bit.

We could even use the same method as a crude way to get that red hoodie we wanted before:

Denoise strength: 0.6 with "red hoodie" added to the prompt. I probably should have chose a darker red...

Inpaint Sketch

Now we're moving on to the Inpaint sketch tab. I'm skipping over the Inpaint tab for a moment - this will be covered in the next section.
There are some more options to take a look at:

* Mask mode: This will be left on "Inpaint masked"
* Masked content: most of the time, you will be using "original" or "latent noise".
Original will try to look at what is in the painted area before redrawing it to match the prompt. This is especially used when improving or changing faces or other body parts.
Latent noise uses random noise guided by the prompt to add something new in the painted area, rather than adjusting what's already there. It will require a higher than usual Denoising strength in order to work.
* Inpaint area: Leaving this at "Whole picture" means the whole picture is redrawn from scratch to include the prompt in the masked area. "Only masked" redraws just the masked area.
* Width & Height: When used with the "only masked" option above, this will determine the resolution of the masked area rather than the whole image. This means if you start with a 512 x 512 image and mask out a small area with the width & height also set to 512 x 512, then that small area is going to be drawn with the same amount of detail that would have gone into the entire image, giving you finer detail.
* Denoising strength: As usual, the higher it is, the more freedom you give the AI to draw something.

I'm going to add something to the table. Can you guess what it is?

I'm using the prompt "banana" with a denoise strength of 0.6. Masked content is set to Original and Inpaint area is Only masked. Width & height is 512 x 512.

The banana looks a bit out of place, probably because of the lack of shadow. What I should have done is drawn the sketch with a shadow to begin with. But I think we can salvage this.
Keeping everything the same, a quick line drawn on the banana seems to work pretty well.

If we wanted to, we could send this over to the Inpaint tab to improve even more, but I think this is good enough for now.

How about adding something in the background?
Again, everything is the same here, just with a new sketch and the prompt "potted plant".

The AI does a really good job of putting the plant behind the head, and even blurs it as part of the background.

UPSCALING & ADDING DETAIL

Ultimate SD Upscale

I'm loading up a new image I generated earlier. Starting at 768 x 512, Hires fix was used to upscale 2x to 1536 x 1024. You can either download this image and load it into img2img, or you could try making your own image. The important thing is to have something that already has a good amount of detail at a smaller size.

Upscaled 2x in txt2img with Hires fix (1536 x 1024)

This part will require the SD Ultimate Upscale script, as well as the 4x-UltraSharp upscaler, so make sure you got both, explained back in the Preparation section.

First make sure you are back on the main Img2img tab. From here the only option you need to worry about is the denoising strength. It is even more important to keep it low here. Anything above 0.4 or so and you will end up with amusing but completely unwanted results, with faces and body parts scattered all over your image. You can either leave the positive prompt blank, or try reusing the same one made to make the original image.

You can find the SD Ultimate Upscale script right at the bottom of the UI under the Scripts dropdown. Select it and make the following changes:

Target size type: Scale from image size, and set the Scale to 2.
Upscaler: 4x-UltraSharp. You might also try R-ESRGAN 4x+ Anime6B which can give better result with anime style images.
Type: If you end up with visible seams you can try Chess, otherwise Linear should be fine.
Everything else can be left at default.

What this script is going to do is produce an upscaled image by splitting the image up into squares and working on one square at a time, upscaling and adding detail to each. To your GPU, this is just like making a bunch of small images one at a time, so it has a much easier time letting you make huge images you would not be able to otherwise.
Once everything is set, hit Generate and you should get a very nice upscaled image. Be prepared to wait a few minutes though.

You should notice added detail and definition in the upscaled image, but there will likely be areas that could still be improved. The face will be especially worth checking out.

(this and the following images were upscaled to 3072 x 2048. They have been sized down and saved as .jpg for this guide)

Inpaint

Now we'll load that upscaled version into the Inpaint tab. Make sure the Script dropdown has been set back to None. Once in the inpaint tab, you should see the exact same options from the inpaint sketch tab before.

Now, use the brush tool to mask the area to be worked on - in this case the face.

Next, ensure the inpaint settings are set as below:

Masked content: Original. Since we are enhancing what is already there and not adding something completely new.
Inpaint area: Only masked. We will be working on a small area at a time. Selecting Whole picture would mean redrawing the entire 3072x2048 picture and would take a very long time, and might not even be possible with our hardware.
Resize to: width & height 1024x1024. We are not actually resizing anything, but these dimensions are used for the inpainting area. You can make it bigger or smaller depending on the size of the whole image and the amount of detail required. The head in my image is about half this size, so in theory we should get around twice the resolution than we had before.
Denoising strength: Around 0.3 - 0.4. I have used 0.4.

Once set, you'll just need to set a prompt. In this case I found that I was able to get a good face without using a prompt at all. However, I'll use the prompt "<lora:takanelui:0.7>, takane lui, face" to ensure I'm getting an accurate to-character face. Don't forget you can set a batch size and generate as many as you need to.
Once you've got a result you like, press the Send to inpaint button on the right side of the screen, so we'll now be working on that version from now on.
Remember, img2img is an iterative process. Make a small change, send the result back into img2img, improve it further, and repeat the process.

To be fair, this kind of painted anime style was not the best example, but you should notice the difference around the eyes. You will see even better results when using more photorealistic models.

Now we've got the face, we can could go even deeper. How about the eyes and hands? Maybe some texture on the stockings? Same settings as before, except it'll only need a smaller width & height, say 512 x 512 for the face / hands, and 1024 x 1024 for the legs, and a suitable prompt for each.

Pretty happy with this result for now.

PART 2 BELOW

Clem the Gem · Jul 9, 2023

PART 2

GETTING EXACTLY THE IMAGES YOU WANT WITH CONTROLNET

ControlNet has been a total game changer for Stable Diffusion in how you can take out the randomness of image generation. Now you are not relying on only a text prompt, but can directly guide what gets drawn.

ControlNet is a large topic and I am only going to cover the specific models I have used myself, and give just a quick rundown on their use with some examples.
You will first need to have the ControlNet extension installed, and its various models downloaded. Check the Preparation section above.

Once installed, you will find the ControlNet section in a dropdown near the bottom of the UI. It will be in both txt2img and img2img, but for now we'll be working with txt2img:

1) ControlNet tabs: If you have enabled multiple ControlNets, switch between them here.
2) Input image: The image that will be processed and used in your ControlNet
3) Enable: Must be ticked for ControlNet to do anything
4) Allow Preview: See a preview of what ControlNet is doing before generating your full image.
5) Open new canvas: Create a blank canvas to use with scribble-type models.
6) Control Type: Shortcuts for setting both a Preprocessor and Model (below)
7) Preprocessor & Model: The kind of thing that you want to do with ControlNet is selected from a list of preprocessors. Then you must match the preprocessor with a suitable model. You will see the full list of preprocessors, while only those models you have downloaded will be shown.
8) Control weight: You can lower this to give less importance to the ControlNet and to follow your prompt more
9) Starting / Ending Control Step: At what stage (as a percentage of steps) ControlNet should start of stop working. For example, if you are making an image with 30 steps and set the Starting Control Step to 0.5, ControlNet will only come into effect at step 15, giving time to generate the image composition without ControlNet's influence.
10): Resize Mode: Only used when the input image size (aspect ratio) differs to the generated image. Can be used to generate completely new scenery to fill in the gaps where an image was cropped. Covered more later.

Canny Model

I'm starting with the Canny model. Enable ControlNet and Select the Canny shortcut. I'm using this image I found online as an input image:

If you tick the Allow Preview box and then press the explosion icon (Run preprocessor) you can see exactly what ControlNet is doing, and see the result of the pre-processed input image:

As you can see, the Canny model picks up the basic outline of the input image. Subject, background, whatever it can see, it will draw. Using this model, it should be pretty easy to reproduce this realistic scene but in an anime style and with a different subject.

With all the ControlNet options set, all you do now is use txt2img to create a prompt and generate an image like normal. In this example I lowered the Control Weight to 0.75 to give it a bit of leeway with the character shape, otherwise it would try to match the hairstyle too closely.

Prompt: shishiro botan, running, running track, sky, clouds, trees, tank top, shorts, sportswear,
<lora:Hololive-Botan-m:1> , grey hair, grey eyes, long hair, high ponytail, lion tail

Once you have the first image generated to get a starting point, you can then use the normal workflow to improve and upscale it as much as you want.

OpenPose Model

A lot of the time you only want to use the pose from a reference image, and don't want to capture the background or any other scenery. OpenPose is perfect for this. The preprocessor will figure out the pose from any character in the image, and provide a kind of stick man output that ControlNet will use to pose your generated image.

This time, if you select the OpenPose shortcut and check the Preprocessor dropdown, you should see there is a selection of OpenPose preprocessors to choose from. One captures just the body, another one the body and hands, the face, and so on. You can generate a preview to see how each one compares.

Using the same running woman reference picture:

Now that we have captured just the pose without anything else getting in the way, we can be more creative and make an image completely unrelated to the reference photo:

But what if you can't find a reference image of exactly the pose you want? I've got that covered in the next section!

Depth Model

The depth model guesses the depth information of the input image - how close or far different parts of the image are from the camera.
Again, there are a few different preprocessors to look at. Some are better at capturing the finer details and the background more than others.

I find this model is great for taking a scene and repainting it as something completely different, while still keeping everything in its original place.
Here the same reference image was used and only the prompt was changed each time:

Scribble Model

Now this one is really cool. Draw something on the canvas, or load something you made in MS Paint. Even the worst scribble can be turned into something amazing. Unlike img2img sketching, you do not need to colour it in for the AI to understand what it is. A black and white sketch is fine.
If you're having trouble getting good results, try lowering the Control Strength.

Prompt: 1girl, warrior, adventurer, long hair, ponytail, armor, holding sword, shield, fighting dragon, castle in background, battle, fire

Prompt: 1girl, ookami mio, sitting on fountain, outdoors, hedge, plants, cobblestone, bright sky, light beams, water, sparkle,
<lora:Hololive-Mio:0.7> , mio-cardigan, black skirt, braid, brown cardigan, wolf ears, long hair, bare shoulders, looking at viewer, smile

Tile Model

Tile is used much like the Ultimate SD Upscale script to break your image up into tyles and upscale one at a time for efficient upscaling. All you need to do is supply an image (usually the same very image you are upscaling) and select the correct preprocessor / model.

Reference Model

The Reference model is used to produce an image that look like the input (reference) image. This is similar to standard img2img, but more powerful.

Left: My input image. Let's pretend for a moment (or not) that we have no idea who this is. The idea is to get a lookalike without saying outright who it is. Right: Generated results.

Prompt: 1girl, standing, outdoors, smile, full body,
horns, white hair, long hair, black kimono, red eyes, white thighhighs, oni mask,

I found the results were not always very good with anime images. It's picked up some parts of the image, especially the background, but the character needs a fair bit of text prompting to get something that looks like the reference image.

It does a lot better with a realistic photo and checkpoint model:

Prompt: woman standing, outdoors, smile, full body,

OUTPAINTING - RESIZE & FILL IN THE GAPS

Outpainting with ControlNet lets you take an image and change its canvas size, while letting the AI fill in the blank space. For example, you can take a portrait image and turn it into a landscape, seamlessly drawing in scenery that was not there before.

You will need the Inpainting model for ControlNet (yes, we are outpainting with an inpainting model).

Set the Inpainting model and load in your image.
The only thing you need to change here is the Resize Mode. This should be set to Resize and Fill.
You then set the dimensions you want in the main txt2img settings, and enter your prompt for what you want to see.

There are some rather visible seams, but they can be fixed easily.

To fix up the seams, I simply loaded the new image into img2img / Inpaint tab. With Only Masked and Original, I painted over the seams and used the same prompt. I went over the face quickly while I was there too.

You'd never know that stuff was not there originally.

MORE ACCURATE POSING

So far, we have used reference images to generate a pose via ControlNet, but what if you can't find an image anywhere that has the pose you want?

Posemy.art Is an excellent online tool for posing 3D figures and then exporting them as an image right into Stable Diffusion. It supports OpenPose, Depth, Canny, and normal map. The images it generates are the same that the ControlNet preprocessor would generate, so you can skip the preprocess step entirely and use the ready-made image.
The free version has some limitations like saving scenes being disabled, and not being able to import a background image.

The first thing you will see is a male model. When you click on it, the bottom toolbar will appear and you can change it to a female model with the 6th icon. New models and props can be added from the toolbar at the top of the screen.

You should also see all the pose handles all over the model. You can move the limbs either by clicking on the coloured spheres and moving around the X/Y/Z axis, or by dragging on the grey box.

Move the camera angle by left clicking and dragging on an empty space. Right click drag will pan the camera. Zoom with the scroll wheel. You can use Ctrl+Z to undo when you mess up.

There are a lot of read-made poses and animations to choose from. You might be able to get the pose you want by loading up one of them and making adjustments.

Once you have your pose / scene set up, click the Export icon from the upper toolbar. From here, you can set the image size and then position the camera so your model fits within the canvas. Then export to one of the ControlNet types. For this example, you'll want OpenPose and depth.
Remember you won't be able to save this scene with the free version, so keep it open until you've completed your image in Stable Diffusion.

Now in Stable Diffusion, create your prompt as usual in txt2img and enable ControlNet. Select OpenPose as the model, but leave the preprocessor as None, since we already have our ready-made pose image that doesn't need preprocessing.

Using OpenPose, I often got result where only one arm was posed instead of both like in the pose:

Using the depth model instead, the results follow the silhouette too closely and you get hair and clothes that stick very close to the body (since the 3D model doesn't have any), or none at all:

This can be fixed by lowering the Control Weight:

OpenPose Editor

Instead of working with a 3D model (Which can be a bit fiddly), you may want to try the openpose-editor extension. This lets you pose the same kind of stick man you have seen already, but only in 2D, so it's a lot easier to move the limbs around.
The disadvantage to this is it can be harder to pose the face accurately, and since you can stretch the limbs however you want, it's easy to end up with unrealistic proportions without realising it.
One advantage however is you can load a background picture to better pose your model. You need a paid account to do this on posemy.art.
You can install it from the Extensions tab.

MULTIPLE CONTROLNETS

Take ControlNet to another level by using multiple ControlNet units at the same time! Use multiple reference images or scenes you have made yourself to get exactly the image you are imagining. You can for example use a depth map of a location as your background in one ControlNet, and then an OpenPose in another.

First you will need to enable multiple ControlNets. You can do this from the Settings tab, under ControlNet. No need to set it to the max - 3 should usually be enough. You will then need to restart Stable Diffusion.

Here I am using OpenPose for the character and a picture I found online as the background. Either Depth or Canny would work well here. I also used a third Canny ControlNet unit for just the mug.
You will note you now have multiple tabs in the ControlNet section. Simply load in the appropriate image and set the preprocessor / model, in each tab, and enable them.
You will probably find you need to lower the Control Weight of the background unit quite a bit so the character gets included. Note that the weight does not affect how accurate the pre-processed image is. You can check this by turning the weight way down and clicking preview. The weight only determines how important each ControlNet unit is in the overall image.

The settings I used here are:
Background: Depth model, depth_leress++ preprocessor, 0.4 strength
Subject: OpenPose model, no preprocessor, 1 strength
Mug: Canny model & preprocessor, 0.75 strength
Prompt: 1girl, standing, winter cabin in background, snow, trees, holding mug, steam, looking at viewer, smile, closed mouth,
<lora:Hololive-Polka:0.7> , omaru polka, fennec fox ears, blonde hair, braid, winter clothes, coat, fur trim, gloves

It's important to note all of the ControlNet images were the same dimensions. If they were not, things would not line up. I overlayed the mug over the OpenPose image in PhotoShop to get it in the right place. I could also have made the mug out of simple shapes in the posemy.art app.

It will take noticeably longer to generate the image, but you should soon have something containing all the things you asked for.

The result:

XYZ PLOT - BIG GRIDS OF IMAGES

Found at the very bottom of the txt2img and img2img screens under the Scripts dropdown.
This is very useful for showing how changing a setting affects an image. For example, to see how adjusting the denoising strength effects your image, you can generate a grid with 10 images from the same seed/prompt, starting at 0.1 strength to 1.0, with each one labelled with its strength.

Just about any value you can modify can be shown on the plot.
Full-size images will be used to make up the grid, so you can end up with some pretty massive files. I am using downsized versions in the examples below.

You can select up to 3 different things to compare.
The values you enter can be words or numbers separated by commas, or a range of numbers with or without specifying increments.

For example, if you wanted to see how the number of steps affects your image, you could enter:

X type: Steps | X values: 20, 30, 50, 60

This will produce 1 image for each of the 4 different values. The same seed will be used for each image unless you tick the box "Keep -1 for seed", so your images will be consistent.

Instead of typing each value, we can specify a range of values, and how many images we want.

X type: Seed | X values: -1--1 [4]
A little trick I often use; this will produce [4] images, starting with seed -1 (random) and ending with seed -1 (random again)

X type: Denoising | X values: 0.20-0.50 (+0.05)
This will start at 0.20, and add +0.05 as many times as needed until it gets to 0.50 (7, in this case)

Adding something to the Y type, we will get one series of images on the horizontal, and another on the vertical.

X type: Seed | X values: -1--1 [3]
Y type: Checkpoint name | Y values: (2 model names selected from the list)

Finally, with the Z type we will get a 3rd group of values, split on the vertical. Technically this just produces multiple XYZ grids and puts them side by side in one big image.

Text can be used too.
Prompt S/R (search and replace) can be used to try different text prompts. The first value will be the word or phrase in your prompt that gets replaced, and every value after it, a possible replacement (including the first value), separated by commas.

X type: Prompt S/R | X values: short hair, long hair, ponytail
Y type: Seed | Y values: -1--1 [3]
Z type: Prompt S/R | Z values: black hair, blonde hair

In this example, my prompt included the words "short hair" and "black hair". These get replaced by the variations specified above.

SETTINGS (Non-Essential)

These are some settings I recommend changing. Some can be changed from within the UI, while others will require editing a settings file.

From the Settings tab:

User interface

Quicksettings list
Here you can add a number of dropdown boxes that appear at the top of the pag. This is very useful for quickly changing settings without needing to go into the settings menu and reloading the UI.

By default, you have just the model selection box, but you can select from a huge number of settings.
Personally, I have the model selection, Clip Skip and VAE selection:
sd_model checkpoint, CLIP_stop_at_last_layers, sd_vae

Live previews

Live preview method
Set to Full. Other methods will run faster, but have been known to cause issues.

Live preview display period
You may wish to change this so you can see how your image progresses through each step. I have mine set to 5.

From the config files

Open the appropriate file with Notepad. Enter or change the values at the end of the line, then restart Stable Diffusion. These are my own personal settings. Feel free to choose your own.

\Stable-diffusion-webui\ui-config.json

Default positive and negative prompts:

"txt2img/Prompt/value": "(masterpiece:1.2, best quality:1.2, highly detailed, cinematic lighting, sharp focus, perfect face, absurdres),",
"txt2img/Negative prompt/value": "(worst quality, low quality:1.4), easynegative",

Default sampler, steps and CFG scale

"txt2img/Sampling method/value": "DPM++ 2M Karras",
"txt2img/Sampling steps/value": 30,
"txt2img/CFG Scale/value": 9.0,
"img2img/Sampling method/value": "DPM++ 2M Karras",
"img2img/Sampling steps/value": 30,
"img2img/CFG Scale/value": 9.0,

Default upscaler and upscale settings for Hires Fix:

"txt2img/Upscaler/value": "4x-UltraSharp",
"txt2img/Denoising strength/value": 0.45,
"txt2img/Upscale by/value": 1.5,

Maximum batch size:

"txt2img/Batch size/maximum": 16,
"img2img/Batch size/maximum": 16,

\Stable-diffusion-webui\config.json

Default upscaler

"upscaler_for_img2img": "4x-UltraSharp", *
"show_progress_every_n_steps": 5,
"show_progress_type": "Full",
* You will need the 4x-UltraSharp.pth file as explained in the PEPARATION section above

UPDATING

It is worth checking for new updates from time to time.

Stable Diffusion / WebUI:

If you are using the all-in-one launcher, you have the option to check for updates in the launcher.

For the regular version, you simply navigate to your stable Diffusion folder, enter "cmd" into the address bar to open a command window. In there, enter "git pull". This will check for a newer version and download the required files. The files are tiny and will take less than a second. If you want to automatically download the latest version on launch, you can edit webui-user.bat and add the same command "git pull" on a new line just below COMMANDLINE_ARGS. Personally, I would not recommend this and just check for updates yourself, as the latest updates have been known to break things before they get fixed.

Extensions

From the Extensions tab, simply select Check for Updates, and will be shown any installed extensions that have updates available. Once updated, hit Apply and Restart UI.
Once done, it's a good idea to close down the Stable Diffusion process and restart it.

ONLINE RESOURCES

https://civitai.com - All the models you could ever need, LORAs, older Hypernetworks and Textual Inversions, and also tutorials and user galleries.
https://stable-diffusion-art.com/ - Lots of in-depth tutorials, tips etc
https://app.posemy.art/ - Excellent tool for posing figures and then exporting them as an image right into Stable Diffusion via ControlNet.
GitHub: For any of the Extensions you install, there is a link to their GitHub page on the Extensions tab that often has detailed instructions on their use

YouTube channels I've found useful:

https://www.youtube.com/@sebastiankamph

https://www.youtube.com/@OlivioSarikas

https://www.youtube.com/@Aitrepreneur

Now go out there and make some nice pictures. I hope at least someone finds this guide useful.

If you followed along and got stuck or ran into anything that wasn't explained well, let me know and I'll update the guide.

Happy to answer any questions you might have.

Credit to @Tatsunoko for creating this thread and getting me interested in Stable Diffusion in the first place

Lurker McSpic · Jul 15, 2023

I've looked into it and I'm ready to jump into this autism but the only question I have for the people who are already in is this: How much does it strain your hardware?
My hardware is pretty mid, if I can chug along I don't mind but if we're talking a burn your components kind of load I might just wait to get a better pc :Vesper-UNHINGED:

Clem the Gem · Jul 15, 2023

Lurker McSpic said:
I've looked into it and I'm ready to jump into this autism but the only question I have for the people who are already in is this: How much does it strain your hardware?
My hardware is pretty mid, if I can chug along I don't mind but if we're talking a burn your components kind of load I might just wait to get a better pc

Would need to know your hardware to give a better answer, but the only thing that matters is the graphics card, and how much VRAM it has. Working on small images (512x512) shoudl be no problem, and only take a matter of seconds to render.
Once you start doing bigger images, it will be harder on your card and use all of your VRAM, but stil it should not take long to do. We're talking about seconds to minutes here, not like chugging away overnight to render a 3D scene.

I don't suppose it's any worse on your hardware than playing games all day. I've been at it for 9 months now, and sometimes spend most of the day churning out images, and nothing bad has happened yet!

Lurker McSpic · Jul 15, 2023

Clem the Gem said:
Would need to know your hardware to give a better answer, but the only thing that matters is the graphics card, and how much VRAM it has. Working on small images (512x512) shoudl be no problem, and only take a matter of seconds to render.
Once you start doing bigger images, it will be harder on your card and use all of your VRAM, but stil it should not take long to do. We're talking about seconds to minutes here, not like chugging away overnight to render a 3D scene.

I don't suppose it's any worse on your hardware than playing games all day. I've been at it for 9 months now, and sometimes spend most of the day churning out images, and nothing bad has happened yet!

I see, thanks for the reply! :kaelaapprob:

I'm more inclined to delve into cloning voices using rvc and the info I looked at recommended at least 8gb of vram(I have 6gb) and said that it can work with less but the training is what could do me in I guess. I'll give it a shot I don't think I'm cursed enough to burn my card on a first try :kaelalaugh:

Faceless Waifu · Jul 15, 2023

I also decided to give it a try after reading the full guide, but as a peasant who uses laptop rather than PC, the thing basically couldn't handle any sort of generations despite tweaking it to the lowvram setting (It is able to run it along with the needed UI, but when I tried to generate something... laptop ded). I guess that my NVIDIA card is just too old (Geforce 920MX) to handle the generation aspect of StableDiffusion.

That said, I did find this anime-focused AI generating site/place thingie called PixAI, so I can give it a try there. Maybe not as good as StableDiffusion... but it's free (I think).

Watamate · Jul 15, 2023

Lurker McSpic said:
I see, thanks for the reply!
I'm more inclined to delve into cloning voices using rvc and the info I looked at recommended at least 8gb of vram(I have 6gb) and said that it can work with less but the training is what could do me in I guess. I'll give it a shot I don't think I'm cursed enough to burn my card on a first try

I'm on mobile so I'll probably edit this later.

But your best bet honestly for voice cloning is just using a Google Colab. That way you don't even really have to use your own hardware, and your only limitation is the amount of runtime Google gives you for free.

But before that you'll probably want to get your files ready and I have a few tips for that.

I'd start off by choosing and downloading a chatting stream of whoever you want to clone.

Remove the intro and outro with the video editor of your choice and export just the audio. (which should be an option)

Remove any background audio/music with UVR5, which is arguably one of the better vocal isolators currently available. https://github.com/Anjok07/ultimatevocalremovergui

Split your isolated vocal mp3 into 10 second audio clips. I used Audacity for this but there might be better alternatives for this.

Once you've finished all that you can finally get around to using a Colab to start cloning the voice you want, using https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI (other alternatives are available)

Once you reach that point you can also watch this video.

Scoots · Jul 15, 2023

Clem the Gem said:
Posting this now even though it's full of imperfections, since I've been working on this one image for days now and am getting sick of looking at it.
Composable LORA is now working with Latent Couple (technically, the Extensions list changed to a different, working branch of the extension) which means I could try making something I wanted to for a while.

Full version 5184 x 1728:

View attachment 33632

Working on high res images has been an exercise in frustration, causing the UI to grind to a halt, image generation either failing or taking a stupid long time, having to restart the UI and start all over, not to mention not being able to use the PC for anything else while it's working...

I ended up with more than 200 images that went into making this. Around 350 if you count the first version before I started over again.

I started by creating a scene in https://app.posemy.art/ to get a depth map and openpose to use in ControlNet to generate the first image

View attachment 33617
View attachment 33616

Since the models are all bald, I initially had quite a bit of trouble with getting the correct hairstyles in my images. I also completely forgot to adjust their height. I think the better way would have been to depth map the letters only, and rely on just the pose for the characters.

I used Latent Couple with Composable LORA enabled to mark out the areas and prompt one character in each area:

View attachment 33618

This is the first image that I settled on. I started at 1536 x 512 and is ugly as shit, but that's to be expected. I don't know what's going on with Polka's E, but it gave me that more than once. I did think about keeping it:

View attachment 33613

Hires Fix at 1.5 scale (2304 x 768) improved it a bit:

View attachment 33614

Sent to img2img and upscaled to 3456 x 1152:

View attachment 33620

Now this is where it feels like I was cheating. I spent a long time trying to upscale this image while maintaing the characterss' poses and clothing, but I was either running into memory errors, having the process stall, or just getting bad results. So what I did is use inpainting to just re-draw every character one at a time, many times until I got a good result. ControlNet took care of keeping them in the same position.

Inpainting was set to inpaint only the masked character, "Masked content: Original", "Inpaint area: Only masked", and the width x height set to 768 x 768. This means that it is still looking at the surrounding picture to redraw in a way that makes sense, but it's only working on a small area which is much more manageable and doesn't just error out.

Inpainting introduced lots of little imperfections around the letters and the horizon especially, and the faces were still not very detailed, so a lot of time was spent fixing them up, one at a time.

View attachment 33630

I had to compromise here and just accept the results it gave me, since I could not get the exact clothing I wanted some of the time. Same goes for getting their tails. But overall, I'm pretty pleased with it. Hag Love.

That time lapse is an immensely satisfying watch

Faceless Waifu · Jul 22, 2023

After spending I guess a week with PixAI, I can say that it's quite a decent way for someone who want to dip their toes into AI art generations but aren't able to set it up on their hardware. It's using StableDiffusion (from what I gathered), so I can easily follow most of the earlier Tutorial made by Clem the Gem above. Now I don't really know about website-based AI generation sites, but PixAI at least advertise itself as "free", and it is free. With just a simple account, you can just get on to generate your prompt right there and then.

Though the need of using credits for almost everything like generating, upscaling, editing, etc is expected (they're free, but doesn't mean it's "fully free" unlike working/generating on your own computer), but the costs itself is not exactly too big and you can reduce it at the cost of certain limits being imposed, like say, not being able to use certain Sampling Methods or even not being put in high priority queue and thus had to wait for 30 minutes (or more!) when generating a batch/image. But with 10k credits daily (sometimes upped to say, 12k if they're doing promotion via their mobile phone app or hosting AI art competitions in their discords), you can save up until you feel like it's time to make something.

All in all, good enough if you just want to dip your toes and try making AI art without figuring out what to do or whether your hardware can handle it, but the credits system and that it doesn't have certain things (It doesn't allow you to add LORA or Model locally unless you upload it to their own model/lora market, so you can't use stuff from CivitAI unless it has been reposted there, for example) that might be limiting if you are more accustomed toward generating AI art on your own hardware.

Here's what I generated so far (that are vtuber based, most of it are just Taimanin Yukikaze art... don't ask why)

I haven't tried any other interesting stuff they have available like ControlNet and any other 'advanced' stuff that I see in the tutorial above, nor do I have done other necessary stuff like upscaling the result, but it's nice that I'm finally making what I want to see via this AI stuff.

Clem the Gem · Jul 22, 2023

Faceless Waifu said:
After spending I guess a week with PixAI,....

I just had a look at this for the first time, and it looks like it would be a good alternative for anyone who can't run Stable Diffusion locally for whatever reason, and you can put up with the credit system (and potentially having your stuff seen by others).

It's got all the usual settings you can change, loads of models to choose from (recognised most of them) as well as LORAs including all the Hololives that I've used before, and even all the ControlNet models and region control too.

I tried creating an image locally and recreating it using the site, but was not able to. Unfortunately since the final downloaded image can't be loaded into PNG info, I couldn't check where exactly it differed, and there are too many variables that can change the outcome of an image. The images weren't better or worse. Just different.

AI Waifus and You 101

Well-known member

Previously known as Tatsunoko

Unknown member

Moon (sugar) and Star

Previously known as Tatsunoko

Moon (sugar) and Star

Well-known member

Unknown member

Well-known member

Unknown member

Unknown member

PART 1​

INTRO​

TERMINOLOGY​

HARDWARE REQUIREMENTS​

INSTALLATION​

New super-easy way​

The original way​

Running on lower spec cards​

PREPARATION​

​

Models​

Textual Inversions​

4x-Ultrasharp Upscaler​

VAE​

Clip Skip​

Extensions​

Essential​

Useful / QOL​

To look at later (not covered in this guide)​

NAVIGATING THE UI​

TYPICAL WORKFLOW​

LET'S START USING THIS THING​

MORE ON PROMPTING​

HIRES FIX​

DRAWING EXISTING CHARACTERS, USING LORAs​

PNG INFO​

ALTERING IMAGES WITH IMG2IMG​

Upscaling & Redrawing​

Sketch​

Inpaint Sketch​

UPSCALING & ADDING DETAIL​

Ultimate SD Upscale​

Inpaint​

PART 2 BELOW​

Unknown member

PART 2​

GETTING EXACTLY THE IMAGES YOU WANT WITH CONTROLNET​

Canny Model​

OpenPose Model​

Depth Model​

Scribble Model​

Tile Model​

Reference Model​

OUTPAINTING - RESIZE & FILL IN THE GAPS​

MORE ACCURATE POSING​

OpenPose Editor​

MULTIPLE CONTROLNETS​

XYZ PLOT - BIG GRIDS OF IMAGES​

SETTINGS (Non-Essential)​

From the Settings tab:​

User interface​

Live previews​

From the config files​

\Stable-diffusion-webui\ui-config.json​

Default positive and negative prompts:​

Default sampler, steps and CFG scale​

Default upscaler and upscale settings for Hires Fix:​

Maximum batch size:​

\Stable-diffusion-webui\config.json​

Default upscaler​

UPDATING​

Stable Diffusion / WebUI:​

Extensions​

ONLINE RESOURCES​

If you followed along and got stuck or ran into anything that wasn't explained well, let me know and I'll update the guide.​

We need to increase the hag population

Unknown member

We need to increase the hag population

PART 1

INTRO

TERMINOLOGY

HARDWARE REQUIREMENTS

INSTALLATION

New super-easy way

The original way

Running on lower spec cards

PREPARATION

Models

Textual Inversions

4x-Ultrasharp Upscaler

VAE

Clip Skip

Extensions

Essential

Useful / QOL

To look at later (not covered in this guide)

NAVIGATING THE UI

TYPICAL WORKFLOW

LET'S START USING THIS THING

MORE ON PROMPTING

HIRES FIX

DRAWING EXISTING CHARACTERS, USING LORAs

PNG INFO

ALTERING IMAGES WITH IMG2IMG

Upscaling & Redrawing

Sketch

Inpaint Sketch

UPSCALING & ADDING DETAIL

Ultimate SD Upscale

Inpaint

PART 2 BELOW

PART 2

GETTING EXACTLY THE IMAGES YOU WANT WITH CONTROLNET

Canny Model

OpenPose Model

Depth Model

Scribble Model

Tile Model

Reference Model

OUTPAINTING - RESIZE & FILL IN THE GAPS

MORE ACCURATE POSING

OpenPose Editor

MULTIPLE CONTROLNETS

XYZ PLOT - BIG GRIDS OF IMAGES

SETTINGS (Non-Essential)

From the Settings tab:

User interface

Live previews

From the config files

\Stable-diffusion-webui\ui-config.json

Default positive and negative prompts:

Default sampler, steps and CFG scale

Default upscaler and upscale settings for Hires Fix:

Maximum batch size:

\Stable-diffusion-webui\config.json

Default upscaler

UPDATING

Stable Diffusion / WebUI:

Extensions

ONLINE RESOURCES

If you followed along and got stuck or ran into anything that wasn't explained well, let me know and I'll update the guide.