RIN
horny raccoon
autistic talentless hack
0. interseed variability, composition, and poses
your composition is boring.
this guide is about square brackets and why they are the single most important thing for making interesting character image generations. this guide also assumes you're using forge, swarmui, or some online generator like civitai that supports square brackets. if you're using comfy there's a node. if you're one of those lovely frosting or perchance people you're currently out of luck. maybe if you yell at the people taking your money they'll enable this and actually let you make something decent for once. ha ha i am alienating people with the same hobby as me. anyway. firstly and loudly: square brackets are not the same as parentheses, okay? they are also not the opposite of parentheses. (yes, in a1111-style UIs a bare [word] with no colon does nudge the weight down a little. that is a different feature and not what this guide is about.) i don't care what google tells you. this feature is overwhelmingly misdocumented so don't give me that shit. look.
it's wrong.
here's the real deal: [emp]square brackets manipulate the step at which the contents enter the generation.[/emp] go back and read it again. thank you. here's what it does:
[code]
[prompt::0.2]      do prompt for the first 20% of steps, then remove prompt.
[prompt:0.2]       do nothing for the first 20% of steps, then add prompt.
[thing:other:0.2]  do thing for the first 20% of steps, then remove thing and add other.
[lots|of|stuff]    do lots for one step, then do of for one step, then do stuff for one step, then repeat.
[/code]
right. here's the pitch in two images.
{prompt 1}Dark background, closeup, Furry male, deer, antlers, athletic, muscles, pectorals, skin-tight black sleeveless jersey unzipped half way, covering his face with his hands, crying, sobbing, bawling{/popout} | {prompt 2}[landscape::0.2] [Dark background, closeup, Furry male, deer, antlers, athletic, muscles, pectorals, skin-tight black sleeveless jersey unzipped half way, covering his face with his hands, crying, sobbing, bawling:0.2]{/popout}
normal prompting | square brackets. or me browsing literally any ai image hosting site.
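if it helps to see the rules as code, here's a rough python sketch of the four bracket forms listed above. it is not the parser from forge or any other ui. `active_prompt` and `switch_step` are names made up for this example, and it assumes no nested brackets, no (tag:weight) inside a bracket, and that a value below 1 means a fraction of the total steps.

```python
import re

# one bracket group with no other brackets nested inside it
GROUP = re.compile(r"\[([^\[\]]*)\]")


def switch_step(when, total_steps):
    """Turn the number after the last colon into a step index.

    Assumption: values below 1 are a fraction of the total steps,
    anything else is treated as an absolute step count.
    """
    value = float(when)
    return int(value * total_steps) if value < 1 else int(value)


def active_prompt(prompt, step, total_steps):
    """Return the text that is switched on at a 0-indexed step."""

    def expand(match):
        body = match.group(1)
        if "|" in body:
            # [lots|of|stuff]: one option per step, cycling
            options = body.split("|")
            return options[step % len(options)]
        head, colon, when = body.rpartition(":")
        if not colon:
            # bare [text]: no schedule, leave the text alone
            return body
        before, inner_colon, after = head.partition(":")
        if not inner_colon:
            # [prompt:0.2]: nothing first, then prompt
            before, after = "", head
        # [thing:other:0.2] and [prompt::0.2] both land here
        return before if step < switch_step(when, total_steps) else after

    # collapse the gaps left behind by switched-off groups
    return " ".join(GROUP.sub(expand, prompt).split())


if __name__ == "__main__":
    pitch = "[landscape::0.2] [deer, antlers, crying:0.2]"
    for step in range(6):
        print(step, active_prompt(pitch, step, total_steps=20))
```

at 20 steps the first four steps only see `landscape`, and everything after that only sees the character.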
the first images are boring. they are the same. they are the most basic and ideal interpretation of the contents of the prompt. the second images, though. interseed variability: phenomenal. poses: expressive, interesting. other notes: we got some weirdness in one of the images by way of our helpful muscular emotional support himbo. it's worth it though. one more example and then i'll explain the what and why. {prompt 1} Detailed, Best Quality, Masterpiece, Ultra HD, High Res, hi res, best quality, detailed Background, sitting, 1girl, solo,cat anthro, fluffy fur, white body ,furry, small breasts, baseball bat, small, anthro, chest fur, tank top, ULA baseball hat, half closed eyes, thick whiskers, mean look, mouth open{/popout} | {prompt 2} [wind, frog, hell::0.2] [Detailed, Best Quality, Masterpiece, Ultra HD, High Res, hi res, best quality, detailed Background, sitting, 1girl, solo,cat anthro, fluffy fur, white body ,furry, small breasts, baseball bat, small, anthro, chest fur, tank top, ULA baseball hat, half closed eyes, thick whiskers, mean look, mouth open :0.2]{/popout}
normal prompting | square brackets
left: girl. right: a whole scene.
okay, let's talk about prompting. consider the prompt:
[code]furry, lizard humanoid, thighs, yellow eyes, black hair, ponytail, ass, black jacket, shorts[/code]
when you type that prompt and press the give me things button, the image starts generating and your model latches onto those words. here, i did it for you 4 times:
{prompt} furry, lizard humanoid, thighs, yellow eyes, black hair, ponytail, ass, black jacket, shorts{/popout}
i am so bored.
we got some fairly generic images of a woman doing a pose that contains prominently all those prompted things. a low angle because of course, thighs. facing the camera because of course, smug. the character fills the frame. the image is mostly ass because of course, ass. if you've done this long enough you've seen hundreds if not thousands of these images. you've upvoted these images. i'm not judging you. she's cute, it's fine. it's perfectly great at doing what it's supposed to do and that's enough for a lot of people. but maybe you think that’s boring. i do. maybe you prompt some danbooru tags for like, spread legs or bent over, yeah. also beach because beach. also eye contact makes you uncomfortable so you prompt looking away. you're clever. you're an artist. your hidden hentai folder in 2016 was 15gb so you're kind of an expert, actually. {prompt}furry, lizard humanoid, thighs, yellow eyes, black hair, ponytail, ass, black jacket, shorts, crossed legs, beach, looking away{/popout}
oh good, beach.
it's still samey and kind of boring. she’s looking away but for some reason almost every image we generate has her looking away in the exact same way. there's a beach in the background but there might as well not be for how little she's interacting with it. so you add more tags... [emp]no. [/emp] [emp]stop.[/emp] this is using image generation tools like they're a danbooru/e621 search bar. this is bare minimum effort and it's going to give you 'ai image' results.
here's what's happening: you’re starting with noise, and that noise is random, sure, but it’s like a boring even sort of random that’s uniform and samey and those first steps always look the same. your model goes: where is my ass, thighs, face, hair, i need them now because that’s what i’m told is important and i know very well what those words mean because i’ve been trained on tens of thousands of asses. your character description and orientation are defining your entire image composition from step one. its dumb little non-brain is like 'i need all of these things right now' and it starts making a girl shaped blob in a way that can plausibly fit everything. and that blob ends up very similar every time because that’s what it’s supposed to do. sure they’re kind of different, as there exist different local maxima of how to ideally create an image that has both prominent ass and prominent thighs while still having face visible. you might fish through half a dozen seeds looking for the best one but they’re all the same in the way that matters. the result is compositionally uninteresting images where the only feature is a perfectly lit and centered character containing all the prominent prompt items.
let's try square brackets.
[code][furry, lizard humanoid, thighs, yellow eyes, black hair, ponytail:0.2][/code]
when we wrap the prompt, the image generation process goes a little bit differently. for the first 20% of the steps, you are generating a completely random image with zero prompt, which (depending on your model) is likely a much more diverse starting point than raw random static. the contents of the square brackets are withheld from image generation as the model churns up something. then, at 20% of the way through the steps, your model takes that random weird blobby image and uses that as a starting point for working out where the character information can go. so it goes, metaphorically, oh, this is weird, where can i put thighs, maybe here? can i even fit eye color into this image? what does a ponytail even look like from this strange orientation? these are all interesting questions that your model can usually solve. it just never happens because it’s given boring noise to start with.
here, four times. square brackets on the right. i'm gonna cheat a little and we'll talk about it later.
{prompt 1} furry, lizard humanoid, thighs, yellow eyes, black hair, ponytail, ass, black jacket, shorts, crossed legs, beach, looking away{/popout} | {prompt 2}[landscape::0.2] [furry, lizard humanoid, thighs, yellow eyes, black hair, ponytail, ass, black jacket, shorts, crossed legs, beach:0.2]{/popout}
normal prompting | square brackets
my eye says the four on the left are basically the same image. you could carve each image into regions labeled ass, jacket, and hair, and every image would give those regions about the same weight. the four on the right do not have that problem. here's more. {prompt} [blizzard::0.2] [furry, lizard humanoid, thighs, yellow eyes, black hair, ponytail, ass, black jacket, shorts, crossed legs, beach, looking away:0.2]{/popout}
diverse blizzard lizard.
to summarize so far: we delayed the addition of character information so we get diverse outputs. the back half of the image is where those small character details are solidified. the first half is an amorphous blob that’s vaguely girl shaped. without square brackets, all of those things that are important to how your character looks are also actively defining how the image is composed. they’re deciding the pose/orientation/composition of the whole image before they even matter. with square brackets, we randomize or build the composition then add the character. here's baseball cat 40 times to make my point. {prompt 1} Detailed, Best Quality, Masterpiece, Ultra HD, High Res, hi res, best quality, detailed Background, sitting, 1girl, solo,cat anthro, fluffy fur, white body ,furry, small breasts, baseball bat, small, anthro, chest fur, tank top, ULA baseball hat, half closed eyes, thick whiskers, mean look, mouth open{/popout} | {prompt 2} [landscape::0.2] [Detailed, Best Quality, Masterpiece, Ultra HD, High Res, hi res, best quality, detailed Background, sitting, 1girl, solo,cat anthro, fluffy fur, white body ,furry, small breasts, baseball bat, small, anthro, chest fur, tank top, ULA baseball hat, half closed eyes, thick whiskers, mean look, mouth open :0.2]{/popout}
normal prompting | square brackets
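the pattern summarized above (build or randomize the composition first, then bring the character in) is just string assembly, so here's a small python sketch of it. `delayed_prompt` is a made-up helper name, not something any ui ships, and the 0.2 default is simply the value used throughout this guide.

```python
def delayed_prompt(character, setup="", delay=0.2):
    """Wrap a character description so it only enters after `delay` of the steps.

    `setup` is optional composition-only text (for example "landscape")
    that runs for the first `delay` of the steps and is then dropped.
    """
    if not 0.0 < delay < 1.0:
        raise ValueError("delay must be a fraction between 0 and 1")
    parts = []
    if setup:
        # [setup::delay] -> on from the start, removed at `delay`
        parts.append(f"[{setup}::{delay:g}]")
    # [character:delay] -> off at the start, added at `delay`
    parts.append(f"[{character}:{delay:g}]")
    return " ".join(parts)


if __name__ == "__main__":
    character = "furry, lizard humanoid, yellow eyes, black hair, ponytail"
    # same character at a few different delays, to compare on a locked seed
    for delay in (0.2, 0.3, 0.4):
        print(delayed_prompt(character, setup="landscape", delay=delay))
```

leaving `setup` empty gives the plain delayed form, where the first steps run with no prompt at all.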
stop putting character and pose shit in the beginning steps. use square brackets to build composition and then introduce your character to it later.
alright. little detour here, because you might've noticed the square bracket images had [landscape::0.2] and [blizzard::0.2] and you're probably screaming at your screen right now. let's dig into this a bit for the sake of understanding:
[code][wind, frog, hell::0.2] [Detailed, Best Quality, Masterpiece, Ultra HD, High Res, hi res, best quality, detailed Background, sitting, 1girl, solo,cat anthro, fluffy fur, white body ,furry, small breasts, baseball bat, small, anthro, chest fur, tank top, ULA baseball hat, half closed eyes, thick whiskers, mean look, mouth open :0.2][/code]
so at first, we're dealing with the prompt: wind, frog, hell. the rest of the prompt is delayed until 20% of the way through the generation. if we cancel the image before then, it looks like this:
{prompt}[wind, frog, hell::0.2] {/popout}
rin that is a frog. wtf are you doing i am closing the tab.
okay wait. at 20%, the starting square bracket drops out and the character is added in, so the model is forced to turn the frog into the character. {prompt}[wind, frog, hell::0.2] [Detailed, Best Quality, Masterpiece, Ultra HD, High Res, hi res, best quality, detailed Background, sitting, 1girl, solo,cat anthro, fluffy fur, white body ,furry, small breasts, baseball bat, small, anthro, chest fur, tank top, ULA baseball hat, half closed eyes, thick whiskers, mean look, mouth open :0.2]{/popout}
see. girl. there was a girl in the frog.
look how the end image of the character borrows elements from the frog. this image is frankly unpromptable. this pose is unpromptable. if you take all the elements of this image and stick them in a prompt without square brackets you are going to get something very different and very boring and it's going to look extremely similar every single time. but with a fairly simple prompt, it's cool. and 'frog' can be literally anything you can imagine. it can be a whole prompt that has utterly nothing to do with the end image and only exists to vaguely define colors and layout.
more features. you ever get an image where it’s like damn this is perfect but her tits are too big? of course you don't, but try to imagine. you lock the seed and change the prompt, but this upsets the whole image and you get something completely different out of the generation. that’s because changing the titty tag is changing the initial steps of the image and messing with the whole composition. instead, we lock the seed, but now modify the prompt to turn off the big tiddies near the end, or switch big tiddies to responsible ones.
[code][big tiddies:modest tiddies:0.8][/code]
this retains big tiddies at the beginning steps, which means the image will be largely the same compositionally until later in the generation process when it switches to modest tiddies at 80%.
what else can we do? change the facial expression at the end of the generation without affecting the entire image.
[code][smug:0.7][/code]
you want the implication of butthole without it being absurd and central to the image?
[code][(anus:0.6):0.7][/code]
different shirt? take off her pants? easy, dude.
[code][yellow sweater, bottomless:0.6][/code]
maybe you want ass in the image but you're not enthused about how large the 'ass' tag is making it? turn it off.
[code][ass::0.5][/code]
eye color.
[code][yellow eyes:blue eyes:0.7][/code]
and lots of other things.
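if you do this a lot, the lock-the-seed-and-edit-late trick above can be scripted. this is a hedged python sketch, not a feature of any ui: `late_edit` is a hypothetical helper, and it assumes a plain comma-separated prompt where each tag you want to change appears exactly as written.

```python
def late_edit(prompt, swaps):
    """Replace tags in a comma-separated prompt with [old:new:when] switches.

    `swaps` maps an existing tag to (replacement, when). An empty
    replacement produces [old::when], which just turns the tag off.
    Tags that are not in `swaps` are left untouched, so the early steps
    (and therefore the composition) see the same prompt as before.
    """
    edited = []
    for tag in (t.strip() for t in prompt.split(",")):
        if tag in swaps:
            new, when = swaps[tag]
            edited.append(f"[{tag}:{new}:{when:g}]")
        else:
            edited.append(tag)
    return ", ".join(edited)


if __name__ == "__main__":
    base = "furry, lizard humanoid, yellow eyes, ass, black jacket, shorts"
    print(late_edit(base, {
        "yellow eyes": ("blue eyes", 0.7),  # swap eye color late
        "ass": ("", 0.5),                   # keep it early, drop it halfway
    }))
```

adding a brand new tag late, like [smug:0.7], doesn't need a helper. just append it to the prompt.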
let's play with these ideas: {prompt 1}[bbq::0.2] [detailed, Best Quality, Masterpiece, Ultra HD, High Res, hi res, best quality, detailed Background, sitting, 1girl, solo,cat anthro, fluffy fur, white body ,furry, small breasts, baseball bat , Small, anthro, chest fur, white hair, yellow eyes, white tank top ,ULA baseball hat , half closed eyes, thick whiskers, mean look, mouth open, beach :0.2] {/popout} | {prompt 2} [bbq::0.2] [detailed, Best Quality, Masterpiece, Ultra HD, High Res, hi res, best quality, detailed Background, sitting, 1girl, solo,cat anthro, fluffy fur, white body ,furry, small breasts, baseball bat , Small, anthro, chest fur, white hair ,ULA baseball hat , half closed eyes, thick whiskers, mean look, mouth open, beach :0.2] [[yellow eyes, white tank top:0.2]::0.3] [pink eyes, hoodie, bottomless:0.3]{/popout}
eyes, shirt, pants.
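prompt 2 above nests one bracket inside another: [[yellow eyes, white tank top:0.2]::0.3]. my reading, assuming your ui resolves nested brackets the way a1111-style ones do, is that the inner bracket adds the tags at 20% and the outer bracket removes them at 30%, so the original outfit only exists for that short window before [pink eyes, hoodie, bottomless:0.3] takes over. a small python sketch of that window (`window` and `active_steps` are made-up names, and the step math assumes fractions are truncated to a whole step):

```python
def window(tags, start, end):
    """Build a nested bracket that keeps `tags` on only between start and end."""
    if not 0.0 <= start < end <= 1.0:
        raise ValueError("need 0 <= start < end <= 1")
    # inner [tags:start] adds the tags, outer [...::end] removes them again
    return f"[[{tags}:{start:g}]::{end:g}]"


def active_steps(start, end, total_steps):
    """List the 0-indexed steps on which the windowed tags are switched on."""
    first = int(start * total_steps)  # step where the inner bracket adds them
    last = int(end * total_steps)     # step where the outer bracket removes them
    return list(range(first, last))


if __name__ == "__main__":
    print(window("yellow eyes, white tank top", 0.2, 0.3))
    print(active_steps(0.2, 0.3, total_steps=20))
```

at 20 steps that window is only two steps wide, which is why the composition stays put while the eyes and clothes change.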
once more. seed still locked. {prompt 1}[bbq::0.2] [detailed, Best Quality, Masterpiece, Ultra HD, High Res, hi res, best quality, detailed Background, sitting, 1girl, solo,cat anthro, fluffy fur, white body ,furry, small breasts, baseball bat , Small, anthro, chest fur, white hair, yellow eyes, white tank top ,ULA baseball hat , half closed eyes, thick whiskers, mean look, mouth open, beach :0.2]{/popout} | {prompt 2} [bbq::0.2] [detailed, Best Quality, Masterpiece, Ultra HD, High Res, hi res, best quality, detailed Background, sitting, 1girl, solo, baseball bat , Small, anthro, tank top, furry, lizard humanoid, green scales, thighs, yellow eyes, black hair, ponytail, black jacket, shorts ,ULA baseball hat , half closed eyes, mean look, mouth open, beach :0.2]{/popout}
lizard.
very similar composition, new character. the character description can be anything and the image is going to turn out largely the same because we delay the character. we can drop in characters even more easily if we delay the character information further, like 30-40%. with this, you can start to actually pay attention to what the model is prioritizing during generation and add or remove weight when it actually matters to the image you want. there is a lot of potential here. you can expand on this in ways that are important to you, specifically. when you do, send me a message and show me your cool shit.
extra notes:
the values in this guide (0.2 and so on) work for euler a. other samplers behave differently but still work.
you’ll likely need to increase CFG in order to get the same quality as a non-delayed image.
the consequence of more variability is that you may get occasional generations that are horrible SD1.5-esque eldritch abominations of pussy and arm. the more you delay the character information the more likely this becomes.
thanks.