How to Build a LoRA Training Dataset from Scratch (2026)

The quality of your LoRA comes almost entirely from the quality of your dataset. A mediocre trainer running a well-built dataset will produce better results than a perfectly configured Kohya run on noisy, inconsistently tagged images.

This guide covers the full dataset pipeline: image count, selection criteria, resolution, folder structure, tagging strategy, and the specific mistakes that show up in bad LoRA output.

How Many Images Do You Actually Need

The right number depends on the type of LoRA you are training.

Character LoRA: 15 to 30 images is the sweet spot. You want enough variety in pose, expression, and lighting that the model learns the character rather than memorizing specific training images. Going past 50 brings diminishing returns unless the character has many distinct visual states.

Style LoRA: 50 to 150 images. Art style is more distributed across images than a character, so you need more examples to capture the full range.

Concept or object LoRA: 10 to 25 images. You are teaching the model what a specific thing looks like, which requires less variety.

You can train a usable LoRA with as few as 10 images. Fewer than that and the model overfits to specific images rather than learning the general concept.
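If you want to sanity-check a folder before going further, the ranges above fit in a small lookup. This is only a sketch: the dictionary keys, the function name, and the wording of the messages are made up for illustration and are not part of any trainer.

```python
# Suggested image counts per LoRA type, taken from the ranges in this guide.
RECOMMENDED_COUNTS = {
    "character": (15, 30),
    "style": (50, 150),
    "concept": (10, 25),
}
MINIMUM_USABLE = 10  # below this, overfitting to individual images is likely


def check_image_count(lora_type: str, count: int) -> str:
    """Compare a dataset size against the suggested range for its LoRA type."""
    low, high = RECOMMENDED_COUNTS[lora_type]
    if count < MINIMUM_USABLE:
        return "too few: high risk of overfitting"
    if count < low:
        return f"below the suggested range of {low}-{high}"
    if count > high:
        return f"above the suggested range of {low}-{high}"
    return "within the suggested range"


print(check_image_count("character", 20))  # within the suggested range
print(check_image_count("style", 30))      # below the suggested range of 50-150
```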

Image Selection Criteria

Consistency of the subject. For a character LoRA, every image should contain that character, ideally filling most of the frame. Cut images where the character is in the background or partially obscured.

Variety of everything else. Poses, angles, expressions, lighting, backgrounds. If all your training images show the character from the front with the same expression, the LoRA will resist generating anything else. Include full-body shots, close-ups, and different poses.

Visual quality. Blurry images and heavy compression artifacts degrade output. Use the sharpest, highest-resolution images you can find.

No text overlays, watermarks, or UI elements. The model will learn these too.

Resolution and Cropping

Resolution requirements by base model:

SD 1.5: 512x512 pixels. Most trainers also support aspect ratio buckets (512x768, 768x512).
SDXL and SD 3.5: 1024x1024. Aspect buckets are common.
Flux: 1024x1024 minimum. Handles non-square resolutions well.
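
To make the bucket idea concrete, here is a minimal sketch that assigns an image to whichever bucket has the closest aspect ratio. Trainers build and fill their own buckets, so treat this as an illustration of the concept rather than what any specific trainer does. The function name and the bucket list are assumptions based on the SD 1.5 sizes listed above.

```python
# The three SD 1.5 sizes mentioned above: square, portrait, landscape.
SD15_BUCKETS = [(512, 512), (512, 768), (768, 512)]


def pick_bucket(width, height, buckets=SD15_BUCKETS):
    """Return the (w, h) bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))


print(pick_bucket(1200, 1800))  # (512, 768)
print(pick_bucket(1920, 1080))  # (768, 512)
```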

For cropping, the goal is to center the subject and remove irrelevant background without awkward cuts. Birme handles batch resizing in the browser. ImageMagick is faster for large datasets via CLI.

Manual review after cropping is worth the time. Automated crops get some share of the images wrong.
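
If you script the crop yourself, the arithmetic for a centered crop is short. The sketch below only computes the crop box; it does not open or write any image. The function name is hypothetical. The (left, top, right, bottom) order matches what Pillow's Image.crop expects, if that is the library you use. A centered crop assumes the subject is near the middle of the frame, which is exactly the assumption that manual review is there to catch.

```python
def center_crop_box(width, height, target_w, target_h):
    """Return (left, top, right, bottom) for the largest centered crop
    that matches the target aspect ratio."""
    target_ratio = target_w / target_h
    if width / height > target_ratio:
        # Image is wider than the target: keep full height, trim the sides.
        crop_w = round(height * target_ratio)
        crop_h = height
    else:
        # Image is taller than the target (or already matches): keep full width.
        crop_w = width
        crop_h = round(width / target_ratio)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return (left, top, left + crop_w, top + crop_h)


print(center_crop_box(1920, 1080, 1024, 1024))  # (420, 0, 1500, 1080)
```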

Folder Structure

Kohya_ss and most trainers expect this structure:

dataset/
  10_my_trigger_word/
    image001.png
    image001.txt
    image002.png
    image002.txt

The folder name format is [repeat_count]_[trigger_word]. The repeat count sets how many times each image is seen per epoch, so a value of 10 means every image in that folder is seen 10 times per epoch.

Each image has a matching .txt file with the same base name containing the caption for that image.
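
A quick check before training can catch a misnamed folder or an image without a caption. This is a minimal sketch using only the standard library. The function names and the list of image extensions are assumptions, so adjust them to whatever your trainer accepts.

```python
from pathlib import Path

# Assumed set of image extensions; extend as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def parse_folder_name(name):
    """Split '10_my_trigger_word' into (10, 'my_trigger_word')."""
    repeats, _, trigger = name.partition("_")
    if not repeats.isdigit() or not trigger:
        raise ValueError(f"expected [repeat_count]_[trigger_word], got {name!r}")
    return int(repeats), trigger


def find_missing_captions(folder):
    """Return sorted names of images that have no same-named .txt file."""
    folder = Path(folder)
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
        and not p.with_suffix(".txt").exists()
    )


print(parse_folder_name("10_my_trigger_word"))  # (10, 'my_trigger_word')
```

Calling find_missing_captions("dataset/10_my_trigger_word") returns an empty list when every image has its caption file.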

Trigger Words and Caption Strategy

The trigger word is a unique token prepended to every caption. The LoRA learns to associate it with your subject.

Choose something specific and uncommon. my_char_v1 works better than character. tide_style_lora works better than style. Generic words already exist in the base model's training data, so your LoRA competes with existing associations instead of building a clean new one.

A caption file looks like this:

my_char_v1, 1girl, brown hair, blue eyes, school uniform, smile, standing, outdoors, sunlight

Trigger word first, then descriptive tags.

What to tag: Everything you want the LoRA to control, like hair color, eye color, outfit, expression, body type.

What to skip: Specific backgrounds, generic lighting, camera angles. The less you tag these, the more flexible the LoRA is in different contexts.
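
If you start from auto-generated tags, a small helper can enforce the "trigger word first" rule and drop the tags you decided to skip. This is a sketch, not a standard tool. The function name and the skip list are hypothetical, and which tags belong on the skip list is your call.

```python
def build_caption(trigger, tags, skip=()):
    """Put the trigger word first, then the remaining tags in their
    original order, dropping duplicates and anything listed in `skip`."""
    skipped = {t.strip().lower() for t in skip}
    seen = {trigger.lower()}
    kept = []
    for tag in tags:
        tag = tag.strip()
        key = tag.lower()
        if tag and key not in seen and key not in skipped:
            seen.add(key)
            kept.append(tag)
    return ", ".join([trigger] + kept)


raw_tags = ["1girl", "brown hair", "blue eyes", "simple background", "brown hair"]
print(build_caption("my_char_v1", raw_tags, skip=["simple background"]))
# my_char_v1, 1girl, brown hair, blue eyes
```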

The Danbooru tag wiki is the reference for tag conventions. If you want a fast way to generate a starting set of tags, the free browser tagger I built runs WD14 inference without any local setup.

Common Mistakes That Ruin Results

Training on too-similar images. Ten images that are slightly cropped versions of the same source will produce a LoRA that only generates that one pose. Variety is not optional.

Inconsistent tagging. If some images tag the hair color and some do not, the model receives inconsistent signals. Decide which attributes you are tagging and apply them to every image.
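
One way to catch this before training is to list the attributes you committed to tagging and check every caption against that list. The sketch below assumes captions are already loaded into a dict of file name to caption text. The group names and tag sets are examples you would replace with your own.

```python
def missing_attributes(captions, required_groups):
    """Map each file name to the attribute groups its caption lacks.

    `required_groups` maps a label (e.g. "hair color") to the set of
    tags that count as covering that attribute.
    """
    problems = {}
    for name, text in captions.items():
        tags = {t.strip() for t in text.split(",")}
        missing = [
            group
            for group, options in required_groups.items()
            if tags.isdisjoint(options)
        ]
        if missing:
            problems[name] = missing
    return problems


captions = {
    "image001.txt": "my_char_v1, 1girl, brown hair, blue eyes, smile",
    "image002.txt": "my_char_v1, 1girl, blue eyes, standing",
}
groups = {"hair color": {"brown hair"}, "eye color": {"blue eyes"}}
print(missing_attributes(captions, groups))
# {'image002.txt': ['hair color']}
```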

Over-tagging backgrounds. If every caption includes the specific background scene, the model associates your trigger word with that background. Tag backgrounds only if they are part of what you want the LoRA to produce.

Training too long. More epochs are not better. Most character LoRAs reach a good result between 1,500 and 3,000 steps for SD 1.5. Beyond that, the LoRA overfits and loses generalizability.
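
To see where a run lands relative to that step range, the usual back-of-envelope estimate is images times repeats, divided by batch size, times epochs. The sketch below assumes that formula. Batch size is a trainer setting not covered in this guide, and a real trainer may round slightly differently when aspect buckets are in use, so treat the result as an estimate.

```python
import math


def total_steps(num_images, repeats, epochs, batch_size=1):
    """Estimate optimizer steps for a run (assumed formula, see above)."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def epochs_for_target(num_images, repeats, target_steps, batch_size=1):
    """Estimate how many epochs land closest to a target step count."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return max(1, round(target_steps / steps_per_epoch))


# 20 images in a 10_my_trigger_word folder, 10 epochs, batch size 1:
print(total_steps(20, 10, 10))          # 2000
print(epochs_for_target(20, 10, 2000))  # 10
```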

Skipping weight testing. After training, test the LoRA at a few different weights (0.6, 0.8, 1.0) before deciding the run is complete. The optimal weight is almost never 1.0.
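
A simple way to run that comparison is to generate the same prompt once per weight. The sketch below assumes the `<lora:name:weight>` prompt syntax used by the AUTOMATIC1111 web UI. Other front ends set the weight differently, so adapt the format string to yours. The function name and the example prompt are illustrative.

```python
def weight_test_prompts(lora_name, trigger, base_prompt, weights=(0.6, 0.8, 1.0)):
    """Return one prompt per LoRA weight, identical apart from the weight."""
    return [
        f"{trigger}, {base_prompt} <lora:{lora_name}:{w}>"
        for w in weights
    ]


for prompt in weight_test_prompts("my_char_v1", "my_char_v1", "1girl, smile, outdoors"):
    print(prompt)
# my_char_v1, 1girl, smile, outdoors <lora:my_char_v1:0.6>
# my_char_v1, 1girl, smile, outdoors <lora:my_char_v1:0.8>
# my_char_v1, 1girl, smile, outdoors <lora:my_char_v1:1.0>
```

Keeping the seed and every other setting fixed across the runs makes the weight the only variable in the comparison.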

Starting Point Recap

Collect 15-30 images for character, 50-150 for style, 10-25 for concept or object
Crop and resize to target resolution for your base model
Tag each image: unique trigger word + descriptive Danbooru tags
Organize into the repeat_count_trigger_word folder structure
Train with conservative settings, test at multiple weights

The dataset step is slower than training. That is the right trade-off. A better dataset produces better results regardless of training parameters.
