AI image generation

Generate sprites, backgrounds, and animation frames from a text prompt — without leaving the editor.

Umicat ships AI image generation built into the asset panel. You describe what you want, pick a style anchor (so multiple assets look like they're from the same game), set the aspect ratio, and a few seconds later the asset lands in your library.

When to use AI generation

You don't have art assets, and the library doesn't quite have what you want.
You need a sprite that doesn't exist — a wizard with a banana, a spaceship shaped like a frog, an enemy you invented.
You want stylistic consistency across multiple sprites in one game (use style-reference anchoring — see below).
You need an animation frame that visually continues from existing frames in a spritesheet (use the Sheet Editor's Regenerate / Add Frame flows).

Generate a new asset

From the Assets tab: + Add → Generate with AI ✨. The modal asks for:

Prompt — describe what you want. Be specific: subject, pose, background (or "transparent background"), style adjectives. Example: "A small pixel-art wizard in a blue robe, facing the camera, idle pose, transparent background."

Modality — Image or Audio (audio coming soon).

Style references — multi-select grid of existing in-game image assets. Pick up to 5; they're sent as anchors so the new asset matches the look. Critical for consistency across a game.

Aspect ratio — 1:1 / 16:9 / 9:16 / 4:3 / 3:4.

Transparent background — when on, runs a 2-pass matting pipeline on the server to cut the background cleanly. Recommended for sprites you'll composite over scene backgrounds. Off works better for full scene backgrounds, posters, and full-bleed art.

Filename (optional) — a custom stem. Leave blank for an auto-generated name.

Click Generate. The modal stays open with a spinner; the asset appears in the library on success (5–30s typical).

If transparency matting fails on a difficult subject, you'll see: "Couldn't auto-cut the background — try again, or untick Transparent background to keep the original." Both suggestions are sound: retry first, and if matting keeps failing, turn the option off and keep the unmatted image.
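The modal's fields and limits can be summarized as a small data structure. This is only an illustrative sketch: `GenerateRequest`, its field names, and the default values are hypothetical and do not describe Umicat's actual request schema. Only the aspect-ratio list and the limit of 5 style references come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Values taken from the modal description above.
ASPECT_RATIOS = ("1:1", "16:9", "9:16", "4:3", "3:4")
MAX_STYLE_REFS = 5


@dataclass
class GenerateRequest:
    """Hypothetical mirror of the Generate modal's image fields (not a real schema)."""
    prompt: str
    aspect_ratio: str = "1:1"             # default chosen for this sketch only
    style_refs: List[str] = field(default_factory=list)
    transparent_background: bool = False  # default chosen for this sketch only
    filename: Optional[str] = None        # None stands for "auto-generate a name"

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the request looks valid."""
        issues = []
        if not self.prompt.strip():
            issues.append("prompt is empty")
        if self.aspect_ratio not in ASPECT_RATIOS:
            issues.append(f"unsupported aspect ratio: {self.aspect_ratio}")
        if len(self.style_refs) > MAX_STYLE_REFS:
            issues.append(
                f"too many style references: {len(self.style_refs)} > {MAX_STYLE_REFS}"
            )
        return issues
```

For example, `GenerateRequest(prompt="A small pixel-art wizard", style_refs=["hero.png"]).problems()` returns an empty list, while a request with six style references reports one problem.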

Style anchoring for consistency

The biggest practical tip: pick style references that share a visual language. Two patterns work:

First asset is the anchor. Generate your hero character first. Every subsequent asset references the hero. All your sprites will share its palette, line weight, and rendering style.
Shared reference set. Pick a small reference set (3–5 assets) that captures the look of your game and reuse it as the anchor for every new generation.

Without anchoring, even two consecutive generations from the same prompt tend to look inconsistent: different palettes, different detail levels, different shading conventions.
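If you track your anchors outside the editor (in a notes file or a script, say), the two patterns reduce to a simple selection rule. The helper below is a hypothetical sketch, not part of Umicat: it prefers a fixed reference set when one exists, falls back to the hero asset otherwise, and respects the limit of 5 style references.

```python
from typing import List, Optional, Sequence

MAX_STYLE_REFS = 5  # the modal accepts up to 5 style references


def choose_style_refs(
    hero: Optional[str] = None,
    reference_set: Sequence[str] = (),
) -> List[str]:
    """Pick style anchors: a shared reference set if given, else the hero asset."""
    if reference_set:
        # Drop duplicates while keeping the original order, then cap at the limit.
        unique = list(dict.fromkeys(reference_set))
        return unique[:MAX_STYLE_REFS]
    return [hero] if hero else []
```

Reusing the same return value for every generation is the point of both patterns: the anchors stay fixed while the prompt changes.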

Regenerate or add an animation frame

In the Sheet Editor:

Regenerate ✨ in the frame inspector — rebuilds the selected frame with the same prompt + adjacent frames as style anchors. Useful when one frame in a walk cycle is slightly off.
+ Add frame in the timeline — appends a new frame, pre-filled with the last frame's prompt and last 4 frames as style anchors.

Regenerate routes through gameAPI.regenerateAsset (an in-place rebuild); + Add frame uses the normal generate endpoint (a new asset). Either way, the existing frames keep their positions in the sequence.
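The anchor selection in the two Sheet Editor flows can be sketched as follows. The function names are hypothetical, and treating "adjacent frames" as the immediate previous and next frame is an assumption; the text above only states that adjacent frames are used for Regenerate and the last 4 frames for + Add frame.

```python
from typing import List, Sequence


def regenerate_anchors(frames: Sequence[str], index: int) -> List[str]:
    """Anchors for rebuilding frames[index]: its immediate neighbours (assumed)."""
    if not 0 <= index < len(frames):
        raise IndexError(f"frame index out of range: {index}")
    # Keep only neighbours that actually exist (first and last frames have one).
    return [frames[i] for i in (index - 1, index + 1) if 0 <= i < len(frames)]


def add_frame_anchors(frames: Sequence[str], count: int = 4) -> List[str]:
    """Anchors for a newly appended frame: the last `count` existing frames."""
    return list(frames[-count:])
```

With a six-frame walk cycle `f0` through `f5`, regenerating `f3` would anchor on `f2` and `f4`, and adding a frame would anchor on `f2` through `f5`.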

Asking the agent to generate

The agent has the generate tool too. Two patterns:

Direct generate request

Generate a pixel-art coin sprite, 32x32, transparent background.
Use the same style as the hero asset.

The agent calls the generate tool with your prompt and a sensible style anchor (usually the hero or the most recently generated asset).

Library suggestion → fall through to generate

When you ask for "a wizard" and the library has nothing close, the agent's library_suggest flow will offer three options:

Search the library yourself.
Upload your own.
Generate a new one with AI.

Pick the third and the Generate Asset modal opens with the context of your request carried over. When the modal closes, the agent receives the message "I generated foo.png via the Assets tab. Please use it." and wires the asset in.

Cost

AI image generation is metered through the same credit pool as agent turns. A single generation typically costs 3–8 credits depending on prompt complexity, style-reference count, and whether transparency matting runs.

You'll see the exact cost in Settings → Credit Usage → Deductions table — image-generation deductions are tagged separately from regular turn deductions.
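For rough budgeting, the typical 3–8 credit range gives simple bounds. The helpers below are a back-of-the-envelope sketch with hypothetical names; actual costs are whatever the Deductions table shows and may fall outside the typical range.

```python
from typing import Tuple

# Typical per-generation cost range quoted above.
MIN_CREDITS_PER_GENERATION = 3
MAX_CREDITS_PER_GENERATION = 8


def estimate_credit_range(generations: int) -> Tuple[int, int]:
    """Return (low, high) typical credit cost for a number of generations."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (
        generations * MIN_CREDITS_PER_GENERATION,
        generations * MAX_CREDITS_PER_GENERATION,
    )


def affordable_generations(balance: int) -> int:
    """Generations a balance covers if each one costs the typical maximum."""
    return max(balance, 0) // MAX_CREDITS_PER_GENERATION
```

A set of 12 sprites would typically cost between 36 and 96 credits, and a balance of 50 credits covers 6 generations at the top of the typical range.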

Quality limits and workarounds

The underlying generation model is excellent at pixel art, decent at flat-shaded vector / cartoon, and weaker at:

Photo-realistic faces — avoid.
Text inside images — the model can't reliably spell. Add text in HUD widgets, not inside the sprite.
Highly specific anatomy — generic body plans work better than "the character has exactly 6 fingers."
Long character names embedded as labels — don't.

Workarounds:

For pixel art quality: lean into the style. "pixel art", "16-bit", "Sprout Lands style" all anchor the model well.
For icons / UI: generate at the largest size you'll need, then downscale (the Asset Detail panel's Resize works for this).
For transparent edges: generate with Transparent background on first. If matting fails, generate again with a solid background and use Tag regions to crop manually.
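The transparent-edges workaround is a try-then-fall-back sequence, sketched below. Everything here is hypothetical: `generate` stands in for whatever performs a generation (it is passed in as a plain function), and `MattingFailed` stands in for the "Couldn't auto-cut the background" message. Neither is a real Umicat API.

```python
from typing import Callable, Tuple


class MattingFailed(Exception):
    """Hypothetical stand-in for the "Couldn't auto-cut the background" message."""


def generate_with_fallback(
    generate: Callable[[str, bool], str],
    prompt: str,
    retries: int = 1,
) -> Tuple[str, bool]:
    """Try transparent generation first; fall back to a solid background.

    `generate(prompt, transparent)` returns an asset name or raises MattingFailed.
    Returns (asset_name, needs_manual_crop).
    """
    for _ in range(1 + retries):
        try:
            return generate(prompt, True), False
        except MattingFailed:
            continue  # matting failed; try again while attempts remain
    # Matting kept failing: keep the background and crop by hand afterwards.
    return generate(prompt, False), True
```

When the second element of the result is true, the asset still has its background, and the manual Tag regions crop is the remaining step.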

Not yet supported

Audio generation — coming.
3D models — out of scope (Umicat is 2D-only).
Editing an existing image — generate a new one referencing the existing as a style anchor instead.