# Generating Videos

Turn your images and references into short video clips with the in-editor assistant. It can animate a still frame (image-to-video) or compose motion from several reference images and videos (reference-to-video). Every video generation asks for explicit confirmation first, because cost scales with the clip's duration, resolution, and audio.

## Animate an image (image-to-video)

The assistant's `generate_image_to_video` tool animates one image into a clip, with an optional end frame.

To animate an image:
1. Put the image on your canvas and select it so it becomes a reference for the assistant (it is numbered as `@Image1`, `@Image2`, …).
2. Ask the assistant to animate it and describe the motion you want (e.g. "make @Image1 slowly pan, camera drifting left").
3. Pick a model if you have a preference (see Models below). For a smooth start→end transition, also select a second image as the end frame. End frames are supported only on **Kling v3 Standard** and **Seedance 2**.
4. The assistant replies with a one-click **confirmation** showing the estimated credit cost. Generation only starts after you confirm (see Confirmation below).
5. The clip appears in the conversation when it finishes, typically in 30–120 seconds.

Pointer: the resulting video lands in your video library and on the canvas, where it can be selected as a reference for reference-to-video or merged with other clips.

## Use references (reference-to-video)

The `generate_reference_to_video` tool uses the **Seedance 2 Reference** model to compose a clip from a prompt plus up to **9 reference images** and up to **3 reference videos**.

To use references:
1. Select the images and/or videos you want on the canvas. They become assistant references, numbered `@Image1…@Image9` and `@Video1…@Video3`.
2. Address them in your prompt by their tokens — e.g. "use the wardrobe from @Image1 and the camera motion from @Video1".
3. Reference videos guide motion and style; reference images supply identity, wardrobe, environment, and atmosphere.
4. Confirm the cost preview to start (see Confirmation).

Pointer: references are pulled from your current canvas selection, so anything you've generated or uploaded into the project can feed a reference-to-video clip.
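
The token numbering and the 9-image / 3-video limits above can be sketched as a small pre-flight check. This is only an illustration: `check_references` and its parameters are hypothetical names, not part of the assistant's tools, and the assistant may well handle mismatched tokens differently.

```python
import re

MAX_IMAGES = 9  # limit stated in this guide
MAX_VIDEOS = 3  # limit stated in this guide

# Matches reference tokens such as @Image1 or @Video2.
TOKEN_RE = re.compile(r"@(Image|Video)(\d+)")


def check_references(prompt: str, image_count: int, video_count: int) -> list[str]:
    """Return a list of problems; an empty list means prompt and selection line up.

    Hypothetical helper for illustration only.
    """
    problems = []
    if image_count > MAX_IMAGES:
        problems.append(f"too many images: {image_count} > {MAX_IMAGES}")
    if video_count > MAX_VIDEOS:
        problems.append(f"too many videos: {video_count} > {MAX_VIDEOS}")

    selected = {"Image": image_count, "Video": video_count}
    for kind, number in TOKEN_RE.findall(prompt):
        n = int(number)
        # Tokens are numbered from 1 up to the number of selected items.
        if n < 1 or n > selected[kind]:
            problems.append(f"@{kind}{n} is not in the selection")
    return problems
```

For example, `check_references("use the wardrobe from @Image1 and the camera motion from @Video1", 2, 1)` returns an empty list, while a prompt that mentions `@Image3` with only two images selected is reported.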

## Models, aspect ratio and duration

Image-to-video models:
- **Kling v3 Standard** — balanced quality/speed. Output ratio follows the start image. Optional end frame, native audio.
- **Wan 2.6** — fast, good for previews and quick iteration. Single start frame only.
- **Veo 3.1 Lite** — Google's high-quality image-to-video. Single start frame only.
- **Seedance 2** — flagship, cinematic motion. Optional end frame, synced audio, best quality.

Reference-to-video model:
- **Seedance 2 Reference** — up to 9 reference images and 3 reference videos.

Choosing settings:
- **Aspect ratio** — honored by Veo 3.1 Lite, Seedance 2, and Seedance 2 Reference. Kling v3 Standard's output ratio instead follows the start image.
- **Duration** — 4–15 seconds, honored by Kling v3 Standard, Seedance 2, and Seedance 2 Reference.
- **Resolution** — 480p / 720p / 1080p, honored by Seedance 2 and Seedance 2 Reference (defaults to 720p for reference-to-video).
- **Audio** — off by default; output is silent unless you explicitly ask for audio. Supported on Kling v3 Standard, Seedance 2, and Seedance 2 Reference.

Just describe what you want (ratio, length, audio) in your message and the assistant maps it to the chosen model's supported settings.
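
One way to picture this mapping is a capability table built from the lists above. The sketch below is an assumption-laden illustration, not the assistant's actual implementation: `Capabilities`, `MODELS`, and `split_settings` are hypothetical names, and any setting this guide does not list for a model is treated as unsupported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    aspect_ratio: bool
    duration: bool
    resolution: bool
    audio: bool
    end_frame: bool


# Filled in from this guide; unlisted settings are assumed to be unsupported.
MODELS = {
    "Kling v3 Standard": Capabilities(
        aspect_ratio=False, duration=True, resolution=False, audio=True, end_frame=True),
    "Wan 2.6": Capabilities(
        aspect_ratio=False, duration=False, resolution=False, audio=False, end_frame=False),
    "Veo 3.1 Lite": Capabilities(
        aspect_ratio=True, duration=False, resolution=False, audio=False, end_frame=False),
    "Seedance 2": Capabilities(
        aspect_ratio=True, duration=True, resolution=True, audio=True, end_frame=True),
    "Seedance 2 Reference": Capabilities(
        aspect_ratio=True, duration=True, resolution=True, audio=True, end_frame=False),
}


def split_settings(model: str, requested: dict) -> tuple[dict, list]:
    """Split requested settings into those the model honors and those it ignores."""
    caps = MODELS[model]
    honored, ignored = {}, []
    for name, value in requested.items():
        # Unknown setting names fall through to "ignored".
        if getattr(caps, name, False):
            honored[name] = value
        else:
            ignored.append(name)
    return honored, ignored
```

Asking Kling v3 Standard for `{"aspect_ratio": "16:9", "duration": 8, "audio": True}` would, under these assumptions, keep the duration and audio settings and drop the aspect ratio, since that model's ratio follows the start image.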

## Confirmation before every generation (cost)

Video generation **always** requires explicit confirmation. Before generating any clip, the assistant ends its message with a confirmation that shows a one-click **Yes** button, an estimated cost in credits, and a per-call breakdown. The estimate accounts for the clip's duration, resolution, and whether audio is on. Nothing is generated until you click **Yes** or type "yes".
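
The confirm-before-generate rule can be sketched as a simple gate. Everything here is hypothetical: `PendingGeneration`, `handle_reply`, and `start_generation` are illustrative names, the credit figure is a placeholder, and treating the reply case-insensitively with surrounding whitespace trimmed is an assumption rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class PendingGeneration:
    prompt: str
    estimated_credits: int  # placeholder; real estimates come from the assistant
    confirmed: bool = False

    def handle_reply(self, reply: str) -> bool:
        """Mark the request confirmed only on an explicit yes; return the state."""
        # Assumption: "Yes", " yes " and "YES" all count as confirmation.
        if reply.strip().lower() == "yes":
            self.confirmed = True
        return self.confirmed


def start_generation(pending: PendingGeneration) -> str:
    """Refuse to start unless the request was explicitly confirmed."""
    if not pending.confirmed:
        raise PermissionError("generation needs explicit confirmation first")
    return f"generating: {pending.prompt}"
```

In this sketch, any reply other than an explicit yes leaves the request pending, so `start_generation` raises instead of spending credits.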

## Cinematic Prompt (Seedance) assistant skill

For purpose-built Seedance 2.0 prompts there is a **Cinematic Prompt — Seedance** skill. It reads your uploaded reference images, videos, and audio (wardrobe, identity, voice, environment, atmosphere), routes them into Seedance's deep reference stack (up to 9 images / 3 videos / 3 audio per generation), and composes a Seedance-ready prompt. It works in five cinema modes: **Narrative, Studio, Action, Performance, Atmospheric**. In each mode it favors rhythmic prose over photography jargon, one primary camera move per shot, a locked subject anchor across multi-shot sequences, and inline `Avoid X.` constraints.

To use it, ask the assistant for cinematic or Seedance video prompting. For other video models (Veo, Sora, Kling), there is a sibling **Cinematic Prompt** skill with the same five-mode grammar. See [The Agent (Skills Chat)](/docs/product/agent).

## Video library and playback

Finished clips are saved to your video library and placed on the canvas as video nodes you can play back. Each clip's intrinsic width and height are captured client-side, either from a generated thumbnail or back-filled by loading the video metadata, so clips size correctly in the grid. When no dimensions are available, the fallback ratio is 16:9.
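
The sizing rule can be sketched in a few lines. `display_ratio` is a hypothetical helper, and treating missing or non-positive dimensions as "unknown" is an assumption about when the 16:9 fallback applies.

```python
from typing import Optional

FALLBACK_RATIO = 16 / 9  # fallback stated in this guide


def display_ratio(width: Optional[int], height: Optional[int]) -> float:
    """Return width/height when both are known and positive, else the fallback."""
    if width and height and width > 0 and height > 0:
        return width / height
    return FALLBACK_RATIO
```

A 1080×1920 portrait clip keeps its own ratio, while a clip whose metadata has not loaded yet is laid out as 16:9 under this sketch.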

Pointer: from the library/canvas you can reuse a clip as a reference (reference-to-video) or combine clips (merge, below).

## Merge clips into one video

You can stitch several clips end-to-end with the **Merge Videos** utility.

To merge:
1. On the canvas, select **2 to 5 video nodes** (no images mixed into the selection).
2. In the multi-selection side toolbar, click the **Merge** button (tooltip "Merge N videos").
3. In the **Merge Videos** dialog, reorder the clips (drag, or the up/down arrows) into the sequence you want and pick an output resolution: **Auto (match inputs)**, Landscape 16:9, Portrait 9:16, Square HD, Landscape 4:3, or Portrait 3:4.
4. Run the merge — the combined video is added back to your project.
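
The selection rule in step 1 can be expressed as a small check. This is a sketch under assumptions: `can_merge` and `merge_tooltip` are hypothetical names, and representing each selected node by a kind string such as `"video"` or `"image"` is an illustration, not the editor's data model.

```python
MIN_CLIPS, MAX_CLIPS = 2, 5  # range stated in this guide


def can_merge(selection: list[str]) -> bool:
    """True when the selection is 2 to 5 nodes and every node is a video."""
    return (MIN_CLIPS <= len(selection) <= MAX_CLIPS
            and all(kind == "video" for kind in selection))


def merge_tooltip(selection: list[str]) -> str:
    """Build the tooltip text in the 'Merge N videos' form."""
    return f"Merge {len(selection)} videos"
```

With three video nodes selected, `can_merge` returns `True` and the tooltip reads "Merge 3 videos"; adding an image to the selection makes `can_merge` return `False`.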