Generating Videos

Turn your images and references into short video clips using the in-editor assistant, which animates a still frame (image-to-video) or composes motion from several reference images and videos (reference-to-video). Every video generation asks for explicit confirmation first, because cost grows with clip length, resolution, and audio.

Animate an image (image-to-video)

The assistant's generate_image_to_video tool animates one image into a clip, with an optional end frame.

To animate an image:

Place the image on your canvas and select it so it becomes a reference for the assistant. Selected images are numbered @Image1, @Image2, …
Ask the assistant to animate it and describe the motion you want (e.g. "make @Image1 slowly pan, camera drifting left").
If you have a model preference, name it (see Models below). For a smooth start-to-end transition, also select a second image as the end frame; end frames are supported only on Kling v3 Standard and Seedance 2.
The assistant replies with a one-click confirmation showing the estimated credit cost. Generation only starts after you confirm (see Confirmation below).
The clip appears in the conversation when it finishes, typically after 30–120 seconds.
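The end-frame rule in the steps above can be expressed as a simple pre-flight check. This is a minimal sketch under stated assumptions: the model names and the end-frame restriction come from this page, but check_image_to_video and its parameters are hypothetical and are not the assistant's actual tool interface.

```python
from __future__ import annotations

# Hypothetical sketch: the model names come from this page, but the function
# and its parameters are illustrative, not the assistant's actual tool API.
IMAGE_TO_VIDEO_MODELS = {"Kling v3 Standard", "Wan 2.6", "Veo 3.1 Lite", "Seedance 2"}
END_FRAME_MODELS = {"Kling v3 Standard", "Seedance 2"}  # only these accept an end frame


def check_image_to_video(model: str, start_image: str, end_image: str | None = None) -> None:
    """Raise ValueError if the request uses something the chosen model lacks."""
    if model not in IMAGE_TO_VIDEO_MODELS:
        raise ValueError(f"{model!r} is not an image-to-video model")
    if not start_image:
        raise ValueError("a start image (e.g. @Image1) is required")
    if end_image is not None and model not in END_FRAME_MODELS:
        raise ValueError(f"{model} supports a single start frame only")


# A start-to-end transition on Seedance 2 passes the check.
check_image_to_video("Seedance 2", "@Image1", end_image="@Image2")
```

In this sketch, asking Wan 2.6 or Veo 3.1 Lite for an end frame raises an error, which mirrors "Single start frame only" in the model list.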

Pointer: the resulting video lands in your video library and on the canvas, where it can be selected as a reference for reference-to-video or merged with other clips.

Use references (reference-to-video)

The assistant's generate_reference_to_video tool composes a clip from a prompt plus up to 9 reference images and up to 3 reference videos, using the Seedance 2 Reference model.

To use references:

On the canvas, select the images and/or videos you want to use. They become assistant references, numbered @Image1…@Image9 and @Video1…@Video3.
Address them in your prompt by their tokens — e.g. "use the wardrobe from @Image1 and the camera motion from @Video1".
Reference videos guide motion and style; reference images supply identity, wardrobe, environment, and atmosphere.
Confirm the cost preview to start (see Confirmation).
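The numbering and limits in the steps above can be sketched as follows. This is an illustration only: the @ImageN / @VideoN tokens and the 9-image / 3-video limits come from this page, while numbering in selection order and the function names assign_reference_tokens and unknown_tokens are assumptions.

```python
from __future__ import annotations

import re

# Hypothetical sketch of how selected canvas items could map to prompt tokens.
# Tokens and limits come from this page; selection-order numbering and the
# function names are assumptions.
MAX_IMAGES = 9
MAX_VIDEOS = 3
TOKEN_PATTERN = re.compile(r"@(?:Image|Video)\d+")


def assign_reference_tokens(selection: list[tuple[str, str]]) -> dict[str, str]:
    """Map tokens to item names; selection holds (kind, name) pairs, kind is "image" or "video"."""
    counts = {"image": 0, "video": 0}
    tokens: dict[str, str] = {}
    for kind, name in selection:
        if kind not in counts:
            raise ValueError(f"unsupported reference kind: {kind!r}")
        counts[kind] += 1
        tokens[f"@{kind.capitalize()}{counts[kind]}"] = name
    if counts["image"] > MAX_IMAGES or counts["video"] > MAX_VIDEOS:
        raise ValueError("at most 9 reference images and 3 reference videos")
    return tokens


def unknown_tokens(prompt: str, tokens: dict[str, str]) -> list[str]:
    """Return tokens used in the prompt that match no selected reference."""
    return [t for t in TOKEN_PATTERN.findall(prompt) if t not in tokens]
```

For example, selecting an image and a video gives @Image1 and @Video1, and a prompt that mentions @Video2 would be flagged because no second video is selected.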

Pointer: references are pulled from your current canvas selection, so anything you've generated or uploaded into the project can feed a reference-to-video clip.

Models, aspect ratio and duration

Image-to-video models:

Kling v3 Standard — balanced quality/speed. Output ratio follows the start image. Optional end frame, native audio.
Wan 2.6 — fast, good for previews and quick iteration. Single start frame only.
Veo 3.1 Lite — Google's high-quality image-to-video model. Single start frame only.
Seedance 2 — flagship, cinematic motion. Optional end frame, synced audio, best quality.

Reference-to-video model:

Seedance 2 Reference — up to 9 reference images and 3 reference videos.

Choosing settings:

Aspect ratio — honored by Veo 3.1 Lite, Seedance 2, and Seedance 2 Reference. Kling v3's output ratio instead follows the start image.
Duration — 4–15 seconds, honored by Kling v3 Standard, Seedance 2, and Seedance 2 Reference.
Resolution — 480p / 720p / 1080p, honored by Seedance 2 and Seedance 2 Reference (defaults to 720p for reference-to-video).
Audio — off by default; output is silent unless you explicitly ask for audio. Supported on Kling v3 Standard, Seedance 2, and Seedance 2 Reference.

Describe what you want (aspect ratio, length, resolution, audio) in your message, and the assistant maps it to the settings the chosen model supports.
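The settings matrix above can be modeled as a small capability table. This is a hedged sketch: a flag is True only where this page lists the model as honoring that setting (Wan 2.6 appears in none of the lists, so it is modeled as honoring none). Dropping unsupported settings and rejecting out-of-range durations are assumptions, not documented assistant behavior, and map_settings is a hypothetical name.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: flags mirror the lists on this page; how the real
# assistant handles unsupported or out-of-range settings is an assumption.


@dataclass(frozen=True)
class Capabilities:
    aspect_ratio: bool
    duration: bool
    resolution: bool
    audio: bool


MODELS = {
    "Kling v3 Standard": Capabilities(aspect_ratio=False, duration=True, resolution=False, audio=True),
    "Wan 2.6": Capabilities(aspect_ratio=False, duration=False, resolution=False, audio=False),
    "Veo 3.1 Lite": Capabilities(aspect_ratio=True, duration=False, resolution=False, audio=False),
    "Seedance 2": Capabilities(aspect_ratio=True, duration=True, resolution=True, audio=True),
    "Seedance 2 Reference": Capabilities(aspect_ratio=True, duration=True, resolution=True, audio=True),
}
RESOLUTIONS = {"480p", "720p", "1080p"}


def map_settings(model: str, aspect_ratio: Optional[str] = None, duration_s: Optional[int] = None,
                 resolution: Optional[str] = None, audio: bool = False) -> dict:
    """Keep only the requested settings the model honors."""
    caps = MODELS[model]
    settings: dict = {}
    if aspect_ratio is not None and caps.aspect_ratio:
        settings["aspect_ratio"] = aspect_ratio
    if duration_s is not None and caps.duration:
        if not 4 <= duration_s <= 15:
            raise ValueError("duration must be 4-15 seconds")
        settings["duration_s"] = duration_s
    if caps.resolution:
        if resolution is None and model == "Seedance 2 Reference":
            resolution = "720p"  # documented default for reference-to-video
        if resolution is not None:
            if resolution not in RESOLUTIONS:
                raise ValueError(f"unsupported resolution: {resolution}")
            settings["resolution"] = resolution
    if caps.audio:
        settings["audio"] = audio  # off unless explicitly requested
    return settings
```

For instance, asking Kling v3 Standard for a 9:16, 10-second clip keeps only the duration (plus audio off), since Kling's output ratio follows the start image.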

Confirmation before every generation (cost)

Video generation always requires explicit confirmation. Before any clip is generated, the assistant ends its message with a confirmation that shows a one-click Yes button, the estimated cost in credits, and a per-call breakdown. The estimate accounts for the clip's duration, resolution, and whether audio is on. Nothing is generated until you confirm, either by clicking Yes or by typing "yes".
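The confirm-before-generate rule can be sketched as a gate that refuses to start until a "yes" arrives. Everything here is hypothetical: PendingGeneration and its methods are illustrative names, per-call credit estimates are supplied by the caller because this page does not publish rates, and treating the total as the sum of the per-call estimates is an assumption. Clicking Yes is modeled as the same "yes" reply.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical sketch of a confirmation gate; names and the summed total are
# assumptions, and no real credit rates are implied.


@dataclass
class PendingGeneration:
    per_call_credits: dict[str, int]  # caller-supplied estimate for each call
    confirmed: bool = False

    @property
    def total_credits(self) -> int:
        return sum(self.per_call_credits.values())

    def reply(self, text: str) -> None:
        """A "yes" reply (or clicking Yes) confirms; anything else leaves it pending."""
        if text.strip().lower() == "yes":
            self.confirmed = True

    def generate(self) -> str:
        if not self.confirmed:
            raise PermissionError("video generation requires explicit confirmation")
        return "started"
```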

Cinematic Prompt (Seedance) assistant skill

For purpose-built Seedance 2.0 prompts, use the Cinematic Prompt — Seedance skill. It reads your uploaded reference images, videos, and audio (wardrobe, identity, voice, environment, atmosphere), routes them into Seedance's deep reference stack (up to 9 images, 3 videos, and 3 audio files per generation), and composes a Seedance-ready prompt. It works in five cinema modes (Narrative, Studio, Action, Performance, Atmospheric) and follows a consistent grammar: rhythmic prose rather than photography jargon, one primary camera move per shot, a subject anchor locked across multi-shot sequences, and inline "Avoid X." constraints.

To use it, ask the assistant for cinematic or Seedance video prompting. For other video models (Veo, Sora, Kling), there is a sibling Cinematic Prompt skill with the same five-mode grammar. See The Agent (Skills Chat).

Video library and playback

Finished clips are saved to your video library and placed on the canvas as video nodes you can play back. Each clip's intrinsic width and height are captured client-side, either from a generated thumbnail or back-filled by loading the video's metadata, so clips size correctly in the grid. If neither source is available, the clip falls back to a 16:9 ratio.
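The sizing order described above (thumbnail first, then video metadata, then 16:9) can be sketched like this. The fallback ratio and the order come from this page; the function names and the (width, height) tuple format are assumptions made for illustration.

```python
from typing import Optional, Tuple

# Hypothetical sketch of grid sizing; only the source order and the 16:9
# fallback come from this page.
FALLBACK_RATIO = 16 / 9

Size = Optional[Tuple[int, int]]  # (width, height) in pixels, or None if unknown


def intrinsic_ratio(thumbnail_size: Size, metadata_size: Size) -> float:
    """Width/height from the thumbnail, else from loaded video metadata, else 16:9."""
    for size in (thumbnail_size, metadata_size):
        if size is not None and size[0] > 0 and size[1] > 0:
            return size[0] / size[1]
    return FALLBACK_RATIO


def tile_height(tile_width: int, thumbnail_size: Size, metadata_size: Size) -> int:
    """Height in pixels for a grid tile of the given width."""
    return round(tile_width / intrinsic_ratio(thumbnail_size, metadata_size))
```

With no known dimensions, a 320-pixel-wide tile gets the 16:9 height of 180 pixels; a portrait 1080×1920 clip would instead get a much taller tile.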

Pointer: from the library/canvas you can reuse a clip as a reference (reference-to-video) or combine clips (merge, below).

Merge clips into one video

You can stitch several clips end-to-end with the Merge Videos utility.

To merge:

On the canvas, select 2 to 5 video nodes. The selection must contain only videos, with no images mixed in.
In the multi-selection side toolbar, click the Merge button (tooltip "Merge N videos").
In the Merge Videos dialog, put the clips in the order you want (drag them, or use the up/down arrows), then pick an output resolution: Auto (match inputs), Landscape 16:9, Portrait 9:16, Square HD, Landscape 4:3, or Portrait 3:4.
Run the merge — the combined video is added back to your project.
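The merge rules in the steps above can be sketched as a selection check plus a reorder helper. The 2-to-5 clip limit, the video-only selection, the "Merge N videos" tooltip and the output options come from this page; merge_tooltip and move_clip are hypothetical names, and returning None for an unmergeable selection is an assumption about how the toolbar reacts.

```python
from __future__ import annotations

# Hypothetical sketch of the merge rules; function names and the None return
# for invalid selections are assumptions.
MIN_CLIPS, MAX_CLIPS = 2, 5
OUTPUT_RESOLUTIONS = (
    "Auto (match inputs)", "Landscape 16:9", "Portrait 9:16",
    "Square HD", "Landscape 4:3", "Portrait 3:4",
)


def merge_tooltip(selection_kinds: list[str]) -> str | None:
    """Return the Merge button tooltip, or None if the selection can't be merged."""
    if not MIN_CLIPS <= len(selection_kinds) <= MAX_CLIPS:
        return None
    if any(kind != "video" for kind in selection_kinds):
        return None  # images mixed into the selection
    return f"Merge {len(selection_kinds)} videos"


def move_clip(order: list[str], index: int, step: int) -> list[str]:
    """Move one clip up (step=-1) or down (step=+1), like the dialog's arrows."""
    target = index + step
    new_order = list(order)
    if 0 <= target < len(order):
        new_order[index], new_order[target] = new_order[target], new_order[index]
    return new_order
```

Moving a clip past either end of the list leaves the order unchanged in this sketch.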