PixAI’s Image-to-Video (i2v) feature lets you breathe life into static images with powerful animation tools. Whether you’re new to i2v or a returning user, you’ll discover exciting upgrades: a streamlined interface and new models with enhanced capabilities. This guide explores the updated workflow, compares model strengths, and shares pro techniques for writing effective prompts.
Let’s explore the new PixAI i2v experience!
Navigating the Interface
Once you enter the Image-to-Video workspace, you'll see a clean layout designed to make your animation process smooth and intuitive.
Left Panel: Import a task from your generation history, or click "Upload Image" to add your source image.
Right Panel: Core generation controls:
Model Selection: Choose from three specialized models (detailed below).
Mode:
Duration: You can choose between a 5-second or 10-second animation. While longer videos can look impressive, they’re also more challenging for the model to render cleanly. If you're testing ideas or refining a prompt, 5 seconds is often the best place to start.
Prompt Box: This is where the magic happens — the prompt box is where you write your scene description. A good prompt includes subject, movement, and environment. We’ll break this down in detail later.
💡 Tip: Use AI Smart Prompt (Optional)
In the upper right corner of the prompt box, there’s a small toggle called AI Smart Prompt. This tool is designed to help enhance your prompt for better results.
Camera Controls: A dropdown menu for dynamic moves like zoom, pan, or spin. Not all models support this feature (v2.5 ❌), but when available, these cues add a cinematic layer to your animation.
Advanced Settings:
Open the Advanced section if you want to refine your outputs. Here you can add negative prompts like "blur," "distorted," "low quality," or "abstract" to help the model avoid common artifacts.
Model Lineup Overview
v2.7 (High Dynamics) - Cinematic Pro
Our flagship cinematic model featuring:
Advanced camera movement simulation
Supports camera movement even with simple prompts such as "live 2D", enabling dynamic results from minimal input.
Movie-quality motion blur and depth effects
v2.7 is designed to deliver a visually dramatic, movie-like experience. It features advanced camera motion, layered depth, motion blur, and complex scene handling, making it ideal for storytelling with dynamic composition and movement.
v2.6 (High Resolution) - Balanced Performance
v2.6 strikes a solid balance between the high flexibility of v2.5 and the cinematic intensity of v2.7. It’s a stable and dependable model that noticeably improves on v2.5’s clarity and smoothness, while avoiding the visual instability and overambitious camera work that sometimes affect v2.7. You might not get unexpected "wow" moments across multiple generations—but if you’re aiming for consistent, clean results with minimal artifacts, v2.6 is a highly recommended choice.
v2.5 (High Flexibility) - Realistic Motion
v2.5 is your go-to model when flexibility and style diversity matter most. It handles a wide range of visual aesthetics and is especially good at capturing subtle motion, rich facial expressions, and physically plausible interactions across different prompts. While it may need a few more generations to perfect complex movements like dancing or running, its ability to follow detailed prompts and maintain realistic lighting and texture makes it a powerful tool for expressive, customized video creation.
Capable of both subtle and dynamic character motions
Rich facial expressions and micro-movements
Maintains believable lighting, shadows, and physical interactions across styles
Unique Feature: Only v2.5 supports video LoRAs for specialized motions such as dance, live 2D, and shake. We'll release more video LoRAs over time and eventually open up the ability for you to train your own.
| Feature | v2.7 | v2.6 | v2.5 |
| --- | --- | --- | --- |
| Motion Quality | ✅ High-energy, smooth cinematic motion | ⚠️ Subtle motions are good; large actions need retries | ✅ Handles subtle & dynamic motion |
| Camera Movement | ✅ Cinematic camera simulation | ➖ Basic camera work unless carefully prompted | ➖ Basic camera work unless carefully prompted |
| Prompt Adherence | ⚠️ May sacrifice prompt for visuals | ⚠️ Decent but requires precise wording | ✅ Follows detailed prompts well |
| Scene Composition & Consistency | ✅ Consistent style across creative settings | ✅ Stable composition | ⚠️ Style drift and abrupt cuts |
| Style Flexibility | ✅ High stability with different styles | ⚠️ May struggle with ultra-stylized prompts | ✅ Excels in stylized, artistic prompts |
| Start & End Frame Support | ❌ No | ✅ Yes | ✅ Yes |
| Overall Strength | 🔵 Cinematic storytelling, action & sports scenes | 🟢 Ideal for clean, reliable i2v | 🟣 Subtle + dynamic motion |
| Recommendation | ✔️ Best for cinematic and dynamic motion lovers | ✔️ Best for general usage with stable quality | ✔️ Best for diverse and artistic exploration |
Mastering Prompt Writing
Now for the most critical part: writing prompts that actually work. Instead of just giving you examples, we'll explain the why behind each technique so you can craft your own effective prompts.
A great prompt usually follows this structure:
Prompt = Subject + Motion + Environment
Let’s break down each layer starting with your anchor point:
Layer 1: Subject Definition
Since you’re working with image-to-video, the subject already exists visually — so you don’t need overly detailed character descriptions. But it’s still important to include a general description of your subject in the prompt. Why? Because this helps the model lock in visual anchors like hair, outfit, or facial features—especially for maintaining consistency during motion.
Example: "A white-haired girl with cat ears and violet eyes"
Each descriptor gives the AI specific features to track and maintain throughout the animation.
Layer 2: Motion Specification
This is the soul of your animation. You're telling the model what the subject is doing, so be specific and intentional, and always tie the motion back to the subject. Why does this matter? Because it gives the model a clear action to animate, and a sense of how to animate it.
✅ Good example:
"The white-haired girl gently adjusts her bangs with one hand, tilting her head slightly"
Avoid vague motion like:
“She moves around”
Instead, write:
"She slowly leans forward to pet the cat on her lap, her expression softening as the cat purrs."
Tips:
Use verbs that imply motion and style (e.g., “leaps gracefully,” “glances quickly,” “twirls with hesitation”)
Avoid abstract terms like “moves” or “interacts” on their own
Combine physical action with emotional nuance or timing
Layer 3: Environmental Context
The model also needs to know where the subject is — this helps it apply the correct lighting, reflections, atmosphere, and even physics (like wind or particles).
✅ Good example:
"Sitting on a wooden classroom desk bathed in afternoon sunlight, dust particles dancing in the golden light"
Or:
"Standing beside a misty lake at dawn, soft light reflecting on the water’s surface"
This helps the model simulate lighting, atmosphere, and background interactions.
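The three layers above can be thought of as a simple template. Here's a minimal, purely illustrative sketch in Python (the function and field names are our own, not part of PixAI — the app only ever sees the finished string you paste into the prompt box):

```python
# Illustrative sketch: assemble an i2v prompt from the three layers
# described above (subject, motion, environment). Function and
# parameter names are hypothetical; PixAI receives only the final text.

def build_prompt(subject: str, motion: str, environment: str) -> str:
    """Combine the three layers into one scene description."""
    return f"{subject} {motion}, {environment}"

prompt = build_prompt(
    subject="A white-haired girl with cat ears and violet eyes",
    motion="gently adjusts her bangs with one hand, tilting her head slightly",
    environment="sitting on a wooden classroom desk bathed in afternoon sunlight",
)
print(prompt)
```

Thinking of your prompt this way keeps every generation consistent: swap out one layer at a time while testing, rather than rewriting the whole sentence.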
Layer 4: Camera Movement (Optional)
Want a cinematic effect? Add camera control. But be realistic: most models only support basic movement, and advanced moves are best handled by v2.7.
Prompt = Camera movement + Subject + Motion + Environment + Camera language
Camera prompts tell the model how to frame and move through the scene. When writing them, think like a director: describe how you want the camera to physically navigate the space, whether that's gliding forward, tilting up, or panning across. Keep timing in mind, and avoid overly complex choreography so the model can execute it cleanly. Most importantly, place your camera command right where the movement happens in your scene description, for example: "Camera slowly pushes in through the crowd toward the girl, transitioning into an over-the-shoulder shot as she gazes up at the departures board." That way the model understands precisely when and how to execute the move.
CAMERA MOVEMENT ARSENAL
| Movement Type | Prompt Syntax | Best Use Case |
| --- | --- | --- |
| Push In | "camera slowly pushes in from [wide/medium] to [medium/close-up]" | Emotional reveals |
| Pull Back | "camera pulls back to reveal [context/environment]" | Context establishment |
| Pan Left/Right | "camera pans smoothly from left to right across the scene" | Landscape reveals |
| Tilt Up/Down | "camera tilts up from [feet/ground] to [face/sky]" | Character introduction |
| Orbit | "camera orbits around the subject in a [clockwise/counter-clockwise] motion" | Dynamic character showcase |
| Track | "camera tracks alongside as [subject] moves [direction]" | Following action |
| Crane | "camera cranes up from ground level to bird's eye view" | Dramatic scale change |
| Dolly | "smooth dolly shot moving [forward/backward] while maintaining focus" | Cinematic approach |
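The bracketed placeholders in the table above work like format strings: pick a movement, fill in the details, and drop the result into your scene description. A small hedged sketch of that idea (the dictionary and function names are our own invention, not a PixAI API):

```python
# Illustrative sketch: a few camera-movement templates from the table
# above, stored as Python format strings. Names are hypothetical;
# PixAI only sees the filled-in prompt text.

CAMERA_TEMPLATES = {
    "push_in": "camera slowly pushes in from {start} to {end}",
    "pull_back": "camera pulls back to reveal {context}",
    "tilt_up": "camera tilts up from {start} to {end}",
    "orbit": "camera orbits around the subject in a {direction} motion",
    "track": "camera tracks alongside as {subject} moves {direction}",
}

def camera_cue(move: str, **details: str) -> str:
    """Fill one camera template with scene-specific details."""
    return CAMERA_TEMPLATES[move].format(**details)

print(camera_cue("push_in", start="wide", end="close-up"))
# camera slowly pushes in from wide to close-up
```

Treating the syntax as a template keeps your camera language consistent across generations, which makes it easier to tell whether a bad result came from the move itself or from the rest of the prompt.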
Tips:
Match the Image with the Prompt
Match angles in your source image to your camera request—if you prompt a "low-angle shot," ensure your uploaded image isn’t a top-down view.
Timing is Everything
Specify the pace of camera movement with words like "slowly," "gradually," "quickly," or "smoothly." This helps the AI understand the emotional tone you want to convey through camera motion.
Avoid Conflicting Movements
Don't combine opposing camera actions like "zoom in while pulling back" unless you specifically want a disorienting effect. Keep movements logical and purposeful.
Your static images are waiting to tell their stories. With the new PixAI I2V system, you have everything you need to make them speak, move, and captivate your audience.
Thanks for exploring the updated i2v system with us. We can't wait to see the incredible animations you'll create!