Generating video from text (text-to-video) is one of the most impressive achievements of modern AI. You describe in words what you want to see (characters, environment, movement, filming style, and mood), and within 1–3 minutes you get a 3–5 second video clip that never existed in reality.
The quality of the result depends largely on how detailed the prompt is. An effective prompt covers five things: the subject, what exactly is happening, the environment and atmosphere, the style (cinematic, animation, documentary), and the camera movement (pan left, zoom in, static shot). For example: "A ginger cat walks down a wet night street in Tokyo, neon signs reflecting in puddles, the camera slowly moves back, cinematic style, 24fps".
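If you generate many clips, it can help to treat the prompt formula as a template so that no component is forgotten. The sketch below is a hypothetical helper, not part of any video service's API: the `VideoPrompt` class and its field names are illustrative assumptions. It joins the subject and action into one clause and appends the remaining components as comma-separated modifiers, reproducing the example prompt above.

```python
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    """Hypothetical container for the prompt components described above."""
    subject: str      # who or what is acting
    action: str       # what exactly is happening
    environment: str  # setting and atmosphere
    style: str        # e.g. cinematic, animation, documentary
    camera: str       # e.g. pan left, zoom in, static shot
    extras: tuple[str, ...] = ()  # optional technical hints such as "24fps"

    def render(self) -> str:
        # Subject and action form the main clause; the other components
        # follow as comma-separated modifiers. Empty fields are skipped.
        parts = [
            f"{self.subject} {self.action}".strip(),
            self.environment,
            self.camera,
            self.style,
            *self.extras,
        ]
        return ", ".join(p for p in parts if p)


prompt = VideoPrompt(
    subject="A ginger cat",
    action="walks down a wet night street in Tokyo",
    environment="neon signs reflecting in puddles",
    style="cinematic style",
    camera="the camera slowly moves back",
    extras=("24fps",),
)
print(prompt.render())
```

Keeping the components as separate fields makes it easy to vary one of them (for instance, swapping the camera movement) while holding the rest of the scene constant.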
Wan 2.1 costs 3 credits and is well suited to quickly testing ideas. Kling v2.1 costs 20 credits and delivers cinematic quality with realistic motion physics, which makes it the better choice for final content.
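To see why drafting on the cheaper model pays off, the arithmetic can be sketched as below. This assumes each generation is billed once at the listed price (3 credits for Wan 2.1, 20 for Kling v2.1); that billing model and the draft counts are illustrative assumptions, so check the platform's actual terms.

```python
# Prices as stated in the text. Assumption: one charge per generation.
PRICES = {"Wan 2.1": 3, "Kling v2.1": 20}


def workflow_cost(drafts: int, finals: int) -> int:
    """Credits for `drafts` test runs on Wan 2.1 plus `finals` renders on Kling v2.1."""
    return drafts * PRICES["Wan 2.1"] + finals * PRICES["Kling v2.1"]


def all_kling_cost(iterations: int) -> int:
    """Credits if every iteration, drafts included, is rendered on Kling v2.1."""
    return iterations * PRICES["Kling v2.1"]


# Hypothetical scenario: five drafts to refine the prompt, then one final render.
mixed = workflow_cost(drafts=5, finals=1)   # 5 * 3 + 1 * 20
kling_only = all_kling_cost(iterations=6)   # 6 * 20
print(mixed, kling_only)
```

Under these assumptions the mixed workflow costs 35 credits versus 120 for doing all six generations on Kling v2.1.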