Cinematic AI Music Video Prompts: A Director's Library
Cinematic AI music video prompts written like a director would: shot, framing, light, lens, motion, mood. 2026 tested across the frontier video models.
Kevin Gabeci
The difference between a music video that feels cinematic and one that feels like generated content is almost never the model. It’s the prompt. Specifically, it’s whether the prompt was written by someone thinking like a director or someone thinking like a search query.
A director thinks in shots. They name the lens, the framing, the lighting setup, the blocking, the motion of the camera. They know that a 50mm portrait at f/1.8 in low key light feels different from a 24mm wide at f/8 in flat overcast, and they specify which one. The video models in 2026 have absorbed enough cinematography from their training data that they respond to that vocabulary the way an actual cinematographer would. Use the words and you get the look.
This piece is a library of cinematic AI music video prompts written that way. Each one names the shot type, the framing, the lighting, the lens (where it matters), the camera move, and the mood. They are organized into four mood blocks: tension, longing, triumph, and elegy. Pick the one that fits the section of your song you’re scoring and adapt it to your specific world.
A note on adaptation. These prompts are not meant to be used verbatim. They are meant to be raided. Take the shot type and the lighting from one, swap the subject from your own video, and you have a prompt that fits your project and that the model will execute well. The grammar matters more than the specific nouns. If you want a deeper dive into how this grammar works, the prompt-writing fundamentals piece breaks it down. If you want a wider library across genres, the 100-prompt master library is the place to start. If you’re choosing between image models for your cinematic work, Midjourney vs Flux for music video covers the trade-offs.
Tension
Shots for the buildup. Tight framings, restricted motion, anything that wants to coil before it springs.
Close up on a woman's eyes in low key light, single hard source from camera right, 85mm lens at f/1.4, shallow depth, faint smoke drifting through the beam. Camera move: slow push in, 4 seconds. Mood: held breath.
Medium shot, a man at a kitchen table, single overhead bulb, kitchen otherwise dark, 50mm, his hands resting on a closed envelope. Camera move: imperceptible dolly in, 5 seconds. Mood: a decision he hasn't made.
Wide shot, an empty parking garage at night, fluorescent banks flickering, 24mm, two figures at the far end too small to read. Camera move: locked off, 6 seconds. Mood: surveillance.
Over the shoulder, a young woman watching a phone screen go dark, 50mm, screen glow fading off her face into ambient blue. Camera move: static, 4 seconds. Mood: the message that didn't arrive.
Insert shot, a hand turning a key in a deadbolt, 35mm macro, single bare bulb hallway behind, faint dust in the light. Camera move: locked off, 3 seconds. Mood: irreversible.
Low angle on a figure walking toward camera in a long corridor, 35mm, cool fluorescents above, footsteps in puddles on the floor. Camera move: slow dolly back, matching their pace, 6 seconds. Mood: approach.
Longing
Shots for the sad sections, the verses that don’t resolve, the pieces that ache.
Wide static shot, a woman alone at a window, late afternoon side light through gauze curtains, 35mm at f/4, the room behind her in shadow. Camera move: static, 6 seconds. Mood: waiting that has stopped expecting.
Medium shot from behind, a man on a balcony at golden hour, city out of focus beyond, 85mm at f/1.8, his shoulders relaxed. Camera move: slow tilt up to the city, 5 seconds. Mood: missing someone who is also somewhere down there.
Close up, two hands almost touching across a café table, 50mm at f/2, warm overhead pendant light, condensation on glasses between them. Camera move: rack focus from the hands to the glass, 4 seconds. Mood: distance that is only six inches.
Wide shot, a single figure on an empty beach at dusk, 24mm, blue hour sky, footprints leading to where they stand. Camera move: slow pull back, 7 seconds. Mood: smaller and smaller against the sky.
Medium close up, a woman in a passenger seat looking out the window, 50mm, soft window light moving across her face as the car moves, rain streaks. Camera move: locked to her, the world moves behind, 6 seconds. Mood: passenger to her own life.
Top down shot, two pillows on an unmade bed, morning light from the left, 35mm, one pillow indented, the other untouched. Camera move: slow dolly forward over the bed, 5 seconds. Mood: the side that did not get slept in.
Wide shot, a child's empty swing moving slightly in the wind in an autumn yard, 50mm, low golden light through bare trees. Camera move: locked off, 6 seconds. Mood: kept moving.
Triumph
Shots for the choruses, the lifts, the moments where the song claims something.
Low angle, a runner cresting a hill at sunrise, 35mm, sun directly behind them flaring through the lens, sweat on the skin. Camera move: dolly back as they crest, 5 seconds. Mood: arrival.
Wide shot, a woman in a stadium tunnel walking toward the light, crowd noise muffled, 24mm, silhouetted, her shoulders squared. Camera move: slow push in behind her, 6 seconds. Mood: about to be loud.
Medium shot, hands raising a vinyl record from a sleeve, 50mm, late afternoon window light, dust visible in the beam, the record catching the sun. Camera move: tilt up to a face that is smiling at it, 4 seconds. Mood: held what I wanted.
Aerial shot, a single car driving across a desert highway at golden hour, 35mm equivalent on drone, long shadow trailing, mountains in distance. Camera move: drone push forward and up, 6 seconds. Mood: the road went somewhere after all.
Close up, a guitarist's hand sliding into a power chord, 50mm at f/2, single rim light from behind, sweat on the strings. Camera move: rack focus from hand to face, 4 seconds. Mood: the chord that earns the song.
Wide shot, a crowd's hands raised in a club, lasers cutting through smoke above them, 24mm, magenta and cyan, low angle from the floor. Camera move: slow dolly back through the crowd, 6 seconds. Mood: room found the same beat.
Medium shot, a woman painting the last brushstroke of a large canvas, 35mm, north-facing studio light, paint-stained smock, she steps back. Camera move: pull back as she steps back, 5 seconds. Mood: it is finished.
Elegy
Shots for the bridges and outros that grieve. Slower, quieter, lower contrast.
Wide static shot, an empty church at dawn, 24mm, light coming through stained glass and falling on dust in the air, pews empty. Camera move: locked off, 8 seconds. Mood: someone is no longer here.
Medium shot, a man placing flowers on a gravestone, 50mm, overcast diffused light, autumn leaves on the ground, he does not speak. Camera move: slow circle around him, 6 seconds. Mood: quiet visit.
Close up, an old letter being unfolded by aging hands, 50mm macro, single window source, paper softer than new paper. Camera move: gentle push in to the handwriting, 5 seconds. Mood: voice from before.
Wide shot, a woman walking away from camera down a country road, autumn trees on both sides, 85mm compressing the road, late afternoon side light. Camera move: locked off, she gets smaller, 8 seconds. Mood: leaving for a reason she has accepted.
Medium shot, a child's hand letting go of a balloon, 35mm, blue overcast sky, the balloon already moving up, the hand falling. Camera move: tilt up to follow the balloon, 5 seconds. Mood: release.
Insert shot, a wedding ring being set on a nightstand, 50mm macro, warm bedside lamp, the ring catching one point of light. Camera move: locked off, 4 seconds. Mood: chosen.
Wide shot, a single figure standing at the edge of an ocean cliff at dusk, 35mm, blue hour sky, the wind moving their coat. Camera move: slow pull back to reveal more sky, 7 seconds. Mood: end of something that needed to end.
How to adapt these
Treat each prompt as a template, not a script. The structure that makes them work is the same across all of them: shot type, framing, light source, lens choice, motion direction, mood. Keep that structure, swap the nouns for the world your song lives in.
If your song is about a city, port the lens and lighting from a longing prompt and replace the woman at the window with a man on a fire escape. If your song is about loss, port the elegy structure but trade the church for an empty kitchen. The grammar holds. The model executes.
A few practical notes. First, when in doubt, lower the camera motion. Locked off shots and slow dollies look more cinematic in 2026 than dramatic moves do, because the models still struggle with rapid camera changes and the artifacts read as cheap. Second, name your light source explicitly. Single source, hard or soft, direction relative to camera. Models that don’t get a light direction default to flat overhead, which is the most boring lighting in cinema. Third, specify time of day. Dusk, golden hour, blue hour, dawn, midday, late afternoon. These words trigger entire color palettes that hold a video together.
Build your library, raid it, ship.
If you want to actually make the video and not just write the prompts, open Melodex and feed your favorite prompts into the scene-by-scene flow. The grammar above is what the model is listening for.
Frequently asked questions
- Why write prompts in directing language?
- Because video models trained in 2025 and 2026 respond better to specific cinematographic terms. Saying 50mm, low key, slow push in gives the model a target. Saying nice shot of a person gives it nothing.
- Do these prompts work in Runway, Pika, and Luma?
- Yes. They are written to be portable. Each model has minor quirks (Pika prefers shorter prompts, Luma loves explicit camera moves) but the core grammar transfers.
- How long should the resulting clip be?
- Five to eight seconds is the sweet spot for cinematic clips in 2026. Anything longer and the model starts to drift from the prompt.
- Can I chain these into a full video?
- Yes, that is exactly the workflow. Pick six to eight prompts that share a world (palette, lens, mood) and stitch them together with cuts on beats.
- What lens choices matter most?
- 35mm for intimacy, 50mm for portraiture, 85mm for compression and longing, 24mm for environmental scope. Naming the lens shapes the depth and feel.
- Should I specify film stock?
- If you want a particular grain. Kodak 500T pushes warm, Fuji 400H pushes cool, 16mm reads handheld and grainy. The model picks up on these cues.
- How do I keep continuity across clips?
- Repeat the world sentence in every prompt. Same palette, same time of day, same lens family. Lock those three and the model holds the line.
Keep reading

How to Export Suno Stems to Ableton, Logic, FL Studio
Walkthrough for exporting up to 12 WAV stems and MIDI from Suno Studio into your DAW, with project setup and routing.

How to Make AI Music Sound Less Robotic
Ten techniques to humanize AI-generated music. Variations, layering, DAW tricks, and prompt patterns that pull the AI sheen off your track.