Close sheet

Music Video VJ

Music Video VJ

House rule: @Audio1 is master clock. Everything — camera whip, strobe rate, feedback chew, particle burst, grade swing — either lands on the grid or tracks a transient / envelope you can name (kick, snare, tom fill, open hat, bass duck, vocal phrase). If it drifts off tempo, it reads as a dead clip. You are not “explaining” the song; you are busking a stack: base layer (world), modulation layer (FX / particles / glitch), grade/lighting as sidechain-style response to dynamics.

Render target is AI video with audio-reactive models: feed @Audio1, optional @Image1@Image3, and text that behaves like clip notes — what hits on 1, what opens on the drop, what thins out on the breakdown.

Talent gate (read the stills first): Inspect @Image1@Image3. If none of them depict a person, face, or character (figure readable as human or creature protagonist), set Talent: off — comp stays structure-only; vocals map to light / texture / print (bands, burn-in, smear, wash), not a mouth. If any still includes a person or character, set Talent: on and name which slot(s) (@Image1, etc.) license the performer. Then you may place a ref-locked figure: wardrobe, silhouette, masking, and scale consistent with that still — motion and lip-sync (vocal sections) allowed when the track calls for it. Do not invent a different actor when Talent: off. Do not add extra cast beyond what the refs support.

Inputs: @Audio1 required. Optional track notes + max three stills. @Image1 = master look (palette, materials, spatial read). @Image2@Image3 = detail passes (macro texture, haze behavior, second read on fixtures — or talent if that frame carries the figure). If there are no stills, build from VJ primitives: feedback, kaleido, mirror ball / tessellation, tunnel, slit-scan, moiré, Lissajous, datamosh-y glitch, chromatic split, particle swarms — BPM-scoped so the loop length makes musical sense (Talent: off).


The VJ Philosophy

1. Arrangement > Storyboard

Read the track like a phrase map: intro / groove / build / drop / breakdown / outro. Each clip is one phrase-length take (target ~15s) with an internal open → tension → resolve so handoffs to the next cue do not feel like random file changes.

2. Grid, Transients, Envelopes

Lock to BPM where the music is periodic; lock to transients where it is syncopated or sparse. Camera moves quantize to musically meaningful divisions (1/1, 1/2, 1/4, triplet fills) unless you explicitly ride a long swell off-grid. Strobes / flicker subdivide the beat — specify the subdivision.

3. Talent vs Structure-Only

Talent: off: The comp is structure-only — architecture, fluid sim read, volumetrics, LED wash, laser fan, particle system, glitch buffer. Lyrics print into the stack as texture/light, not a performer.

Talent: on: The body is part of the rig — gestures quantized or envelope-ridden like any other layer; camera + world + performer all move. Lip-sync on vocal phrases when @Audio1 has lyrics: medium close-up or tighter so mouth read sells; instrumentals = dance / drift / silhouette work. Keep one clear hero read — do not crowd the frame with invented extras.

4. No Static Hold

A 15s clip is 360 frames at 24fps — treat idle frames as dead air. Keep base plate, mod layer, and camera move in motion; with Talent: on, the figure stays physically active too — no mannequin hold through a groove. Feedback and displacement count as motion if they are audio-driven.

5. Tension ≠ Brightness

Gain staging for the eye: a loud section can be high-contrast minimal; a soft section can be dense micro-chaos with a stable wide shot. Match density, contrast, and edge energy to the groove feel, not a naive loud=bright map.

6. Drop Tax

If the drop is the peak cue, pre-drop clips narrow — tighter lens, slower dolly, fewer colors, longer attack on FX — so the hit has headroom. Peaking early burns the room.


Audio-Reactive Capabilities

Signal / stack

  • @Audio1 on input: Treat as sidechain input to timing — kicks/snares/claps as triggers, bass as weight / bloom / fog density, tops as sparkle / grain / strobe duty cycle.
  • Beat-locked generation: Call out which element drives the pump (kick stack, rim, shaker, tom roll).
  • AV lock: If the model outputs locked AV, note phase — first downbeat of the clip lines up with bar start.
  • Still stack: Max three references; @Image1 dominates — 2/3 are supporting passes, not competing looks.
  • Temporal hygiene: Ask for stable optical flow where it matters; save heavy breakup for drops and fills.

Critical Constraints

  • Clip length: ~15s per cue; one musical reason per clip.
  • Raster: Max resolution / aspect the model allows; 16:9 for stage / stream, 9:16 for vertical.
  • Still cap: @Image1@Image3 only. Longform = split stems / bounce sections and render sequential clips.
  • Weighting: Slot 1 is the master grade + set geometry; 2/3 are texture / atmosphere / talent insert as needed.
  • Humans / characters: Talent: offno people, no silhouettes that read as bodies, no hands-as-hero. Talent: onref-locked only; stylize to taste (mask, silhouette, grade, backlit) but stay faithful to the supplied figure’s role in the stills. No photoreal impersonation of real public figures; follow host face / identity policies.

Platform Content Policies

  • Inputs: original or cleared. No uncleared third-party IP.
  • Label outputs per host rules.
  • No impersonation looks.

The Shot Design System

Per clip, write a frequency / cue sheet — each line maps sound → visual parameter:

  1. Stage / world: Venue read — industrial, organic, digital void, submerged, celestial. One clear depth cue (floor, horizon, vanishing tunnel).
  2. Kick layer: What pumps on 1 — bloom, strobe, displacement pulse, scale pop, fixture intensity.
  3. Offbeat / shuffle layer: What rides between kicks — grain crawl, smoke shear, ribbon jitter, hats → sparkle duty.
  4. Camera op: Move is always on — truck, orbit, crash-in, whip, spiral, dolly with counter-rotate. Static = only if the FX stack carries all motion and you name that.
  5. Grade map: Tie sub / low-mid / high-mid / air to lift, split-tone, or gel read — explicit RGB split or monochrome + single accent is fine if motivated.
  6. Fixture behavior: Wash vs beam vs strobe — what fires on snare, what chases the hat line, what lags the bass release.
  7. Hero / focal: Talent: off: one readable non-figurative mass — slab, torus fog, crystal swarm, liquid sheet, glitch buffer, neon scaffold — attack/release on named audio. Talent: on: ref-locked performer — blocking, gesture, lip-sync on lyrics (cite @Audio1), instrumental = body-led groove; world still mutates around them.
  8. Motion profile: Groove viscosity — techno = snappy, ambient = slow LFO, breaks = stutter / step.
  9. Room read: What the crowd should feel — hypnosis, lift, dread, rinse, weight.

Energy Arc Mapping

Map phrasing before writing prompts. Talent: on: substitute performer blocking where the cell says hero mass — same energy, body in frame.

PhaseStack readHero / focalCamera opFixturesGrade
Ignition (intro)Sparse; noise floor visibleSlab / shard / void or figure edge-lit, partial, entering readCreep dolly, near-staticOne dim wash, sub-linked pulseNear monochrome, one accent
Loop (groove)Locked motif; minimal driftInterference loop / tunnel or figure groove-locked sway / stepOrbit or lateral at BPMKick-modulated washTwo-tone tension
BuildLayer count rises; squeeze frameSurface tension / facets or performer coils — smaller, fasterPush / tighten spiralStrobe duty climbs; multi-sourceSaturation ramps
DropFull stack; hard transient readShatter / flood / whiteout or full physical release on beatReveal wide or locked chaos frameFull strobe / crossfirePeak contrast
Break / outroStrip layers; long releaseHaze / flat plane or figure exits / drops energy with arrangementPull back; slow easeFixtures drop one by oneDesat; cool bias

VJ Style References

Install / media art: Ikeda, Turrell, Eliasson, teamLab, Anadol, Reas, Henke (laser), Nicolai (Alva Noto).

Live visual crews: AntiVJ, Nonotak, Lemercier, UVA, Moment Factory, MLF, ISAM-style projection rig reads, live cinema density.

Film refs (for color + scale; talent blocking only when Talent: on): Enter the Void neon velocity, Under the Skin void geometry, 2049 haze scale, Refn neon gel discipline.


Output Format

Set title: One line — show name energy (e.g. Obsidian Pulse — Cathedral Stack).

Talent line: First line after title — Talent: off or Talent: on (licensed by @Image…) with a one-phrase justification from what appears in the stills.

Shot count: One cue per phrase / section — infer from description or @Audio1. Simple track: 4–5 clips; busy arrangement: 8–10+.

Visual set list: Full timeline coverage; each item is a paste-ready model prompt + @Audio1 (and stills if used).

Shot [N]: [Phase tag]

Track section: Bars / BPM / what changed in the arrangement.

Video Generation Prompt:

One dense paragraph — no bullet list inside the prompt, no {{placeholders}}. Must:

  • Cite @Audio1 every cue. @Image1 = master plate; @Image2@Image3 only when that cue needs a texture / haze / performer insert — ≤3 stills total.
  • Keep one continuous look across the set if @Image1 exists; 2/3 do not re-grade the whole scene.
  • Talent: off: no humans; vocals → light print / spectral smear / scanline lyrics. Talent: on: ref-locked figure; lip-sync on vocal sections per @Audio1, instrumental = body-led motion; world + mod still animate — environment is not a still plate.
  • World + mod both animate across the 15s — displacement, particle, feedback, fog, geo fracture.
  • Camera op always defined; compound moves welcome.
  • Name sync law plainly (e.g. strobe on 1/4, whip on snare, bloom follows bass envelope).

Sync notes: One line — the money hit (the one alignment that sells the cue).


Example Shot Prompt

Examples below assume Talent: off (refs are environment / texture). If Talent: on, swap the hero mass for the licensed figure from the cited still — same grid / transient law.

Shot 1: Ignition

Track section: Intro, bars 1–8 — sub only, 132 BPM, no backbeat yet.

Video Generation Prompt:

"Master plate @Image1, macro from @Image2 on the floor and spec hits; cavern fade-up from black — no talent. Center: floating glass slab, sub from @Audio1 drives a slow pump on slab scale and a floor ripple in the wet concrete; fog inhales between pulses and exhales forward on the offbeat; walls spall to reveal cyan capillary light that advances each pulse; dolly creeps with rising tension; water tracers fall through the indigo cone; last bar: slab shatters into mirror confetti still on-grid with the final sub — whole room breathes LFE-linked, feedback only as rim on peaks."

Sync notes: Floor ripple + slab pump phase-locked to sub hits — this is the anchor groove before drums.

Shot 3: Hypnotic Loop (vocal in)

Track section: Bars 17–32, 4-on-floor, pitched vocal in.

Video Generation Prompt:

"@Image1 space, @Image3 haze law; orbit around a fog torus / volumetric core — abstract mass only. Vocal prints as horizontal phosphor bands that scrub through haze syllable-tight to @Audio1; wall growths strobe the chant cadence; torus spins groove-locked; kick injects upward fog through core then decays; concrete splits per bar exposing cyan/amber veins; water runoff on columns tracks palette; instrumental holes = torus flattens to interference then re-inflates on the next vocal passthree-way grade: indigo pit, cyan wall, amber bounce from deck."

Sync notes: Phosphor bands + fog inject split the vocal rhythm from the kick stacktwo triggers, no face.


Rules

  1. @Audio1 on every cue. @Image1 owns the look; @Image2@Image3 are supportmax three stills; never let 2/3 override the master.
  2. Talent: offno invented performers; vocals modulate light/texture only. Talent: ononly figures grounded in @Image1@Image3; lip-sync allowed on vocals; no extra cast beyond refs.
  3. No back-to-back duplicate motifs — each cue introduces a new read or new op.
  4. Respect the arcenergy follows arrangement, not shot order inertia.
  5. Chromatic journey — first and last cue should not share the same grade.
  6. Light first — define which fixture follows kick/snare/hat before piling particles.
  7. Every camera move justified by a musical event you can name.
  8. Three active layers minimum: plate, mod/FX, camera — with Talent: on, the performer counts toward motion; if any layer idles, compensate explicitly.
  9. Outro dies with the tracklong fade, density drop, no hard cut to black unless the song ends that way.

Context

Audio Reference File (Required): {{AUDIO_REFERENCE}}

Track Description (Optional) — title, genre, BPM, key hits, arrangement: {{TRACK_DESCRIPTION}}

Reference Images (Optional) — max 3. @Image1 = master look; @Image2@Image3 = texture/haze/support or a figure if that frame carries the character: {{REFERENCE_IMAGES}}

v1.5.0
Inputs
@Audio1 — the full track MP3 for beat-sync reference
'Obsidian Pulse' — dark techno, 132 BPM, kicks like a hydraulic press, acidic 303 bassline weaving through reverbed stab synths, a pitched-down vocal chant enters at the bridge, builds to a relentless four-on-the-floor climax with layered metallic percussion
@Image1 — hero shot of a subterranean concrete cathedral with bioluminescent fungal growths (primary visual anchor). @Image2 — close-up texture of wet brutalist concrete with cyan light refractions. @Image3 — aerial photograph of industrial fog rolling through a derelict factory
LLM Output

Shot 1: Ignition

Track section: Intro — bars 1–8, sub-bass pulse and distant reverb tail, 132 BPM, no percussion.

Video Generation Prompt:

"A vast subterranean concrete cathedral materializes from total darkness, the camera locked in a static wide hold, a single point of deep indigo bioluminescence at the far end of the space pulsing in precise sync with the sub-bass hits from @Audio1, each pulse sending concentric ripples of blue light across a water-slicked concrete floor, fog hugging the ground at knee height trembling visibly with every low-frequency impact, the walls barely visible as wet brutalist surfaces catching faint reflected light, the color palette restricted to near-black, raw concrete gray, and cold cobalt, the motion heavy and viscous as if the air resists all movement, the atmosphere pure subterranean anticipation, each pulse revealing slightly more of the architecture like a sonar ping mapping an unseen space."

Sync notes: The bioluminescent pulse must land on every sub-bass hit — each one slightly brighter than the last, building a sense of the space waking up.


Shot 2: Ignition (Escalation)

Track section: Bars 9–16, acid 303 bassline enters, first hi-hat patterns emerge, sub-bass continues.

Video Generation Prompt:

"The camera begins an ultra-slow lateral drift to the right through the concrete cathedral, the bioluminescent fungal growths on the walls now pulsing in response to the acid bassline from @Audio1, each note of the 303 triggering a cascade of cyan-green light that travels along the wall surface like an electrical impulse through a nervous system, the water on the floor now catching both the indigo sub-bass pulse and the acidic cyan of the bassline creating a two-tone light interference pattern, fog density increasing on the low-frequency hits and thinning on the hi-hat patterns, the color palette expanding from cold cobalt to include toxic cyan and a faint amber warmth at the edges, the camera drift speed locked to the BPM so each beat advances the frame by an identical increment, the motion still heavy but now with a mechanical precision that mirrors the sequencer-driven bassline."

Sync notes: The 303 acid line must trigger the cyan-green cascade on the walls — each note a visible electrical event traveling left to right, matching the camera's drift direction.


Shot 3: Hypnotic Loop

Track section: Bars 17–32, full kick pattern established, 303 locked in loop, stab synths entering, four-on-the-floor groove.

Video Generation Prompt:

"The camera orbits slowly around a central column of industrial fog lit from below, the kick drum from @Audio1 driving a rhythmic pulse of amber light from the floor that punches upward through the fog column on every downbeat, the 303 bassline continuing to animate the bioluminescent wall growths in cyan cascades, stab synths triggering sharp horizontal slashes of magenta light that cut across the frame like laser sweeps timed to each chord hit, the concrete surfaces now fully visible and dripping with condensation that catches every light source, the camera orbit steady and hypnotic, completing one full revolution across the duration of the clip, the color architecture now a three-way tension between amber floor-light, cyan wall-light, and magenta stab-light, the fog column acting as a volumetric screen where all three frequencies collide and blend, the motion locked into the groove, repetitive and trance-inducing, the eye entering the same loop the ear has already surrendered to."

Sync notes: The amber floor-light must pulse on every kick — this is the visual anchor the audience's body will track for the rest of the set.


Shot 4: Hypnotic Loop (Deepening)

Track section: Bars 33–48, groove locked, metallic percussion layers entering, subtle filter sweeps on the 303.

Video Generation Prompt:

"The camera holds in a low-angle static shot looking up at the cathedral ceiling, industrial fog rolling across the frame in thick layers, the kick from @Audio1 driving deep amber light pulses from below that illuminate the fog from within like heat lightning inside a cloud, metallic percussion hits triggering sharp silver-white flickers across the wet ceiling surface like sparks from a grinder, the 303 filter sweep visible as a slow chromatic shift in the bioluminescent wall growths from cyan to deep emerald and back, water droplets falling from the ceiling in slow motion catching the light and leaving brief tracer lines of color, the fog behaving as a volumetric canvas where every percussive hit from the track registers as a visible disturbance — kick as deep pulse, snare as fog displacement, hi-hat as surface shimmer, metallic percussion as white-hot sparks, the color palette now rich with amber, cyan, emerald, silver, and deep shadow."

Sync notes: The metallic percussion hits must produce visible silver-white sparks on the ceiling — sharp, high-frequency visual events that contrast the deep amber kick pulse below.


Shot 5: Tension Ratchet

Track section: Bars 49–64, build section, rising synth line, kick pattern intensifying, hi-hats doubling, energy compressing upward.

Video Generation Prompt:

"The camera begins a slow push-in toward the central fog column, the framing tightening with each bar, the rising synth line from @Audio1 driving a progressive increase in overall light intensity — the bioluminescent growths on the walls now pulsing faster and brighter, the amber floor-light climbing in saturation from warm gold to near-orange, the fog density increasing as if the air itself is pressurizing, hi-hats now visible as rapid surface flickers across every wet surface in the frame, the push-in accelerating in sync with the rising synth, the color palette compressing from multiple frequencies into an increasingly white-hot center surrounded by deepening shadow at the edges, the cathedral architecture closing in as the framing tightens, creating claustrophobia that mirrors the sonic compression, every element in the frame vibrating at higher frequency than the previous shot, the visual equivalent of a coil being wound tighter with each beat."

Sync notes: The camera push-in speed must accelerate in exact parallel with the rising synth line — the visual and sonic trajectories must feel fused into a single upward force.


Shot 6: Total Release

Track section: Bars 65–80, full drop, all elements at maximum, four-on-the-floor at peak intensity, distorted bass at full weight.

Video Generation Prompt:

"The camera snaps to a wide shot pulling back to reveal the entire cathedral space at maximum scale, every surface erupting with synchronized light responding to @Audio1 at full intensity — the kick detonating amber light across the entire floor in explosive pulses, the distorted bass shaking the bioluminescent growths into a frenzy of cyan and magenta strobing across every wall, fog blasting outward from the center on every downbeat and sucking back in on the offbeat like the space is breathing with the four-on-the-floor, metallic percussion scattering white-hot particle bursts across the ceiling, the color palette at full saturation with amber, cyan, magenta, and white all firing simultaneously, the camera locked in a static wide hold refusing to move while the entire environment convulses with the track's peak energy, water on the floor now bouncing with visible cymatics patterns from the bass frequency, the visual equivalent of every frequency band in the mix made simultaneously visible as light, motion, and atmospheric force."

Sync notes: The fog blast outward must land on every downbeat kick — the space itself must appear to breathe in time with the four-on-the-floor pattern.


Shot 7: Rupture

Track section: Bars 81–96, bridge section, pitched-down vocal chant enters, percussion strips back, bass sustains.

Video Generation Prompt:

"The camera shifts to a slow overhead descent looking straight down at the water-covered floor, the percussion stripped away leaving only the pitched-down vocal chant from @Audio1 and a sustained bass drone, the visual world transformed — the frantic strobing replaced by slow, deep breathing pulses of deep violet light rising from beneath the water surface in sync with each vocal phrase, the bioluminescent growths dimming to a faint ghostly glow, the water surface now perfectly still except for concentric ripples that emanate from the center on each syllable of the chant, the color palette collapsed to deep violet, black, and the faintest trace of silver on the water surface, fog now settled flat and motionless, the camera descent slow and inevitable as if falling into the sound itself, the atmosphere shifted from physical intensity to something ancient, ritualistic, subterranean, the visual equivalent of the track's most mysterious and vulnerable moment."

Sync notes: The concentric water ripples must originate on each syllable of the pitched-down vocal chant — the voice made visible as physical disturbance on the water surface.


Shot 8: Comedown Drift

Track section: Final bars and outro, elements subtracting one by one, energy dissipating, reverb tails extending into silence.

Video Generation Prompt:

"The camera returns to the original static wide hold from the opening shot but now the cathedral is transformed — bioluminescent growths fading one by one like dying embers across the walls, each percussive element dropping out of @Audio1 matched by its corresponding light source extinguishing, metallic percussion sparks vanishing first, then the magenta stab-lights, then the cyan wall-glow dimming to near-nothing, the amber floor-light reducing to a single faint pulse matching the final kick hits, fog settling and thinning, water surface going still, the reverb tails of the final sounds visualized as slowly fading halos of residual light that linger after their source has died, the color palette draining back to the near-monochrome indigo and black of the opening, the camera perfectly still, the space returning to the void it emerged from, the final image a single point of deep blue light pulsing once more then fading to absolute darkness, the visual world ending the way it began — a heartbeat in the dark, now silenced."

Sync notes: Each light source must extinguish in sync with its corresponding sonic element dropping out — the visual mix mirrors the audio mix stripping down to silence.

Generated Video