Making SENSES: a real workflow for hybrid documentary with AI

The complete workflow behind SENSES, a four-minute hybrid documentary about infant perception. Every tool, every phase, every decision in a real AI documentary practice.

Giulia Maniezzo

6/5/2026 · 5 min read

A four-minute hybrid documentary, made one phase at a time. Every tool, every decision.

For three years I've been making documentaries that use AI. SENSES is the latest one: a four-minute hybrid short about how a baby experiences the world in her first year of life. The subject is Marina, my daughter, seven months old when we filmed.

This post is the workflow. Every phase, every tool, every decision that mattered. If you're a filmmaker thinking about how AI fits into documentary practice, here is what mine actually looks like.

What SENSES is

The film has five chapters: Sight, Hearing, Touch, Smell, Taste. Each combines real footage of Marina at home with AI-generated visuals. The voiceover is built entirely from peer-reviewed developmental neuroscience. It sounds like poetry. Only in the end credits does the audience discover that every sentence was a real scientific citation.

The phases overlap now

The first thing that changed when I started working with AI: pre-production, shooting, and editing stopped being separate steps in a line. Now they overlap and feed each other. The film gets made in the timeline.

I started my career as an editor. That habit of working in the cut, of letting the film tell you what it needs while you're inside it, has become the way I make everything. Below is the actual order of things on SENSES.

Research

I used Claude to study how babies perceive the world. Then I fed those findings into NotebookLM to go deeper and pull more peer-reviewed sources. I created a dedicated Claude project, filled it with all my research, and worked through the citations for each of the five senses, picking the most striking real fact for each chapter and verifying it against the original paper.

The result: a voiceover where every line is a real citation. "In the first year of life, the brain creates over a million new connections every second." "When a baby smells her mother, her brain stops reacting to fear, even before her mother enters the room."
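For readers who like to see this kind of bookkeeping written down, here is a minimal sketch of how the "every line is a verified citation" rule could be tracked. It is an illustration only, not the tooling used on SENSES: the names `VoiceoverLine`, `needs_checking`, and `ready_to_record` are hypothetical, the section labels are assumptions, and the `source` fields are deliberately left empty because the original papers are not listed in this post.

```python
from dataclasses import dataclass


@dataclass
class VoiceoverLine:
    """One sentence of voiceover and the state of its fact-check (hypothetical structure)."""
    section: str            # where the line sits in the film (labels assumed here)
    text: str               # the sentence as spoken
    source: str = ""        # citation of the original paper, filled in by hand
    verified: bool = False  # set to True only after checking the line against the paper


def needs_checking(lines: list[VoiceoverLine]) -> list[VoiceoverLine]:
    """Lines that lack a source or have not been checked against it yet."""
    return [line for line in lines if not (line.source and line.verified)]


def ready_to_record(lines: list[VoiceoverLine]) -> list[VoiceoverLine]:
    """Lines that have both a source and a completed check."""
    return [line for line in lines if line.source and line.verified]


# The two lines quoted in this post; sources intentionally left blank.
LINES = [
    VoiceoverLine(
        section="Pre-title",
        text="In the first year of life, the brain creates over a million "
             "new connections every second.",
    ),
    VoiceoverLine(
        section="Smell",
        text="When a baby smells her mother, her brain stops reacting to fear, "
             "even before her mother enters the room.",
    ),
]

for line in needs_checking(LINES):
    print(f"[{line.section}] still needs a verified source")
```

The point of the two separate fields is that having a source and having checked the line against it are different steps; a line only goes to the voiceover when both are done.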

Story and shooting

From the research I wrote the story, deciding what we could actually film and how it would connect to the science. Florian shot the real footage first. The real part is always the most important.

Editing starts from music

I edited the real footage starting from the music. Emotion guides everything. Before any visual choice, before any AI generation, I want to know what the film feels like. The music is the spine.

Then I generated the voiceover with ElevenLabs Voice Design v3, calibrated for intimacy. It had to sound like a mother thinking, not a narrator explaining.

Stills first. Then video.

I never start with video. Video generation is slow, expensive, and most of what you generate doesn't work. Start with stills.

For SENSES I explored each chapter as still images first, using Nano Banana Pro, Flux Pro, and Midjourney. Each chapter needed a different visual world: long-exposure analogue blur for Sight, vertical sacred architecture for Hearing, extreme macro texture for Touch, an impossible shadow for Smell, the body as a projection screen for Taste.

From the stills I went back to the research, revised some citations, found others that fit better. Reality changes when you start visualizing, and in a documentary everything changes with it.

Then I moved into video generation with Seedance 2.0 and Kling Omni, accessed via Runway and Freepik.

Character consistency

To place Marina and myself inside environments that don't exist, I built character reference sheets from our real photos. Multi-angle portraits, neutral lighting. These got uploaded as character references in Kling Omni and Seedance 2.0.

This is the difference between AI video that looks like a stranger pretending to be your child, and AI video that actually looks like your child.

For key locations, I generated the environment empty first. The Tower of Babel sequence for the Hearing chapter was built this way: generate the architecture without people, then use that as an environment reference for the shots with figures inside.

Prompting

I built a custom prompting skill in Claude to develop and refine all generation prompts throughout the project. Each chapter had its own visual rules and references. Having a structured prompt assistant inside Claude meant I wasn't writing the same things over and over, and the prompts got better as we iterated.
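The skill itself lives in Claude and is not reproduced here. As a rough sketch of the idea of per-chapter visual rules, the snippet below shows one way they could be written down so that nothing has to be retyped. The chapter descriptions come from this post; everything else (`ShotRequest`, `build_prompt`, the `VAGUE_TERMS` list, the example subject and lighting) is a hypothetical illustration, and the output is a plain string, not a call to any generation tool.

```python
from dataclasses import dataclass

# Visual world for each chapter, as described earlier in this post.
VISUAL_WORLDS = {
    "Sight": "long-exposure analogue blur",
    "Hearing": "vertical sacred architecture",
    "Touch": "extreme macro texture",
    "Smell": "an impossible shadow",
    "Taste": "the body as a projection screen",
}

# Hypothetical list of words that pull a prompt back toward the glossy default look.
VAGUE_TERMS = {"cinematic", "magical", "golden"}


@dataclass(frozen=True)
class ShotRequest:
    chapter: str   # one of the five chapter names
    subject: str   # what is in the frame
    lighting: str  # a specific lighting description, not a mood word


def build_prompt(shot: ShotRequest) -> str:
    """Combine the chapter's visual world with the shot details into one prompt string."""
    if shot.chapter not in VISUAL_WORLDS:
        raise KeyError(f"Unknown chapter: {shot.chapter}")
    vague = VAGUE_TERMS & set(shot.lighting.lower().split())
    if vague:
        raise ValueError(f"Lighting is too generic: {sorted(vague)}")
    return f"{shot.subject}, {VISUAL_WORLDS[shot.chapter]}, {shot.lighting}"


if __name__ == "__main__":
    shot = ShotRequest(
        chapter="Touch",
        subject="a baby's hand on wool",
        lighting="soft north-facing window light",
    )
    print(build_prompt(shot))
```

Rejecting words like "cinematic" in code is a blunt version of the same discipline: the rule is written once, and every prompt for a chapter starts from the same visual world.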

One note: in my experience, Claude is much better than ChatGPT for this kind of work. It understands references, holds context across long projects, and is willing to be specific instead of reaching for the generic.

Finishing

After cutting the AI sequences into the film, I reviewed the music, built the sound design, designed the graphics, graded the color, and upscaled with Magnific and Topaz. The film took its final shape in this pass: small adjustments to pacing, small shifts in color, small refinements to the soundscape.

The taste rule

The biggest difficulty in working with AI generation, especially for video, is that the default aesthetic is magical and golden. Glossy. Dramatic. Slightly kitsch. If you accept what the tool gives you, your film looks like an ad.

A lot of the work on SENSES was pulling back from that default. Mixing references. Working with point of view. Asking for specific lighting types instead of "cinematic." Hunting for something that felt particular to this film, with its own taste.

This is where the human craft lives now: in editing the AI's instincts, in knowing what to keep and what to throw away.

The pre-title scene

The hardest decision on this film was the pre-title sequence. I tried many things. I wanted to open with black-and-white images that felt like cells or brain matter. I wanted Marina's point of view, blurry first, then in color. I wanted a tunnel into her eye.

In the end I landed on a single citation: "In the first year of life, the brain creates over a million new connections every second." That sentence, paired with abstract images of neural connections, opens the film.

The decision came from Claude. I had drafted that line in an earlier concept and abandoned it. Claude reminded me it was there. I tried it at the start instead of where I had originally planned it, and it worked.

What this kind of workflow needs

If you want to work this way, the things that matter most:

Patience with iteration. Most generations don't work. The good ones come from understanding why the bad ones failed.

Strong editorial taste. The tool will give you a lot. You have to know what to throw away.

A way to organize your prompts and references. I use Claude projects. Some people use Notion. The system matters less than having one.

Comfort with non-linear process. If you need pre-production, production, and post to be separate phases, this way of working will frustrate you. The film gets made in the timeline.

Why I work this way

I spent over a decade inside European television production, watching budgets force creative compromises. When AI generation matured, I saw a production tool as significant as the invention of digital editing.

SENSES came out of that. For this particular film, I needed AI for one specific reason: the images of how a baby perceives the world cannot be filmed with any camera. They can only be imagined, and then visualized. The voiceover carries the science. The AI carries my best guess at what that science might feel like from inside an infant's brain.

SENSES is currently in festival submission. Watch the film on the SENSES page.