Google NotebookLM is probably one of the most underrated Gen AI tools right now. It’s amazing at ingesting loads of information from a variety of sources and then spitting out a compelling audio podcast. I’ve been meaning to experiment with these podcasts and make the leap to video. Here’s a peek at the workflow (including pros and cons) as well as the final output.
The Concept: Breathing Visual Life into Audio
The idea was straightforward yet ambitious: take an audio podcast and elevate it into a compelling video experience without the typical production hassles. Leveraging AI to bridge the gap between audio and visual content seemed like the perfect challenge.
Step 1: Crafting the Narrative
At the heart of this project was Google NotebookLM, a tool that’s redefining how we ingest, analyze and learn from content. I provided it with my LinkedIn profile, blog posts, and YouTube channel: essentially, a digital footprint of my professional journey. Using the new “Customize” feature, I asked NotebookLM to focus on my current role and passions when producing the audio podcast.
In a few minutes, it produced an impressive 8-minute audio podcast featuring a dialogue between a male and a female host discussing my work and insights. The conversation was not only coherent but also engaging, highlighting key aspects of how AI intersects with hospitality and loyalty programs.
Audio Highlight: I chose a 60-second snippet where the hosts delve into innovative applications of AI in enhancing customer loyalty—a segment I thought would be an appropriate source for our experimental video podcast.
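For anyone who would rather script the snippet selection than do it by hand, here is a minimal sketch. It assumes the podcast has been saved as an uncompressed WAV file; the function name, file names and the start offset are all hypothetical and not part of my original workflow.

```python
import wave


def trim_wav(src_path, dst_path, start_s, duration_s):
    """Copy duration_s seconds of audio, starting at start_s, into a new WAV file.

    Returns the number of seconds actually written, which is shorter than
    duration_s if the source file ends first.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Clamp the start position so setpos() never goes past the end of the file.
        start_frame = min(int(start_s * rate), src.getnframes())
        src.setpos(start_frame)
        frames = src.readframes(int(duration_s * rate))

    with wave.open(dst_path, "wb") as dst:
        # Reuse the source format; the frame count in the header is fixed up on close.
        dst.setparams(params)
        dst.writeframes(frames)

    bytes_per_frame = params.sampwidth * params.nchannels
    return len(frames) / (bytes_per_frame * rate)


# Hypothetical usage: keep 60 seconds starting two minutes in.
# trim_wav("podcast.wav", "snippet.wav", start_s=120, duration_s=60)
```

The standard-library wave module only handles uncompressed WAV, so an MP3 or M4A export would need converting first.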
Step 2: Personalizing the Voices
While NotebookLM provided a solid foundation, I wanted the hosts’ voices to stand out from the stock AI-generated male and female voices the tool uses. Using Audacity, the free audio editor, I separated the male and female voices into individual tracks.
I then turned to ElevenLabs to infuse uniqueness into each voice using its voice changer tool. The transformation was remarkable: the hosts now sounded quite different to every other demo of NotebookLM floating around on social media.
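I did the track separation by hand in Audacity, but the underlying idea can be sketched in a few lines. This assumes you have already noted when each host speaks (the turn list below is made up for illustration) and that the audio is loaded as a plain list of samples; the function and speaker names are hypothetical.

```python
def split_by_speaker(samples, rate, turns):
    """Split one mixed track into one track per speaker.

    samples: list of audio samples for the full mix
    rate:    samples per second
    turns:   list of (speaker, start_s, end_s) tuples, labeled by hand

    Every output track has the same length as the input, with silence (zeros)
    wherever that speaker is not talking, so the tracks stay in sync.
    """
    speakers = {speaker for speaker, _, _ in turns}
    tracks = {speaker: [0] * len(samples) for speaker in speakers}

    for speaker, start_s, end_s in turns:
        # Convert seconds to sample indices and keep them inside the track.
        lo = max(0, int(start_s * rate))
        hi = min(len(samples), int(end_s * rate))
        tracks[speaker][lo:hi] = samples[lo:hi]

    return tracks


# Hypothetical usage with a tiny fake signal sampled once per second.
mix = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
turns = [("host_a", 0, 4), ("host_b", 4, 10)]
tracks = split_by_speaker(mix, rate=1, turns=turns)
```

Keeping both tracks full-length matters later on: each voice can be processed separately and then dropped back onto the same timeline without re-aligning anything.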
Step 3: Generating Visual Hosts
With the audio perfected, the next step was to create visual representations of the hosts. Midjourney came into play here, though you can pretty much use any tool, from Flux to Leonardo, Stable Diffusion, Ideogram, Adobe Firefly, etc.
I crafted prompts to produce images of a male and a female podcast host. After a few iterations, Midjourney delivered, and I upscaled the images within the tool to enhance clarity and detail.
Step 4: Synchronizing Audio and Visuals
The challenge now was to bring these static images to life. Hedra, an AI-driven lip-syncing tool (much like Runway and others), allowed me to animate the hosts’ images in sync with the audio tracks. While not perfect, Hedra added the necessary movement to make the hosts appear as if they were genuinely conversing.
Step 5: Stitching It All Together with Video Editing Tools
Finally, I used Final Cut Pro (feel free to choose a video editor or platform that suits your comfort level) to assemble the animated hosts and align their interactions seamlessly. The end result was a cohesive video podcast that felt both innovative and authentic.
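I made the cuts by eye in the editor, but if you already have a list of who speaks when, a rough cut list can be generated from it. The sketch below is only an illustration under that assumption: the clip file names and the turn timings are hypothetical, and the output is a plain list you would still apply in your editor of choice.

```python
def build_cut_list(turns, clips):
    """Turn speaker turns into a list of (clip, start_s, end_s) cuts.

    turns: list of (speaker, start_s, end_s) tuples
    clips: dict mapping each speaker to that host's animated video file

    Back-to-back turns by the same speaker are merged into a single cut.
    """
    cuts = []
    for speaker, start_s, end_s in sorted(turns, key=lambda turn: turn[1]):
        clip = clips[speaker]
        # Extend the previous cut if the same host simply keeps talking.
        if cuts and cuts[-1][0] == clip and cuts[-1][2] == start_s:
            cuts[-1] = (clip, cuts[-1][1], end_s)
        else:
            cuts.append((clip, start_s, end_s))
    return cuts


# Hypothetical usage: two hosts, three turns.
clips = {"host_a": "host_a.mp4", "host_b": "host_b.mp4"}
turns = [("host_a", 0, 5), ("host_a", 5, 8), ("host_b", 8, 12)]
cuts = build_cut_list(turns, clips)
```

Turns are only merged when one ends exactly where the next begins, which suits hand-entered timings; a small tolerance could be added for automatically detected ones.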
Reflections on the Process – What Worked Well
- Google NotebookLM’s Content Generation: Its ability to synthesize information and produce a natural-sounding podcast script was impressive. The tool’s recent updates allow for more targeted content creation, making it a powerful ally for creators, educators and learners.
- Voice Customization with ElevenLabs: Altering the hosts’ voices added a layer of personalization that set the podcast apart from standard AI-generated content.
- Visual Creation with Midjourney: Generating high-quality host images without the need for a photoshoot streamlined the production process.
Areas for Improvement
- Accuracy of NotebookLM: Despite its strengths, there were moments where the AI misrepresented facts or events, likely due to overgeneralizing from the input data. This highlights the need for human oversight in verifying most AI-generated content.
- NotebookLM Customize Feature: While you can ask the tool to focus audio podcast production on a certain topic or portion of the provided source material, you still can’t get too specific with it. For example, I initially wanted it to explain what I do as if speaking to a 6-year-old, and the tool ignored this completely.
- Quality Limitations of Hedra: The lip-syncing and resolution had room for enhancement. The output was acceptable for a prototype, but a polished product might require investing in higher-end tools or services. You can run the output through video upscalers like Topaz for better results, but this can be costly and time-consuming.
- Scaling Up Production: For longer or higher-resolution videos, current tools demand significant processing power and time. Optimizing this aspect will be crucial for future projects.
Broader Considerations
- Ethical Use of AI: As we dive deeper into AI-generated content, it’s essential to consider the ethical implications, including consent for likeness usage and the potential for misinformation. I suspect most tools will automatically add metadata or tags identifying AI-generated or manipulated content (not sure this will deter bad actors though, especially with so many open-source tools now available).
- The Future of Content Creation: This experiment underscores yet another shift towards more accessible and democratized content production. As tools become more sophisticated, the barrier to entry lowers, inviting a more diverse range of voices into the mix. This also means growing volumes of content and a dilution in quality. We’ll need to work extra hard in the future to differentiate authentic, high-quality content from AI-generated slop.
- Audience Engagement: Balancing AI efficiency with authentic engagement is key. While AI can handle the heavy lifting, the human touch and creativity remain invaluable in resonating with (human!) audiences.
Conclusion
This was an interesting little experiment. Tools like Google NotebookLM not only streamline the process but also open up new avenues for creativity and learning (I personally think NotebookLM will be an invaluable ally in the hands of students and learners of all types and ages). While there are kinks to iron out, the potential for AI in content creation is immense and exciting.
As technology continues to advance, so too will the opportunities for creators to innovate and connect with their audiences in meaningful ways. If you’re a content creator—or aspire to be one—there’s no better time to explore what these tools have to offer!