
We're looking at creating about 400 videos this year, each 3-5 minutes long. Our current process is to record ourselves talking over a PowerPoint for the lesson videos.

Instead, I'm thinking of using WellSaidLabs text-to-speech or Descript Overdub with a trained voice to turn the scripts into audio files. Then we'd drop the corresponding audio file onto each slide and export it to video. Done.

Modifying the lesson in the future will be as easy as editing the PowerPoint or dropping in a new audio file. 

There's some resistance to using an AI voice.  Do any of you have experience with AI text-to-speech that sounds realistic?  What about using Descript Overdub?  Videos I've seen where people have trained it with their voice sound impressive.  

I want to focus on writing good content and automating the production part as much as possible.  Do you have any tips or ideas to help make this happen? 


Thanks!

My 2¢…

I used WellSaid once, for a project that required several different voices, both male and female, and only because I could not mimic them all myself. For other projects - I record the voiceover myself.

At the time - I thought WellSaid was the most realistic sounding - but still distinguishable from a real human voice and very much lacking in the emotion department.

Marketing would have you believe it is as quick as dropping the file and BAM! but that is often not the reality. I spent a great deal of time editing my text so the “voice” could read it properly. Way more than I would have if I just recorded it myself.

When I needed a voice - I would drop the text into multiple characters that read it differently to find the one I liked. Then, like I say, you have to tweak the text with funny spellings (like “funny” becoming “fun knee” or “fuh nee” or “fuh” nee) and replay it to get them to pronounce certain words the way they should. It was more time-consuming than you’d think. It was not uncommon for me to have a dozen takes on a sentence - it got old.
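
For anyone who does go the text-to-speech route, the respelling chore described above can at least be kept in one place instead of being retyped in every script. The following is only a minimal sketch under assumptions: the `RESPELLINGS` table and the `apply_respellings` helper are hypothetical names, the single entry is taken from the example above, and which respelling actually works depends on the voice you pick.

```python
import re

# Hypothetical respelling table: word -> spelling the synthetic voice reads correctly.
# The entry below is illustrative; real entries come from trial and error per voice.
RESPELLINGS = {
    "funny": "fuh nee",
}


def apply_respellings(text, respellings):
    """Replace whole words (case-insensitive) with their phonetic respellings."""
    if not respellings:
        return text
    # Longest words first so a longer entry is preferred over a shorter overlapping one.
    words = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    lookup = {w.lower(): r for w, r in respellings.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


if __name__ == "__main__":
    print(apply_respellings("That is a funny slide.", RESPELLINGS))
```

The clean script stays readable for humans, and only the copy sent to the voice tool carries the odd spellings. It does not remove the need to listen to each take.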

Like many - I watch a lot of YouTube videos to learn new things. If I press play and a robot is talking to me - I stop it and continue searching. Personally - I cannot stand eLearning with robot speech. Perhaps I am in the minority on this but there you have it. I think your resistance to using AI for this is reasonable and not to be dismissed lightly.


I can’t think of any process that would be faster than what you are already doing, other than one idea: find someone in the company who has a great voice and ask them to record the voiceovers for you. That way you get a whole bunch of audio in a short time period that you can edit into the videos as you produce them.

As a general rule though: shortcuts rarely are...


New workflow: 

  1. Upload course context to ChatGPT
  2. Generate course outline with ChatGPT
  3. Generate lesson outlines with ChatGPT 
  4. Create PPT slides
  5. Generate lesson scripts with ChatGPT (1 script per slide)
  6. Generate per slide audio using cloned voice in 11Labs
  7. Drop audio onto each slide 
  8. Animate slides 
  9. Export PPT to video

Done! 

This seems complicated, but it is a repeatable process, and we’re 3x more efficient. We use several Python scripts that I developed to speed things up. Also, 11Labs’ cloned voice is amazing! It’s our trainer’s voice but much clearer and more concise. Plus, the voiceover stage went from an hour per video to 5 minutes or less.
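
As a rough illustration of the kind of glue script that helps with the "1 script per slide" steps (a hypothetical sketch, not the actual scripts mentioned above), the snippet below splits one lesson script into per-slide segments and assigns each a predictable audio file name. The `---` separator line, the `build_slide_manifest` name, and the `lesson01_slide01.mp3` naming scheme are all assumptions made for the example. The voice-generation call itself is left out.

```python
def build_slide_manifest(script_text, lesson_id, separator="---"):
    """Split a lesson script into per-slide narration and name the audio files.

    Assumes slides are separated by a line containing only `separator`.
    Slides with no narration are skipped, but numbering still follows
    slide position so file names line up with the deck.
    """
    segments, current = [], []
    for line in script_text.splitlines():
        if line.strip() == separator:
            segments.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    segments.append("\n".join(current).strip())

    return [
        {
            "slide": number,
            "text": text,
            "audio_file": f"{lesson_id}_slide{number:02d}.mp3",
        }
        for number, text in enumerate(segments, start=1)
        if text  # skip slides that have no narration
    ]


if __name__ == "__main__":
    script = "Welcome to the lesson.\n---\nStep one.\n---\n\n---\nWrap up."
    for entry in build_slide_manifest(script, "lesson01"):
        print(entry["slide"], entry["audio_file"])
```

Keeping the file name tied to the slide number means a revised slide only needs its one audio file regenerated and dropped back in.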

Another added benefit is that we have editable PPTs vs everything being done in Camtasia so it’s much easier to make updates. Edit a pic or audio and export… easy! 


Do you have a company policy on the use of AI? Just thinking about the information you are sharing being used as training data for the LLM.


We do, but the training context information we are using is already publicly available.

Also, as of March, OpenAI added the option to opt out of training the LLM with your data. 

https://help.openai.com/en/articles/7039943-data-usage-for-consumer-services-faq


Nice, thanks for the link @ChisF🙂

