We're looking at creating about 400 videos this year, each 3-5 minutes long. Our current process is to record ourselves talking over a PowerPoint for the lesson videos.
Instead, I'm thinking of using WellSaid Labs text-to-voice or Descript Overdub with a trained voice that can read the scripts out to an audio file. Then drop the corresponding audio file onto each slide and export it to video. Done.
Modifying the lesson in the future will be as easy as editing the PowerPoint or dropping in a new audio file.
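If it helps with the "drop in a new audio file" part, here is a rough sketch of one way to keep track of which slides need fresh audio after a script edit. It assumes each slide's narration is kept as a piece of text keyed by a slide name, and that a small JSON manifest file is acceptable for bookkeeping. The function names, the manifest file, and the slide names are all made up for illustration; nothing here talks to WellSaid, Descript, or PowerPoint.

```python
import hashlib
import json
from pathlib import Path


def script_hash(text: str) -> str:
    """Fingerprint a slide's narration text (ignoring leading/trailing whitespace)."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


def slides_needing_audio(scripts: dict[str, str], manifest_path: Path) -> list[str]:
    """Return the slide names whose script changed since the manifest was last saved.

    `scripts` maps a hypothetical slide name (e.g. "slide01") to its narration text.
    If no manifest exists yet, every slide is reported.
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    return [name for name, text in scripts.items() if old.get(name) != script_hash(text)]


def record_audio_done(scripts: dict[str, str], manifest_path: Path) -> None:
    """Save the current fingerprints once audio has been regenerated for all slides."""
    fingerprints = {name: script_hash(text) for name, text in scripts.items()}
    manifest_path.write_text(json.dumps(fingerprints, indent=2))
```

The idea is to call `slides_needing_audio` after editing a lesson, regenerate audio only for the slides it lists, then call `record_audio_done`. With about 400 videos, that may save re-rendering audio that has not changed.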
There's some resistance to using an AI voice. Do any of you have experience with AI text-to-speech that sounds realistic? What about using Descript Overdub? Videos I've seen where people have trained it with their voice sound impressive.
I want to focus on writing good content and automating the production part as much as possible. Do you have any tips or ideas to help make this happen?
I used WellSaid once, for a project that required several different voices, both male and female, and only because I could not mimic them all myself. For other projects, I record the voiceover myself.
At the time, I thought WellSaid was the most realistic-sounding option, but it was still distinguishable from a real human voice and very much lacking in the emotion department.
Marketing would have you believe it is as quick as dropping in the file and BAM! That is often not the reality. I spent a great deal of time editing my text so the “voice” could read it properly, way more than I would have spent just recording it myself.
When I needed a voice, I would drop the text in for multiple characters, each of which reads it differently, to find the one I liked. Then, like I say, you have to tweak the text with funny spellings (like funny becoming fun knee or fuh nee or “fuh” nee) and replay it until the voice pronounces certain words the way it should. It was more time-consuming than you’d think. It was not uncommon for me to have a dozen takes on a sentence, and it got old.
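For anyone who does go the text-to-speech route, the respelling trick can at least be kept out of the master script. Below is a minimal sketch, assuming you maintain your own table of problem words and apply it to a copy of the script just before pasting it into the voice tool. The table contents and the function name are hypothetical, and the respellings would still have to be found by ear for each voice.

```python
import re

# Hypothetical respelling table. Keys must be lowercase; the entries are
# illustrative only and need tuning by ear for whichever voice is used.
RESPELLINGS = {
    "funny": "fuh nee",
}


def apply_respellings(script: str, table: dict[str, str]) -> str:
    """Replace whole words (case-insensitively) with their phonetic respelling."""
    if not table:
        return script
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in table) + r")\b",
        re.IGNORECASE,
    )
    # Look up each matched word in lowercase so "Funny" and "funny" both hit.
    return pattern.sub(lambda m: table[m.group(0).lower()], script)
```

This keeps the readable script as the source of truth and the odd spellings in one place, so a fix found once does not have to be retyped in every lesson. It does not remove the listen-and-retry loop described above.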
Like many, I watch a lot of YouTube videos to learn new things. If I press play and a robot is talking to me, I stop it and continue searching. Personally, I cannot stand eLearning with robot speech. Perhaps I am in the minority on this, but there you have it. I think the resistance to using AI for this is reasonable and not to be dismissed lightly.
I can’t think of any process that would be faster than what you are already doing, with one exception: find someone in the company who has a great voice and ask them to record the voiceovers for you. That way you get a whole bunch of audio in a short time period, which you can edit into the videos as you produce them.
As a general rule though: shortcuts rarely are...