Was going to make a separate thread - but this seems to fit here. There is also a whole parallel track of deepfaking voices, which has made insane leaps in the last year or two - much like visual deepfakes.
Most of the people/companies who have cracked it are keeping it to themselves or offering paid services (Resemble, Lyrebird, etc.). The main guy behind Resemble, though, has published a few papers and still hosts a now-abandoned GitHub repo that can recreate a voice from three or four five-second clips of the person talking and then lets you run text-to-speech with it. It's nuts, but it has some limitations, and I'm currently experimenting with it for a GOT Season 8 fanedit I'm working on.
Currently there is no way to direct intonation or inflection; the result is very 'matter of fact', as the network was trained on a database of thousands of audiobook narrations. The text parser cannot process punctuation, only line breaks for pauses, so you have to work around that. I have also found (much to my chagrin, given the British accents in GOT, Star Wars and LOTR, for example) that it mainly seems able to replicate American accents.
It will work with whatever you give it, but it will often 'Americanize' the output, probably because so many audiobooks have American narrators. With that in mind, though, it can be scarily convincing if you have clean audio to feed it, which we often do as nerdy film fans. Not to mention that many film characters already speak with an American accent.
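On the punctuation limitation mentioned above, here is a rough sketch of the kind of workaround I mean: strip the punctuation out of a script and put a line break wherever it used to be, so the pauses survive. The function name is my own invention and has nothing to do with the repo itself, and treating commas the same as full stops is just a guess at what sounds right.

```python
import re


def punctuation_to_line_breaks(text: str) -> str:
    """Replace pause punctuation with line breaks (hypothetical helper).

    Assumption: the tool reads each line break as a pause and ignores or
    trips over punctuation, as described in the post.
    Caveat: abbreviations ("Mr.") and decimals ("3.5") are split too,
    so spell those out by hand first.
    """
    # Split wherever one or more pause marks appear.
    pieces = re.split(r"[.,;:!?]+", text)
    # Collapse stray whitespace inside each piece and drop empty pieces.
    lines = [" ".join(piece.split()) for piece in pieces]
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    script = "Winter is coming, and we know what comes with it. Are you ready?"
    print(punctuation_to_line_breaks(script))
```

Apostrophes are left alone, so contractions like "don't" come through untouched. If the pauses end up too choppy, take the comma out of the character class so that only sentence endings become line breaks.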
If you are in this thread, then the benefit should be as obvious to you as to me. New lines of dialogue that didn't exist before, recitations of lines from scripts that didn't make it, absolutely hilarious garbage. My main aim with this, if I can ever get round the accent issue, is to have one or two original lines and some lines from the ASOIAF books ADR-ed into my fanedit of S8 of GOT, played cinematically over landscape shots, flashbacks, etc. You obviously have to be a bit creative, since you don't have the visual of the actor speaking.
You have to do a very specific install and have an NVIDIA graphics card with at least 2GB of memory, as it runs exclusively on your GPU. Fewer words process faster, but it can handle a lot if you have time. For short pieces of one or two lines, it only takes about as long as it would take to read the lines aloud, maybe twice as long, which is close to real time. You don't have to have your computer screaming computationally for 30 hours, as you do with some visual deepfake tools.
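If you are not sure whether your card meets the requirement, something like the sketch below can check before you start the install. It asks the `nvidia-smi` tool that ships with the NVIDIA driver for each card's name and total memory. The function names are made up for illustration, and I am assuming "2GB" means 2048 MiB; some 2GB cards report slightly less than that, so lower the threshold if yours does.

```python
import shutil
import subprocess

# Assumption: "at least 2GB" is treated as 2048 MiB.
MIN_MEMORY_MIB = 2048


def parse_gpu_list(csv_text):
    """Parse 'name, memory' lines into (name, memory_in_mib) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # The memory figure is the last comma-separated field.
        name, _, memory = line.rpartition(",")
        try:
            gpus.append((name.strip(), int(memory.strip())))
        except ValueError:
            # Skip rows where the driver reports something non-numeric.
            continue
    return gpus


def query_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_list(result.stdout)


def usable_gpus(gpus, min_mib=MIN_MEMORY_MIB):
    """Keep only the cards with at least min_mib of memory."""
    return [gpu for gpu in gpus if gpu[1] >= min_mib]


if __name__ == "__main__":
    found = query_gpus()
    if not found:
        print("No NVIDIA GPU found (or nvidia-smi is not installed).")
    for name, memory in found:
        verdict = "OK" if memory >= MIN_MEMORY_MIB else "too little memory"
        print(f"{name}: {memory} MiB ({verdict})")
```

This only tells you the card is there and big enough. It says nothing about whether the rest of the install will go smoothly.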
Because the install order and the versions of everything are vital to getting this working, I highly recommend this guide, which I followed and was able to get working as someone who is massively interested in AI but has no practical knowledge or skills.
Foolproof Guide
Godspeed, fan editors, and be responsible with your power to have actors say all kinds of ridiculous and/or cool things!