I don't follow. Why is the data they are getting from this better than the billions of hours of captioned voice data available from youtube/tiktok/instagram/whatever?
I don't mean to suggest that this particular toy is much use. I just mean you should not give random internet games standardised samples of your voice, for this reason.
It's a standardised sample, already correlated to text, close to the microphone, for one thing. You're just making it easier for them.
I mean I suppose you can use "like and subscribe", "without further ado", and "let's get started" as standardised samples if you want to catch a youtuber.
But AFAIK my voice isn't on the internet anywhere. Quite a lot of people's voices aren't either.
There are a number of ways this information can be connected back, with varying precision, to the person who recorded it.
And we should have learned from the Cambridge Analytica scandal that data is used in ways we do not expect. For example, what if you don't care to reproduce someone's voice, but you do care to extract age/gender/racial background/sexual orientation from it?
What I mean is that there are still billions of people whose voices are not on the internet.
I'm more than half a century old, an internet geek since a few years before the conventional "dawn of the web", and AFAIK there is no recording of my voice on the internet.
Added to which, controlled samples like this with a good range of syllables will always be more helpful, won't they?
Or teach people not to trust that you said something just because it sounds like you. Use actual authentication instead of implied authentication. Same goes for photos and videos.
This is perfect for CEO scams in most American companies.
In many American companies, large and small (and in other countries as well, sure), a top-down management approach is the norm. I.e. the "CEO" (or "your manager" / "person in power") says something and you jump and do it without asking any questions, because you fear you'll be fired otherwise (or face other repercussions).
In such an environment, imagine the CEO / person in power handing the crooks the best sample ever, so that they can clone that person's voice almost perfectly. Now, of course, CEOs are likely to be recorded at various events anyway, but others are less likely to be, say the CFO.
Then, using the faked voice, order some lowly finance drone to wire a billion bucks to your account (well, maybe a bit less, and make sure to use someone else's account, seven levels of money mules, and 17 different cryptocurrencies with mixers etc. before cashing out).
We recently caught a CEO scam that was pretty good, but still noticeable. They had cloned his voice.
Isn’t it desirable to weed out organizations with such fragile procedures…?
It’s like how those ransomware thieves give every critical computer system in the world an incentive to stay air-gapped, which seems like an overall net positive.
In a sense I agree with you. However, even really great organizations have weak links, and unfortunately it only takes one. I personally don't want to be out of a job because of one weak link.
Sort of to your point, we do have training around this sort of thing (which I find obnoxiously dumb, though many seem to find it great; I just let the video run in the background and answer the questions without actually watching a single second of it), and we have phishing tests that are super easy to spot (the email headers literally tell you it's a phishing test). Yet various people still post on internal channels "Is this a scam? I'm not sure, please help!", and by no means are all of them non-technical people.
Unfortunately, above a certain company size there are just gonna be some weak links in exactly the wrong place(s), at random, even with the best procedures.
Now all they need to do is somehow work out who you are from only your IP (no email, name, location or anything), then simply get a voice cloning model to work perfectly from this small sample, then either somehow hack all the other information needed to get into your bank account or chase down your family to get them to send crypto, and they've got you dead to rights. Simple as that, which is why I also never take phone calls, pay for anything with a credit card or go outside.
I have often wondered how much of someone's actual voice leaks through into their impersonations in a way that can be detected.
I just saw an incredible Facebook reel of a voice actor, Shelby Young, saying the same thing in a striking range of theatrical voices, and I still wonder. How much of her true vocal fingerprint is unavoidably there?
(As a fan of old movies, "Vintage" was particularly impressive to me: she is impersonating not just voices but also the tonal choices those actors made given the recording technology of their day.)
I'm usually able to identify most of the big VAs regardless of the disguise they're putting on their voice. Dan Castellaneta, Jon Lovitz, Maurice LaMarche, Rob Paulsen, and Jim Cummings all come to mind as ones I've correctly identified in various productions. The list is longer, I'm sure.
An AI might be able to achieve a higher success rate than I do.
You enter your First Name, Last Name, Gender, Date of Birth, Pet's Name and Mother's Maiden Name and press the button to find out what your Mr T Name is...
Don't do this.