Open AI creates PERFECT Voice Clones - Incredibly Emotive!
- Published March 29, 2024
- Use code MATTVIDPROAI at the link below to get an exclusive 60% off an annual Incogni plan: incogni.com/mattvidproai
Thank you Incogni for sponsoring this video.
▼ Link(s) From Today’s Video:
Open AI Voices: openai.com/blog/navigating-th...
Grok 1.5: x.ai/blog/grok-1.5
Elon's boast about Grok 2: / 1773655245769330757
Universal Claude 3 Jailbreak: / 1773455789056745782
Amazon invests in Anthropic: / 1773030824927015369
► MattVidPro Discord: / discord
► Follow Me on Twitter: / mattvidpro
-------------------------------------------------
▼ Extra Links of Interest:
✩ AI LINKS MASTER LIST: www.futurepedia.io/
✩ General AI Playlist: • General MattVidPro AI ...
✩ AI I use to edit videos: www.descript.com/?lmref=nA4fDg
✩ Instagram: mattvidpro
✩ Tiktok: tiktok.com/@mattvidpro
✩ Second Channel: / @matt_pie
-------------------------------------------------
Thanks for watching Matt Video Productions! I make all sorts of videos here on YouTube! Technology, Tutorials, and Reviews! Enjoy your stay here, and subscribe!
All Suggestions, Thoughts And Comments Are Greatly Appreciated… Because I Actually Read Them.
-------------------------------------------------
► Business Contact: MattVidProSecond@gmail.com - Science and Technology
What do you guys think of Open AI's Voice tech? Use code MATTVIDPROAI at the link below to get an exclusive 60% off an annual Incogni plan: incogni.com/mattvidproai Thank you Incogni for sponsoring this video.
I believe Elon Musk being a staunch supporter of the most horrid genocide of your lifetime, should be a warning your audience should get before marketing or reviewing his properties and assets to the public.
Over 32,000 Gazans have been mass murdered in a US-sponsored genocide, 2/3rds of which are women and children. This genocide is still ongoing.
german has a really really strong US accent.
Can confirm about the German... very strong US accent, especially on how it pronounces the Ls
bro so close to 250k!
Looking forward to it.
I can say that it's better at translation than I am when I muddle through it.
I would like to see a feature where you can adjust the tone and gradient on certain parts of the audio though.
I'm from Kenya. Swahili is our native language while "Sheng" is popular slang. They did a good job. I'm impressed!
Well, as a German I was impressed by the Spanish one and disappointed by the German one, which sounded like a Dutch person trying to speak German.
The German one has a good natural rhythm to it, but the voice has a distinct accent which makes it sound non-native
Yeah the German was pretty bad... it sounded like some weird, inconsistent English accent: Some words basically fine, some were only slightly off, and a few, for example "Kulturen" and "alle" had a really strong American accent, and the accent was the same every time she said the same word. (Also, I believe I have seen enough Anime to judge that the Japanese voice likely suffers from the same problem...)
I agree with you.
Meddl loide
I speak German very well, my friend
Clearly NOBODY is reading the blog post... when Voice Engine does translation it RETAINS the accent of the original speaker from their native language. Feature, not bug.
yeah, but why would anybody want that "feature". not me. I don't see any use for it.
@user-vj5fb3ig4z then Don't use it lol
I like hearing people with accents. I don't want every person I talk to, to have the exact same accent.
@@user-vj5fb3ig4z Imagine Mr. Miyagi sounding like Arnold from Happy Days. Accents are charming.
@@user-vj5fb3ig4z I can imagine it could be useful for dubbing. For example, MrBeast dubs his videos into Spanish and Portuguese and posts them on different channels; with something like this he could have those dubbed videos be in HIS voice.
Hey Matt. As for the Voice Engine audio sounding low quality: if you listen to the audio they're feeding it, that's why. That audio sounds like some teacher recording in a room on a crappy laptop mic. That's actually the impressive part: not only is it very emotionally and phonetically accurate to how the guy in the source recording sounded, but it's also mimicking the sort of edited sound of the audio and the conditions of the recording. As an audio engineer, I find this insane.
I'm surprised no one's talking about how cool this is for patients with speech impairments.
Too many safety concerns with OpenAI. That's the only reason I am not too excited.
In some cases, where the generated audio sounded low-quality, the original didn't sound like a studio recording, either. I guess, it was doing as good as it could with what it had to work with. Amazed by the fact that you can "give someone back their voice" using such a small amount of audio content, and the way people are always recording themselves these days, we probably all have at least 18 seconds of audio... if not, put some aside as an archive for the future, just in case.
The German and French versions were not good. I was a bit surprised by the Swahili version, which was a bit better. Open AI still has a lot of work to do on non-English languages.
🎵 Everyone together, sing it with me! 🎵
🎵This is the worst it’s ever gonna be! 🎵
The thing is, Eleven Labs, is a product, not research, if that makes sense.
Those samples from OpenAI are the raw outputs from the model, whereas with something like Eleven Labs, you can be sure they have an insane pipeline to take the raw outputs from their models and clean them up. You could even create custom neural networks for this task, etc.
Also, you can try Voice Engine. You can use it via OpenAI's APIs, but you don't get to provide a reference; you can only pick from a selection of provided voices. It's what powers ChatGPT voice.
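For anyone who wants to try the preset-voice route described above, here is a minimal sketch in Python. It assumes OpenAI's public text-to-speech endpoint (`/v1/audio/speech`), the `tts-1` model name, and the six preset voice names documented at the time of this video; all of these may have changed since. The helper `build_speech_request` is a hypothetical name made up for illustration, and the API key is a placeholder. The code only builds the HTTP request and does not send it.

```python
import json
import urllib.request

# Assumption: preset voice names as documented for OpenAI's TTS API
# around March 2024; the list may have changed since.
PRESET_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")

SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, voice="alloy", api_key="YOUR_API_KEY", model="tts-1"):
    """Build (but do not send) a text-to-speech request for a preset voice.

    Hypothetical helper for illustration. Note there is no parameter for a
    reference recording: you can only choose one of the preset voices.
    """
    if voice not in PRESET_VOICES:
        raise ValueError(f"unknown preset voice: {voice!r}")
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speech_request("Hello from a preset voice.", voice="nova")

# To actually send it (needs a real API key and network access):
# with urllib.request.urlopen(request) as response:
#     with open("speech.mp3", "wb") as f:
#         f.write(response.read())
```

The request body has no field for uploading your own sample, which matches the point above: cloning from a reference clip is not part of the public API, only the provided voices are.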
The most important aspect is the input; if you have an emotional voice in the input, the output would sound amazing. I'd like an AI that enhances voice input to make it more emotional. The input at 7:15 sounds monotone.
I mean, if they're going to wait until all of their "conditions" are met before they release this voice engine, then they're never actually going to release it.
The German one, while I do like the intonation and all, it definitely has a strong accent. Without that accent, this could've been the best AI generated voice translation, I've heard.
that swahili one is amazing! one can tell it is ai generated but it is still so good.
French here. I confirm that the french voice does have a weird accent, but that's honestly still very good.
The french had a hint of an american english person trying to speak canadian french. I am fluent in both.
Same for german
Same for portuguese. It sounds right, but has stops at the wrong places.
German has the same weird American accent
@@testboga5991 I'm leaning towards the native accents being intentional. In a way it sounds more authentic.
@MattVidPro my boy done sauced up in that sponsor message, chain looks good bro 💪
German: light accent, like an American who's lived in Germany and spoken German fluently for 3+ years. This is also what chatGPT sounds like when it speaks German.
French: light accent, maybe a tiny bit thicker than German.
I don't speak Spanish but that accent sounded very very heavy to me.
Is this because they were trained with American voice actors speaking other languages, or does this happen naturally when an english trained model speaks another language? That would be fascinating.
Spanish reference sounded stilted and unnatural, while the generated audio sounded VERY natural. Weird.
Spanish from English reference had a slight English accent, which is very interesting and I hope it keeps doing that.
I confirm this. The first Spanish already sounds generated and the AI sounds more natural in comparison. In the second sample, the voice has an English accent.
As a regular ChatGPT voice chat user, I can definitely tell that the quality of the audio generated from the reference audio is very reminiscent of GPT voice chats. It doesn't necessarily have the best quality, but I know it can be better, as proven by companies like ElevenLabs. And another thing: I think ElevenLabs' translation feature can be a little bit iffy when it comes to how natural a person's voice sounds once it's translated to another language. However, for Voice Engine in particular, I was very shocked to hear how natural a voice still sounded after being used to translate something into another language. I also found the Americanized pronunciation of some words in other languages (German, Chinese, Spanish, and others) to be particularly funny, but I think AI can definitely progress past that point.
Everyone is missing this, but the accent is on purpose. The blog post says that the languages will retain the accent of the original speaker.
@@justinwescott8125 I'm really happy about that. Retain accents so people can express their backgrounds.
I've heard many multilingual people speak, and their accents and tone of voice tend to differ between languages. If the voice has the same accent and tone, it is most likely AI generated and not a recording.
@@kuromiLayfe Only in Japanese. Sounding like an idiot is mandatory in that language. Joking aside, you are right. However I strive to maintain the same voice and prosody across all languages without hurting pronunciation. I just steer clear of Japanese.
A small dedicated segment covering the overall flow of the model architecture would be great.
If you have the domain knowledge, it would be even greater to discuss the "why"s regarding the working of the model.
The demos were amazing!
OpenAI is writing its name into the history of the beginning of artificial intelligence
OpenAI has been in the lead for 2 years
her 😂
@@treudden Not anymore... ElevenLabs shits on OpenAI Voice and Claude 3 shits on GPT 4. While Gemini has 1 million token count and is also very good. ClosedAI better hurry up otherwise they will be left behind.
How dare you assume its gender..!!
@@helix8847 Gemini is terrible.
I've been using the voice generator in the Simplified app, and it sounds like me. It does have a bit of difficulty with emoting, but that's not a huge problem, so it works for me.
I speak English and portuguese, and man, English with Portuguese accent is amazingly good!
I thought the one I understand might seem worst, but the Portuguese was great!
I'm Brazilian and the Portuguese part wasn't as good as ElevenLabs.
The German sounds like someone with a very heavy American accent, but otherwise it was correct.
Man, you should really get the DarkReader extension. This video was very very bright lol.
French one has an accent but it's really good, like a non-native with a high level French.
That's great that we have a competition here. We'll see soon what Meta and Apple show.
*whispers* Did my phone start imitating people?
Yo matt! Im curious what your opinion is on the best local TTS software? :D
Wild, is Alexa's new gig gonna be a voiceover actor?
🇫🇷🇪🇸 For French and Spanish, there's a strong American accent while speaking these languages, hope trained data gets broader to improve audio generation !
I love how Anthropic pulls ahead, great competition all around, we all win. I've had GPT-4 for some time now and I've loved its abilities, but DALL·E being added to the package is the clincher. If Anthropic added an image generator to their product, I would definitely move over to them. That is... until Sora comes out... see? What do we do?
Worth adding also: Not that I'm super into benchmarks (I feel a bit guilty nitpicking on this): When mentioning the domination of Claude 3 Opus even in comparison to GPT-4, this is in comparison to GPT-4's original paper back in early 2023. From what I understand, GPT-4 Turbo is much better, e.g. on HumanEval & others (can search up "EvalPlus Benchmark", which also has the original HumanEval benchmark).
Do we all win by Stability AI collapsing under the weight of competition? I'm not sure about that.
@@brexitgreens bit bummed, my buddy works over there, and I'm rooting for 'em!
@@Glowbox3D Only bad guys are _not_ rooting for them.
Speaking of Anthropic getting their own image generator - they are allied with Amazon and Amazon already has their own named Titan. Not many people know. In terms of quality, it's between DALL·E 2 and 3. Comparable to SD XL.
As someone fluent in Japanese, the Japanese audio you showed sounded very foreigner sounding and not Japanese. Not good quality 😭
That's on purpose. The blog post specifically says that the accent of the original speaker is maintained. It's supposed to sound like an American speaking Japanese.
Japanese is 100% pure 外人 or put another way 日本語上手. Was waiting for it to say さようなら at the end.
"The Japanese audio didn't sound Japanese as someone fluent in Japanese"? Maybe try to learn English first.
I can confirm Portuguese sounds natural.
Looks like we've all been successfully SHOCKED! 😅
Amazing!
Brazilian here, brazilian portuguese is sounding very good!
It seemed robotic to me, I mean, without emotion, IDK, it was a bit weird the way he was finishing the sentences
@@Kiiush Brazilian here, normally people talk more robotic in a studio setting even the reference audio is not that normal sounding, in a studio you normally try to be very formal and say every syllable in this monotone way, which is not how people talk irl
the voices have a good cadence but low overall clarity quality... still very impressive
p.s. once we get good translation and audio for all the world languages, it's going to have a huge impact. e.g. I work with immigrants from east africa. many barely speak english and may have never used a computer in their life. it is very difficult for them to learn to use. having a computer they can just talk to in their native language can mean the difference between computer usage or none at all.
Right now we have good auto-translate for around 100 languages (which do represent the majority of the planet), but researchers are now working on the next 1,000. (Then there are still a lot of tiny, local languages.)
To me, the voice generator sounds like a voice generator when it comes to its emotive capabilities. Far from human.
That is the most realistic TTS I've heard so far! How much do you want to bet *this* is the model being used in Figure 01?
The Chinese is a hair better than the French, Japanese and German accent wise. The original English model occasionally overcomes the actual Chinese weights. A- for the Chinese. B+ for the others to me. Portuguese is an A match to the initial voice. Forget emotive I'm happy about diversity. Pi P8 is the standard for diversity in English.
The German wasn't really good
I'm a native Spanish and Portuguese speaker and the voice pronunciation sounds horrible, not natural at all
I'm kinda wondering how this would handle speech when it comes to text like a list of ingredients off a cereal box. Would sound odd being emotive
I could tell too they sound very robotic
True. German was bad...
Spanish neither
The Portuguese had inflections in the wrong places, it's pretty good, tho.
The Mandarin version sounds like an English speaker who studied Chinese at university for a couple of years.
Did the AI nail the tones at least?
@@bastienpetit5161 It did nail the tones, that's a low standard for ai though.
I guess that was the point.
In Spanish, it clearly has an English accent, but it sounds very natural... love it!
as a native speaker i can tell spanish version is so fcking good
Finally someone using the "as" construction correctly: with the subject ("I") agreeing in both clauses. Very rare in 2024.
When they say "preset voices", I'm pretty sure they're referring to the built-in TTS voices that all OSs come with by default. You know, Microsoft Sam and friends; the light-weight handcrafted roboty voice that screen readers default to.
Killer App: voice cloning for texting.
The audio quality depends on the source quality being fed into it.
In Chinese style, this situation is called "Million Model Warfare".
The german one has a strong accent. But it's understandable.
The French voice sounds like it's read by an English speaker
The Spanish translation has a very strong accent.
I am waiting for a time that I could listen to the text part of my ebooks with ease. I use Apple’s screen reader but it’s painful.
7:24 my native language is spanish, and I understand what she's saying but at times it sounds like an american who is learning spanish and hasn't fully mastered the "r" sounds. When she says "aporta" and "importar", all 3 letter "r" sound like an english "r" rather than spanish.
Parity-wise, Elevenlabs is better at most of the multilingual voice cloning, although I was especially impressed by the quality of intonation and pauses in the first English example.
On a side note, voice recovery is not new - it's just voice cloning from old footage but it unfortunately retains the bad audio qualities from the same footage. It would have been more impressive to have just cloned the woman from her post brain-damaged voice in this particular case. Or even better blended them both together but maybe using EQ matching.
did I win? I didn't sign up for the contest so I must be the person who won. clearly. :P
I wish Stephen Hawking was alive to use this voice box
What is your opinion on Emad leaving Stability AI? Do you think open source will become less and less competitive over time?
Now I know, Musk's Grok-1 is fairly early.
Creepy, or the future of virtual assistants?
I still miss the option to give a prompt besides the information i want it to voice. Something like "sound angry, sound drunk, make long pauses, etc"
Btw: The German generated text was horrible, it sounded like an american trying to speak german.
The Mandarin one was good but it sounds kinda American...
Who do you think is still ahead of the competition matt?
I can only comment on the English audio. It was surprising that the text didn't have punctuation other than periods, and it still knew where to put short or long pauses.
The second Spanish chick (AI) speaks better than the human lol sounds like a Mexican children's author. The translation from English to Spanish, has brutal pronunciation.
You didn't even talk about one of the most impressive features, the translated language has an accent!
This will be the "SORA" of A.I voice generation😂
Just wanted to compliment you on your audio quality using the RE20. Really good clarity and not boomy.
As a french and english speaker the french that they spoke was with the woman's american accent... therefore making it impressive.
It intentionally tries to mimic the original speaker's intonations, making it sound more like them but pretty much transmitting their native tongue accent to different languages.
Agree, it sounds like an American woman who speaks very good French, but with an accent. Not an overly strong accent, but definitely identifiable as a North American English speaker.
Where can we try ChatGPT voice? I'm French, and I want to see if the English accent disappears…
"On mars by next week" Elon in a nutshell
French one is an american talking french.
@7:25 It's spanish but sounds very much like a white person speaking spanish (it has an accent). The same way a foreigner sounds when they try speaking English. I have been teaching ESL 3 years, believe me I can hear the accent.
Audible is in big trouble.
It's good, but especially in the translation there's some Englishness that filters into the translated languages; you can notice it in all languages, actually.
8:30 The German has a really thick accent (it basically sounds like an American one). Doesn't sound fantastic tbh, at least not if the goal is like very good dubbing / translation. But it sounds good as far as AI voices go.
I speak Spanish and Chinese, and they sound like an American speaking those languages… but in a way that's very understandable, as if they learned the language very well but did not manage to get rid of the accent… It did not sound robotic to me, if that's the concern… the accent might bother some natives.
I agree with a lot of the other commenters here. The voices seem to have an American accent. I'm only fluent in Japanese but I picked up an accent in most of them. You can understand the Japanese just fine, but the accent is a little cringe.
The Portuguese came out perfect, identical to a real person speaking in a real recording; it really was perfect.
As someone who knows Spanish, that's pretty bad lol. It sounds like a strong American accent trying to speak Spanish.
Bro have you not been talking to OpenAI in the app. Their voices have been around for like 6months.
Ham sandwich here I can confirm the Swahili is a weird accent
the english to spanish one is tricky, it sounds like spanish but with english accent, so not full spanish
The Spanish cloned voice sounds amazing 😍, the best one I've ever heard. Edit: the translation is pretty bad.
🇧🇷🇧🇷🇧🇷🇧🇷👏🏻, Wow, the songs are fricking awesome on v3, voice still needs some work tho!
The French one is weird. It sounds robotic and as a French I don't see what accent it is, doesn't sound like France French, doesn't sound like Canadian French, just weird
Japanese seems to have a very heavy accent too that I can't categorize
Spanish sounds very bad imo
The translation keeps a weird accent that makes it sound very weird. I don't think it's the woman's American accent that is kept but a weird mix that is quite uncanny
I want to see more foreign language stuff translated into English so I can evaluate it.
French is still a bit robotic, and the voice seems to speak with a slight english accent.
I am learning Japanese, and at least to me, as someone who has listened to it a lot, it sounds really strange accent-wise
Your user image leaves no doubt about it.
GCP has something similar but not that emotional
I speak French. Sounds like an Anglophone (English speaker) who has learned French (which the boilerplate on that video says is the intent)
... Though it still rolls the rs better than I do.
I think the translated audio keeps the original accent and sounds a bit weird and not perfect
Weirdly, the Spanish voice had an English accent
German has a strong american accent. Just like the voice out from ChatGPT
I just wish I could set the playback speed 😅 I use text-to-speech because I read slowly; I shouldn't be outpacing the AI.
What do you think about Singularity Intelligence for New Earth (QORA*) for EQORIA, United Earth? It is the first introduction of the vision and more to come... check it on youtube channel. EQORIA will begin promotion to mass media beginning December 12, 2024 on 12 year anniversary.
It is so strange to hear AI-generated voices with an accent. Like, I'm German, and the German one sounded like an American trying to read out German text
German has English accent. Slight. French too.
In the german example the "r" was pronounced like it was an english word. Germans pronounce the r much harder.