RVC's Realtime AI Voice Changer - Is It Any Good?
- Added 25 Jul 2024
- Today, you will learn how to use RVC's free AI Voice Changer - FREE & Realtime! Transform your voice into your favorite YouTuber, VTuber, Anime Character, and more! We'll also talk about if this is better than W-Okada's voice changer or if you should just stick with that one.
Go to this link to install the Voice Changer:
github.com/RVC-Project/Retrie...
How to get it to work with Discord and other apps:
• How to Change Your Voi...
How to find your own models:
• Where To Find AI Voice...
How to train your own model:
• Video
W-Okada's Voice Changer:
• How to Sound Like an A...
Search the most complete list of AI Tools, also available in 中文, español, 日本語:
ai-search.io/
DISCLAIMER:
Please do not use these models for malicious, harmful, or deceitful things. Please use them to have fun and experience this new technological age.
~~~~~~~~~~~~Timecodes~~~~~~~~~~~~
Intro - 0:00
Installation Tutorial - 0:23
Using the Software - 3:42
Is it better than W-Okada? - 9:44
Wrapping up - 10:59
~~~~~~~~~~~~Timecodes~~~~~~~~~~~~
Here's our equipment, in case you're wondering:
GPU: RTX 4080 amzn.to/3OCOJ8e
Secondary GPU: GTX 1080 (too old, would not recommend)
Mic: Shure SM7B amzn.to/3DErjt1
Secondary mic: Maono PD400x amzn.to/3Klhwvu
CPU: i9 11900K amzn.to/3KmYs0b
If you found this helpful, consider supporting me here. Hopefully I can turn this from a side-hustle into a full-time thing!
ko-fi.com/aisearch - Science & Technology
I've been experimenting with this for a bit, and I'm disappointed by how vague and incomplete the English documentation on these settings is. In an effort to remedy this, here's my breakdown of each setting:
Response threshold: Controls the noise gate. Any sound below the threshold is suppressed. This is used to prevent background noise and hiss from being turned into strange mumbling. Equivalent to "S. Threshold" in w-okada. Not applicable in RVC WebUI.
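To make the noise gate idea concrete, here is a minimal sketch in plain Python. It assumes the gate compares each block's RMS level in dBFS against the threshold and mutes the whole block when it falls below; the function names and the per-block behavior are illustrative assumptions, not the tool's actual code.

```python
import math

def rms_dbfs(block):
    """RMS level of float samples (-1.0..1.0) in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in block) / len(block))
    return 20.0 * math.log10(max(rms, 1e-10))  # clamp to avoid log10(0)

def noise_gate(block, threshold_db):
    """Pass the block through if it is loud enough, otherwise return silence."""
    if rms_dbfs(block) < threshold_db:
        return [0.0] * len(block)
    return list(block)

hiss = [0.001] * 480    # roughly -60 dBFS, e.g. fan noise
speech = [0.1] * 480    # roughly -20 dBFS

print(noise_gate(hiss, -40.0)[:3])    # gated to silence
print(noise_gate(speech, -40.0)[:3])  # passed through unchanged
```

With a threshold of -40, the quiet hiss never reaches the model, so there is nothing for it to turn into mumbling.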
Pitch settings: Applies a pitch offset to your input voice, measured in semitones. Each step of 1 raises or lowers the voice by one semitone, and each multiple of 12 raises or lowers it by a full octave. Whole-octave shifts are mainly useful for singing, since they keep you in the same key. Equivalent to "TUNE" in w-okada. Equivalent to "Transpose" in RVC WebUI.
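The semitone arithmetic can be sketched as follows, assuming standard equal temperament, where each semitone multiplies the frequency by 2^(1/12). The function names are made up for illustration.

```python
def pitch_ratio(semitones):
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)

def shifted_hz(f0_hz, semitones):
    """Fundamental frequency after applying the pitch offset."""
    return f0_hz * pitch_ratio(semitones)

print(pitch_ratio(12))        # one octave up doubles the frequency
print(pitch_ratio(-12))       # one octave down halves it
print(shifted_hz(110.0, 12))  # 110 Hz shifted up an octave
print(pitch_ratio(1))         # a single semitone, roughly a 6% increase
```

This is why a setting of 12 or -12 keeps a song in the same key: the notes keep their names and only the octave changes.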
Index rate: When an index file is provided, this slider augments the target voice by preserving more of its accent and less of the input voice (to reduce tone leakage). This is particularly useful for voices trained with a low epoch count (around 200-ish or less). If set too high, it can cause strange pronunciation artifacts. I usually find something around 0.30 to sound good, but it varies by voice model. Equivalent to "INDEX" in w-okada. Equivalent to "Search feature ratio" in RVC WebUI.
Loudness factor: Controls how much of the input performance's loudness is discarded. At 0, the loudness of the cloned voice should match the loudness of the input voice. At 1, the cloned voice will always be at full loudness. 0 is useful if you want to distinguish between whispers, talking, screaming, etc. 1 is useful to have the cloned voice always speak loudly and clearly, as loud as the loudest things it was trained on (which can include artifacts such as mic clipping, depending on the training set). Values in between give partial volume control, biased louder the closer you get to 1. There is no equivalent in w-okada. Equivalent to "volume envelope scaling" in RVC WebUI.
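One way to picture the loudness factor is as a blend between the input's RMS level and the converted voice's own RMS level. The sketch below uses a geometric blend applied once per block; that formula and the function names are assumptions chosen to match the behavior described above at 0 and 1, not a confirmed copy of the tool's implementation.

```python
import math

def rms(block):
    """Root-mean-square level of a block of float samples."""
    return math.sqrt(sum(s * s for s in block) / len(block))

def apply_loudness_factor(input_block, converted_block, factor):
    """factor=0 follows the input's loudness, factor=1 keeps the model's own."""
    in_rms = max(rms(input_block), 1e-6)       # clamp to avoid division by zero
    out_rms = max(rms(converted_block), 1e-6)
    gain = (in_rms / out_rms) ** (1.0 - factor)
    return [s * gain for s in converted_block]

whisper_in = [0.01] * 100   # quiet input performance
model_out = [0.5] * 100     # the model speaks at its trained loudness

for factor in (0.0, 0.5, 1.0):
    print(factor, rms(apply_loudness_factor(whisper_in, model_out, factor)))
```

At 0 the whisper stays a whisper, at 1 the output keeps the model's level, and 0.5 lands in between on a logarithmic scale.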
Pitch detection algorithm: Different algorithms are better at different things. rmvpe is the current state-of-the-art and works fastest and usually with the highest quality. Equivalent to "F0 Det." in w-okada. Equivalent to "pitch extraction algorithm" in RVC WebUI.
Sample length: The realtime voice changer works by sending small chunks of audio for quick conversion, then stitching them together. Longer sample lengths feed in longer chunks, making the stitches less obvious and reducing GPU requirements but increasing output latency. On a low-end GPU, setting this too low will leave the GPU unable to keep up and produce stutters. On a high-end GPU, setting this too low will cause warbling as an artifact of stitching many overly short chunks together. Equivalent to "CHUNK" in w-okada. Not applicable in RVC WebUI.
Number of CPUs: Self explanatory. Note, however, that rmvpe is a GPU-based pitch extractor and should be relatively unaffected by this setting. There is no equivalent in w-okada. Not applicable in RVC WebUI.
Fade length: The length of the overlap between adjacent chunks that gets crossfaded together. Longer values may reduce warbling. Equivalent to "overlap" in w-okada advanced settings. Not applicable in RVC WebUI.
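The stitching that sample length and fade length control can be sketched like this. The linear ramp is an assumption chosen for simplicity; the real tool may use a different window shape or alignment step, and the function name is hypothetical.

```python
def crossfade_stitch(chunks, fade_len):
    """Join converted chunks, blending the tail of the audio so far
    with the head of each new chunk over fade_len samples."""
    if not chunks:
        return []
    out = list(chunks[0])
    for chunk in chunks[1:]:
        n = min(fade_len, len(out), len(chunk))
        for i in range(n):
            w = (i + 1) / (n + 1)            # fade-in weight for the new chunk
            tail = len(out) - n + i
            out[tail] = out[tail] * (1.0 - w) + chunk[i] * w
        out.extend(chunk[n:])                # append the part that was not blended
    return out

loud = [1.0] * 4
quiet = [0.0] * 4

print(crossfade_stitch([loud, quiet], fade_len=2))  # ramps down across the seam
print(crossfade_stitch([loud, quiet], fade_len=0))  # hard cut, audible as a click
```

A longer fade spreads each seam over more samples, which is why it can hide warbling, at the cost of needing chunks that overlap by that much.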
Extra inference time: How much old audio to load into each chunk. The extra context usually improves voice quality for the generated chunk but is more demanding for the GPU. Equivalent to "EXTRA" in w-okada. Not applicable in RVC WebUI.
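As a rough illustration of how extra inference time relates to chunks, the sketch below prepends a slice of already-heard audio to each new chunk before conversion. The function name and the exact windowing are assumptions; the point is only that the model processes more samples per chunk than it outputs.

```python
def inference_windows(samples, chunk_len, extra_len):
    """Yield (window, n_new) pairs: each window is up to extra_len samples
    of old audio followed by the next chunk; n_new counts the new samples."""
    for start in range(0, len(samples), chunk_len):
        ctx_start = max(0, start - extra_len)
        window = samples[ctx_start:start + chunk_len]
        n_new = min(chunk_len, len(samples) - start)
        yield window, n_new

audio = list(range(10))  # stand-in for microphone samples

for window, n_new in inference_windows(audio, chunk_len=4, extra_len=2):
    print(window, "new samples:", n_new)
```

Only the last n_new samples of each converted window would be kept, so a larger extra length raises GPU work per chunk without adding new output.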
Input noise reduction: Attempts to remove non-speech background noise from the input to prevent sounds from being turned into strange mumbling. Equivalent to "NOISE" in w-okada. Not applicable in RVC WebUI.
Output noise reduction: Applies the same noise reduction to the output voice. Possibly good for poorly trained voices with lots of background noise. There is no equivalent in w-okada, but the usefulness of this setting is dubious. Not applicable in RVC WebUI.
Input voice monitor: Lets you hear the voice audio being passed in to the voice changer, sent to the target output device. Useful to ensure you are passing in the audio you actually want, or to pass your audio through without voice changing. Comparable to "monitor" settings in w-okada. Not applicable in RVC WebUI.
Output converted voice: Outputs the voice conversion to the target output device.
Main features RVC realtime has that w-okada doesn't:
Loudness factor controls. W-okada seems to always use a value of 0.
Significantly lower CPU usage at equivalent performance settings, in my experience.
Main features RVC realtime is missing compared to w-okada:
No system to save model presets.
Input/output gain is missing.
Input noise reduction is less robust compared to w-okada, which offers echo reduction and multiple noise suppression techniques.
Unlike w-okada, you cannot pass through to the input mic, so you need a virtual audio cable to pass the cloned voice into voice calls and microphone recording programs.
In w-okada, when the mic loudness falls below the response threshold, the tool is paused until speech is once again loud enough, saving GPU and CPU resources. RVC realtime always passes audio whenever it is running.
Unlike w-okada, you cannot monitor the cloned voice while outputting it. You can work around this by using the "listen" feature in the Windows sounds panel on a virtual audio cable instead.
No built-in recording functionality.
Missing most of the settings in the w-okada "advanced settings" menu.
No way to choose which GPU to run the voice model on. You can get around this by setting CUDA_VISIBLE_DEVICES=# in a terminal before launching the tool from there, where # is the index of your target GPU (0, 1, 2, etc.).
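The CUDA_VISIBLE_DEVICES workaround can also be done from a small Python launcher instead of a terminal. In this sketch the child process is a stand-in that only prints the variable it receives; in practice you would replace that command with whatever command starts the voice changer on your install, which is not shown here because it depends on the package you downloaded.

```python
import os
import subprocess
import sys

def env_for_gpu(gpu_index):
    """Copy the current environment and expose only one GPU to CUDA."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env

# Stand-in child process: it just reports which GPU index it was given.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    env=env_for_gpu(1),
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```

The variable has to be set before the program initializes CUDA, which is why it goes into the child's environment rather than being changed after launch.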
That's awesome! Thanks for your efforts in writing this
this deserves a pin bc ill look at it for a long time
Need to screenshot this
You're a legend o7
😅 @@theAIsearch
I ain't reading allat
you can use it in a real time environment like Zoom or Teams. You just need to delay the video the same length as what the AI voice is delayed. Use OBS for that.
OBS?
@@user-jt8xw6fd1v Open Broadcaster Software, a program you can type into youtube or google and learn whatever you need.
@@user-jt8xw6fd1v obs studio, a recording software
@@user-jt8xw6fd1v program for recording and livestreaming. You can start a virtual webcam on that program and use it how you like.
I rlly do appreciate the effort u put in ur content i rlly do
Thanks!
Can you recommend a good text to speech AI that i can change the voice using RVC?
these ai voices are scarily accurate, even the markiplier one
Im gonna be honest at the start you can tell its AI even without knowing that its AI.
@@skye-zi2nf my grandma would easily fall for that
@@Punpas Like every normal person who don't know much about computer stuff
hey can anyone help me? , when i try to load, it s just showing terminal, with two lines , and gui is not opening
Question, how do you convert mp3 or wav files to the correct format to use other clones?
Your content can still be easily understood by a non-native English speaker. Thank u
My pleasure 😊
thanks this was so helpful.
why training our own model video is deleted ?
Hey, will this work better with an RTX2070 ? Last one was really laggy (over 20s delay IG :/)
does this scale with horsepower better? my main issue with the old software was that it was barely touching my GPU but providing a fairly lousy output at low latency. I just want to pin my hardware and have good RVC with low latency!
it mostly scales w your gpu. nvidia cuda gpus work best. other gpu models won't be used, so it'll resort to your cpu
Anyone else having issues with this particular application and working in games? It seems like it does not work in the background?
How to uninstall RVC or Okada software if we want to update? is it okay to just delete the folder?
yes
Thank you for your efforts❤
Could you clarify if the real-time live voice changer can be connected to platforms like TikTok Live or Messenger calls, apart from Discord?
Yes you can!
Kinda waiting and hoping for ElevenLabs to release a voice changer one day.
I tried on both Linux and Windows, and the latest releases are not working on either OS. Either I've missed something, or there is something I need to do but don't understand.
Hey man, I have a problem about the w-okada voice changer, so my interface does not show my actual GPU, it only gives me options to choose from GPU0 and GPU1, what’s the problem here?
ik it’s late but you can check ur task manager to see what GPU0 and GPU1 is… for example my integrated intel gpu is labelled GPU0 and my external gpu is labelled GPU1
thanks for the video ,i have been using it for 2 weeks now ,but this voice changer has some problems of pronouncing certain sounds , starting from " u and o " sounds , it just can't pronounce certain words , do you have any settings that can fix these issues?
When i pronounce : " wukong " ,the vc just pronounce something else like : "blerong " , and when i try to pronounce " form " , it pronounced as " ferm " , this voice changer just cant pronounce certain sounds , i don't understand why , even i used multiple different voice models , still have the same issues.
Yeah, i dont have examples off the top of my head but, too many words i say, turn into completely differemt words.
Does this create a virtual microphone driver to use with other apps like vrchat?
it says 'runtime\python.exe' is not recognized as an internal or external command,
operable program or batch file.
it just can't work (No response / stopped working when I run it)
any ideas of what could I have been possibly done wrong?
The code can't even run, it just show the input and output device and then 'cuda_is_available: True'
Then no more
What Graphic card is recommended for Rvc voice changer that wont be choppy.
So...I do have an rtx4070 and it keeps crashing. Any advice for it?
Does anyone know if there is a way to get this running with runpod for a better GPU
it say AttributeError: 'RVC' object has no attribute 'tgt_sr' and crashes
My built-in mic never seem to work in these things. Do I need an external soundcard?
you need a external microphone
when the cmd window opens the interface dosent open.... help me TwT
everytime i try and use it it says rvc gui not responding can anyone help?
Idk why but when ever i start the voice conversation it keeps saying the app is not responding
Can I create cartoon shorts if I wanted to record the voice without my own voice echoing in the background?
yes. you can also do it in non-realtime if you just need the recording: czcams.com/video/ixB9oalT3cQ/video.html
Will this work better on AMD video cards compared to W-Okada?
I don't have one, so I'm not sure. Let's see if anyone else can share their experience here
AMD cards have problems for some reason. When working with neural networks, some special settings or libraries are required. Not always, but often.
what is the best ai voice changer for amd gpus ?
W-okada works well with AMD gpus, BUT it does take some EXTRA messing around, like having the RVC files on the desktop and having to switch to cpu to convert your voices to ONNX. You also have to do a lot of app resets at a few stages in the conversion just to clear the cache (and you have to repeat this process for every voice conversion to ONNX). Found the delay could be brought down from 380ms to 100ms afterwards on my 7900xtx and 7950x3d, still learning and tweaking though to get it faster
@@MrGuitarguy16 have you found the best settings for it? using my AMD and honestly it isnt too bad but I'd honestly like it way better without all the lag.
3:24 where i can get this files?
What about the VGA rx580, is that possible?
AHEM, if anyone is struggling to get it running or it keeps crashing: if your top INPUT device has (MME) in its name, go to your output and pick the sound device you use to hear things that also has (MME). They need to match, and that matters because a fresh download doesn't come set up with matching MME devices
oh my god thank you i was having that exact issue
ur such a W for this
Thanks a lot
I was just going to delete it and found your comment
You are a god to me thanks a lot
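The matching rule from the comment above can be checked with a few lines of Python. This sketch assumes device labels end with the host API in parentheses, as in the "(MME)" example; the label format and function names are assumptions for illustration, not something read from the tool.

```python
import re

def host_api(label):
    """Return the text of the last parenthesized group at the end of a label."""
    match = re.search(r"\(([^()]*)\)\s*$", label)
    return match.group(1) if match else None

def same_host_api(input_label, output_label):
    """True when both device labels end with the same host API suffix."""
    api = host_api(input_label)
    return api is not None and api == host_api(output_label)

mic = "Microphone (Realtek Audio) (MME)"
speakers_mme = "Speakers (Realtek Audio) (MME)"
speakers_wasapi = "Speakers (Realtek Audio) (Windows WASAPI)"

print(same_host_api(mic, speakers_mme))     # matching pair
print(same_host_api(mic, speakers_wasapi))  # mismatched pair
```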
guys how to use this voice changer in obs?
I am trying this one because my RTX 2060 is struggling a bit with W-Okada and it cuts out a letter or 2 from some words i Say
you need to upgrade your gpu. Gpus with more cuda cores. I've seen people with 1080 ti and 2070 super doing ok with 3500/2500 cuda cores. Even a 3060/3060 ti is good enough.
update! It works very well with my new settings and updated driver also it is just the microphone and model that was bad@@Kpopboppin-bw8iv
Can you show how this works for macbook?
Is there any RVC for mac ?
*Bro really finding excellent excuse's to use gura's voice and im here for it* 👏🐴
😃
HELP where can i get the weights? ready to pay =P
i need help, i have all my models installed, and have my microphone and headset plugged in and put the correct ones into the settings. but once i start the audio conversion it crashes. Can anyone help?
crashed with (python noot responding)
AMD/intel isn't working..
why there no just one installer with all requirements? it would make life easier. I takes so much time and power to install this.
Ain't there any applications for mobile to do this ?
How do we add cultural accents? asian/mexican/etc?
I have issues with the output because it looks like it works but i dont hear the result of the voice changer
i test it on my 1050 TI and it works fine :D
1050ti gang rise up 😂
How do you deal with background noise/humming?
Yeah, was hot so my fan was blowing on me and sometimes the "Pa pa pa pa pa" would turn into words lol
Bro why the gui is not opening after showing terminal
Hey do you by any chance know any software that changes voice not in real time, but for pre recorded audiofiles?
Yeah def need something like that but better than eleven labs for sure
Found it yet? If not I know something you can use
@@shonuffOTGoh dude pls tell me
@@dharianimator not sure what happened to my comment but the software is rvc gui and uvr 5 if you need a vocal remover
@@shonuffOTG cheers bruv 🍻appreciate it a lot
How did you switch to only output on YouTube? I mean if I want others to hear only the converted voice.
You have to install cable input driver.
@@loka2479 I eventual figured it out. Thanks though.
with is better in real time?
It says GUI not working
what if i have uhd?
how to install using python?
HOW TO UPLOAD A FILE AND CHANGE ITS VOICE]
For me its showing path not recognised
i am confused on how to even install what i need
failed to load asio driver
You earned a sub man! :)
Awesome, thank you!
yoo can you drop the frieren voice model my dude?
Is nvidia 1660 ti, 6gb vram enough to get a decent output? should i consider renting gpus in the cloud
no, 8GB is min requirement
@@theAIsearch same with the W-Okada?
can this work for discord?
RTX3060Ti, app freezes on starting
btw is there a way for me to talk in discord without hearing my own voice bec my friend said he can still here my own voice
You have to put the voice changer mic onto discord in settings bc rn its probably still using your regular mic
This just straight up doesn't work anymore. Many people on the github can't get it working, theorizing that it's the names of the inputs and outputs (can't have spaces)
difference between w-okada?
every time i click on start. its going to not responding. what should i do? did i miss something?
having this issue too
same
Same to me, clicked start then nothing happens
Idk if I’m doing something wrong but when I try to use voice changers I can’t hear it so I don’t know if it’s working or not
I have a decent PC btw
probably your input/output settings. can you check if they are connected properly?
on cmd its only says audio block passed. also how to enable cuda?
what gpu do you have? it should auto detect it, if it's a cuda nvidia one
@@theAIsearch RTX2060 Super
Yooo mine freezes when i start the audio conversation
same here. It was working great before my last startup
also there's a comment below that said: "if anyone is struggling to get it running or crashing, if your top INPUT device has (MME), go to your output and look for your sound system that you use to hear things that also has (MME), since they need to match, which is important because if you download it, it doesn't come packaged with matching MME"
it works again for me
@@mthanh893mine was just in an invisible folder😅 that was the problem apparently
I have -60 DZB on a free voice changer and cant change it to get better i think i need a better PC to get a voice changer
Does this work offline or is it server
It's offline - at the beginning, you have to select the package to download that fits YOUR graphics card, so it'll run on YOUR machine. Have fun!
W-Okada's work ?
Bro Mack tutorial on Tex to image anime style 😊
see this czcams.com/video/a33B9DLOJw0/video.html
I keep getting this error message:
raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Error opening Stream: Invalid device [PaErrorCode -9996]
It just crashes whenever I start the audio conversion, same goes to W-Okada😢 Is there a way to fix this?
looks like something to do with your input/output settings. its hard to troubleshoot though
holy shit i was watching ironmouse right before this, that scared the shit out of me.
Is this for low end pc ? And...can we use it on mobile device ?
low end pc still works, but there will be a lot of delay. mobile won't work
@@theAIsearch Hmm what about the RTX 3060 and the RTX 4070 specs ? Will there be any delay too ?
idk why my voice is hella laggy but im using a 2070super. :(
when i press "Start Audio conversion" the program just crashes
seems like a hardware issue. what's your GPU or CPU?
@@theAIsearchmy gpu is a rtx 2090, It's fine now as i got w okada to work
@@theAIsearch I also have a problem like that after I click "start audio conversion" suddenly the GUI is not responding, please help me, I've looked everywhere for a solution but it doesn't work :(
can I play a game while using it on 4080 super?
You probably could, but unless the Above 4G Decoding and Resizable BAR options are turned on, don't be surprised if you run into OOM (out of memory) errors.
I want it to run smoothly while playing a game... I'm gonna wait for the rtx 5000 series I guess..
@@rommix0
Do I need a good GPU for this one?
What nvidia gpu you are using?
Is this program has any limits for converting?
Uh how do i install this? whats pyhon?
python is python lmao
Im uaing amd what should i do i did everything i can its always choppy
your graphics card is too bad
@@Alexthereek-ob7vt oh well i guess its useless
omg very good tool for conent creator
What do you think about the voice AI app? It has a better ui in my opinion, and has a built in virtual mic
can i use this with my nvidia gt 740?
Probably not 😅
@@dogosrobi7235 740 in 2024 bro is cooked
@@yukiruu00what about a rx580 4gb?
How do you make thumbnails please some tips where you find that types of pictures from for the game REVEAL your SECRETS! 😂
I use canva or photopea. Images are just found using google search. hope that helps!
@@theAIsearch Thank you
I am on mobile,does it work on mobile?
bro you require a pc for this
So, forget about Intel/AMD processor or on-board graphics card, you need a screamer.
Can all language ?
Yeah
Does this work on Mac mini m2
im not sure. i heard that m2 should work
@@theAIsearch btw you have some of the best AI tutorials they are actually taught in a way to educate viewers a lot of these channels are confusing on purpose or literally repeats of other channels
Does it work on mac?
its only crashing when i test it
same :/
What is the "sweet spot" video graphics card nowadays?
3090
nvidia rtx cuda, preferably 3xxx or 4xxxx
@@onioncuttingninjaThanks!
@@theAIsearch Thanks!
@@theAIsearchIs an RTX 3050 8 GB sufficient for most AI tasks?
Will it work for amd processor
yes
Bruh, this is easy tools. Using AI goddamn different layouts!
I want it!
why i got index error?
When's AMD gonna let us do this 🗿
Never they dont care about AI