My Top 5 Open Source Text to Speech Softwares Starting off in 2024

Jarods Journey

49,089 views

Added 23 July 2024
Links referenced in the video:
StyleTTS - github.com/yl4579/StyleTTS2?t...
Eleven's Style TTS - github.com/IIEleven11/StyleTT...
Coqui TTS - github.com/coqui-ai/TTS
Daswer's XTTS GUI - github.com/daswer123/xtts-fin...
Suno Bark - github.com/suno-ai/bark
VallE-X - github.com/Plachtaa/VALL-E-X
Tortoise TTS Installation - • Local AI Voice Cloning...
Hardware for my PC:
Graphics Card - amzn.to/3pcREux
CPU - amzn.to/43O66Ir
Cooler - amzn.to/3p98TwX
RAM - amzn.to/3NBAsIq
SSD Storage - amzn.to/42NgMFR
Power Supply (PSU) - amzn.to/430bIhy
PC Case - amzn.to/447499T
Motherboard - amzn.to/3CziMXI
Alternative prebuilds to my PC:
Corsair Vengeance i7400 - amzn.to/3p64r22
MSI MPG Velox - amzn.to/42MnJHl
Cheapest recommended PC:
Cyberpower 3060 - amzn.to/3XjtZoP
Come join The Learning Journey!
Discord - / discord
Github - github.com/JarodMica
TikTok - / jarodsjourney
If you found anything helpful, please consider supporting me and the content I am trying to produce!
www.buymeacoffee.com/jarodsjo...
Science & Technology

Comments • 164

@RobertJene 6 months ago ⁺¹⁸
⌚ Timestamps
0:00 - introduction
0:14 - Suno Bark
1:22 - Valle-X
3:00 - StyleTTS2
4:07 - CoquiTTS - XTTS
5:40 - Tortoise TTS
@ahmetab06 6 months ago ⁺¹
Which is the best TTS?
@kanavwastaken 6 months ago
@@ahmetab06 Tortoise
@RobertJene 6 months ago
@@ahmetab06 watch the video
@dontrez8412 2 months ago ⁺⁷
Pretty good results. I didn't think that first Bark voice was bad, though. Thanks for the comparisons.
@jonathandawson3091 2 months ago ⁺⁴
Hi, can you please make a tutorial on the Audiobook Maker, or on how to create such pipelines? In particular, you mentioned something along the lines of "RVCS", which seemed to make a dramatic difference in the last voice that you demonstrated! How is it done?
@stefanomonziocompagnoni8302 6 months ago ⁺¹
Hi Jarods!
Nice video!
I'm looking for software (preferably open source) that changes a recorded voice.
I mean, if I record my voice, I would like to use a different voice while keeping my prosody, tone, speed, etc., just changing the timbre.
Any advice?
@chaks2432 6 months ago ⁺⁵
I built a GUI for XTTS using Flask and Svelte and finally got RVC running yesterday. Got inspired by your audiobook_maker, but it was missing some features I figured could be pretty useful (like allowing users to edit text inside the GUI and add/delete/reorder lines). I'm pretty happy with the result, even if the UI looks like crap and it's still a little buggy. I also got everything to run together, so I don't need the ai-voice-cloning webUI running for it to work.
@Jarods_Journey 6 months ago ⁺¹
Awesome! Usually it's best to get things working first, then you can make it look pretty. Glad to hear!
@Jakwine 6 months ago
You’re my new idol! What’s your GitHub?
@TheMegadeth350 4 months ago
Hey. I am working on a similar project myself and I have a couple of questions. Could you please give me a way to contact you?
@k9clubme 6 months ago
Thank you very much for sharing your knowledge with us. Is there a way that we can modify a lyric of a song and then make it sound as if the artist is singing the revised lyric?
@poco7193 6 months ago
With Tortoise TTS I have been having issues with training. I upload my audio for training and get through the first two steps smoothly, but when I actually try to run the training it freezes on some text and never unfreezes, no matter how long I wait. Also, I wanted to know what the second piece of software was that you used in this video to make the Tortoise TTS output sound smoother. I am trying to make a podcast for a school project and desperately need a smooth TTS for some of my characters.
@sujetodelta1019 4 months ago
I have a question: if I plan to use a TTS for voice calls through a virtual cable, can any of these help, instead of having to download the audio files?
@robertbutcher222 1 month ago
Sorry if this is a bad question for this sort of video, but is there a way to use one of these in Linux Mint or Ubuntu to read selected text? I like to have selectable text read to me when I highlight text with the mouse cursor. There is a script I could make, if I find the instructions again, but the voice is very robotic. So, I was wondering if one of these could somehow be used, preferably offline.
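One possible shape for such a script, as a rough sketch rather than a tested setup: on X11 the highlighted text is the primary selection, which `xclip -o -selection primary` prints, and that text can be piped into a local TTS command. The sketch assumes `xclip`, the Piper command-line tool (`piper`, mentioned elsewhere in these comments) and `aplay` are installed. The model file name and the output path are placeholders, not values from the video.

```python
import subprocess

# Assumed tool names and paths; adjust to whatever is installed locally.
SELECTION_CMD = ["xclip", "-o", "-selection", "primary"]  # prints the highlighted text (X11)
MODEL_PATH = "en_US-lessac-medium.onnx"  # placeholder voice model file
WAV_PATH = "/tmp/selection.wav"          # placeholder output file


def tts_command(model_path: str, wav_path: str) -> list[str]:
    """Build a Piper-style command that reads text on stdin and writes a WAV file."""
    return ["piper", "--model", model_path, "--output_file", wav_path]


def speak_selection() -> None:
    """Read the current selection, synthesize it offline, then play the result."""
    text = subprocess.run(
        SELECTION_CMD, capture_output=True, text=True, check=True
    ).stdout
    if not text.strip():
        return  # nothing is highlighted, so there is nothing to read
    subprocess.run(tts_command(MODEL_PATH, WAV_PATH), input=text, text=True, check=True)
    subprocess.run(["aplay", WAV_PATH], check=True)
```

Calling `speak_selection()` from a small script bound to a keyboard shortcut would read whatever is highlighted at that moment. Any other offline TTS command that accepts text on stdin could be swapped into `tts_command`.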
@Seeker_Now 1 month ago
Hey man, thanks for your amazing work. I can't find any info on this. Is there a way I can integrate these trained AI voices with Balabolka? Hope you can answer this.
@Ulibert 6 months ago
Hey, can you tell us where to get a SAPI5 text-to-speech voice file with a Tagalog accent?
@trush1090 6 months ago
Hi Jarod, can you make a video on how to resume training? Say I finished training at 50 epochs. How would I add 50 more without resetting?
Also, how do I eliminate static from generated audio? I trained 2 hours on 60 epochs just for it to have a static sound.
@johnyoung4409 6 months ago
Hi Jarods, did you fine-tune Tortoise TTS when generating the example of your voice used in this video?
@Jarods_Journey 6 months ago ⁺¹
Yup!
@opaleyeakintunde4827 5 months ago
How can I fine-tune or configure my Tortoise TTS to clone as well as yours? Thank you.
@RobertJene 6 months ago
6:15 - what do you mean, pipeline from Tortoise TTS to RVC? Like you train a model in Tortoise and then use it in RVC or something?
@Jarods_Journey 6 months ago
This video I think: czcams.com/video/MckT7z7W_qM/video.html
But yeah, run Tortoise audio into RVC to make it sound better.
@ferysery 4 months ago ⁺¹
Hi. Where can I get my hands on your AUDIOBOOK MAKER Windows desktop app?
@johnlenoob6951 5 months ago
Hi Sir, thanks for all of your original and well-done content!!! Could you give some tips and tricks for becoming an organized guru dev like you? I'm lame at managing my Python envs and all my AI projects ;) Tried env, conda/miniconda ...
@orpit48 1 month ago
I'm searching for software to train my own voice models and use them for TTS. Is there an option you could recommend?
@braivco 26 days ago
Hey Jarod, would you consider building an XTTS > RVC pipeline app similar to what you've built with Tortoise?
@king-zu3ih 4 months ago
Can you suggest any project that can make audio of people singing or rapping a song? Thank you.
@xxredbollxx 1 month ago
I can't find the audio sample at 3:21. How can I download it?
@hassanawan3622 4 months ago
How can I deploy a trained Tortoise TTS model in a Flask web app?
@idkman8520 5 months ago ⁺¹
Hi! I love your videos!!
Quick question! I made a chatbot... how do I use these voices??
@sigma_z 3 months ago
You integrate it in.
@marcusunivers 5 months ago
Is there also an open source text-to-singing vocal generator? 🤔
Something like Vocaloid, Utau, SynthV or ACEStudio, where you can also add MIDI information to your vocal to pitch it? ☺
@timeship 3 months ago
Thanks for everything. BTW, what was that A.I. Audiobook Maker you showed in the video? I can't seem to find it anywhere ;-) THX
@Jarods_Journey 3 months ago ⁺¹
Search up AI audiobook maker on CZcams, it should be in the search results :)!
@timeship 3 months ago
@@Jarods_Journey, I tried, but it lists millions of A.I. voice makers, and not the software on your screen. Which company made it? Give me some clue ;-) THX
@aruncanra2084 6 months ago
Hey @Jarods_Journey, how can I train Tortoise-tts in other languages? Or is there any alternative to get multilingual TTS with emotions?
I'm currently trying to translate and clone the voice of a podcast that has many voice modulations and emotions.
My current flow:
Audio in language A -> Transcribe -> Translate (language B) -> TTS (with emotions) -> RVC
Any help is appreciated!
@aruncanra2084 6 months ago
Looked into the Tortoise-tts threads; it seems I need a 10k-hour dataset.
@Jarods_Journey 6 months ago
Some people have had success with just fine-tuning it with an appropriate tokenizer for the language. Look up nano nomad on CZcams!
@aruncanra2084 6 months ago
@@Jarods_Journey Thank you man👍🏻
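The flow described in this thread is a chain of stages where each output feeds the next. A minimal sketch of that structure is below. The four stage functions are hypothetical placeholders that only tag their input so the data flow is visible; they are not real calls into a transcription tool, a translator, Tortoise, XTTS or RVC.

```python
from typing import Callable

Stage = Callable[[str], str]


def run_pipeline(stages: list[Stage], source: str) -> str:
    """Feed the output of each stage into the next one, in order."""
    result = source
    for stage in stages:
        result = stage(result)
    return result


# Hypothetical placeholder stages: each one just wraps its input in a label.
def transcribe(audio: str) -> str:
    return f"text({audio})"


def translate(text: str) -> str:
    return f"translated({text})"


def synthesize(text: str) -> str:
    return f"tts_audio({text})"


def convert_voice(audio: str) -> str:
    return f"rvc_audio({audio})"


STAGES = [transcribe, translate, synthesize, convert_voice]
print(run_pipeline(STAGES, "podcast.wav"))
# prints rvc_audio(tts_audio(translated(text(podcast.wav))))
```

Keeping each stage behind the same one-argument interface makes it easy to swap a single step, for example a different TTS model, without touching the rest of the chain.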
@Because_Reasons 5 months ago
What does running RVC afterwards do? Do you have a tutorial?
@me-cm8or 6 months ago ⁺¹
Do all of these, or any one of them, have a local API that lets you link them up with other local apps through API calls?
@Jarods_Journey 6 months ago
With any Gradio interface you should be able to... But the only one I know for sure is Tortoise TTS with the AI voice cloning repo.
@dohyunio 5 months ago
Could you include your mic in your hardware list?
Great vid!
@svenbjorn9700 4 months ago ⁺¹
Where can we get the Audiobook Maker app? It’s not linked in the description.
@ferysery 4 months ago
Hugging Face
@blackswan6386 2 months ago
Pro, why skip the installation part? How can I get this to run? It says I need Python? Would be cool if you could give some help.
@KJ7JHN 3 days ago
Many of these voices are fantastic!
@DavidSeguraIA 6 months ago ⁺¹
Thanks. So which is the best open source option for Spanish TTS or voice cloning?
@Jarods_Journey 6 months ago ⁺¹
XTTS is your best bet. It just has some licensing things you'd need to look at.
@khajask8113 1 month ago
Which one is best for cloning my own voice?
@tylerchambliss8379 6 months ago ⁺¹
Hey Jarrod. It's me Tyler again, and I'm still having issues training models on my machine with Tortoise. I've set the batch size and gradient accumulation as low as I can and it's still not training. It just gets to the loading auto regressive model and doesn't go any further. Might some of these other TTS models be easier for me to use instead? I'm just about ready to give up on Tortoise because it's been almost 2 months and I still can't figure it out.
@Jarods_Journey 6 months ago
I see your post on GitHub, I'll have to get back to you on this tomorrow!
@mohsenghafari7652 1 month ago
Hi
Does the Coqui AI library support the Persian language?
Thanks
@ea02ca6f 5 months ago
Why not order the links in the description in the same order they are mentioned in the video with missing links added?
@Nightcortex 1 month ago
How can I get pre trained models?
@zonas7915 6 months ago
A video on how to train a model would be great, like the best settings etc.
@Jarods_Journey 6 months ago
You might wanna check out my Tortoise playlists for Tortoise TTS!
@SosyalMedyaArge-so5bs 6 months ago
Thanks buddy!
@TerrennonPriv 6 months ago
By the way, thank you Jarod. Update on my side, for my lore language project: XTTS was the way to go and I'm happy with the results.
@Jarods_Journey 6 months ago
Glad to hear it :)!
@Edward_ZS 3 months ago ⁺¹
What option runs the fastest?
And do any of these work without a GPU?
@Jarods_Journey 3 months ago
In this example, StyleTTS is the fastest. They do work on CPU, it's just much too slow to be usable ATM.
@lukerbs 4 months ago ⁺¹
Nice job! Thanks for the vid
@AntiAnti 6 months ago
Is any of these ready to use via a local HTTP server? I mean, I want to send JSON HTTP requests from another app and receive audio data.
I know it should be very easy to do, but Python isn't my thing.
@Jarods_Journey 6 months ago ⁺¹
You can use the Gradio interfaces for this. Tortoise launches on localhost:7860, so you can interact with it using the Gradio API, which you'd find linked at the bottom of the Gradio page.
@AntiAnti 6 months ago
@@Jarods_Journey Found it. Thanks.
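For readers who want to try the same thing, here is a hedged sketch of what such a JSON request could look like from another app, using only the Python standard library. The `/run/predict` route and the single-string input list are assumptions modelled on older Gradio releases. The API page at the bottom of the running interface lists the real routes and inputs for a given app, and newer Gradio versions point to the separate `gradio_client` package instead.

```python
import json
import urllib.request


def build_gradio_request(
    base_url: str, inputs: list, route: str = "/run/predict"
) -> urllib.request.Request:
    """Build a POST request whose JSON body wraps the inputs in a "data" list."""
    body = json.dumps({"data": inputs}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + route,  # route is an assumption, check the API page
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_gradio(request: urllib.request.Request) -> list:
    """Send the request to a running server and return the "data" list it replies with."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["data"]


# Building the request does not touch the network; call_gradio() does.
request = build_gradio_request("http://localhost:7860", ["Hello from another app."])
print(request.full_url)  # prints http://localhost:7860/run/predict
```

Only `call_gradio(request)` needs the server from the thread to be running on localhost:7860. How the audio comes back (a file path, a URL or encoded data) depends on the app and is not assumed here.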
@lismoiunehistoire 4 months ago
Are these free for commercial use?
@dthSinthoras 6 months ago
Which ones can handle German well? Tortoise was failing very hard when leaving English...
@Jarods_Journey 6 months ago ⁺¹
I think XTTS is the only multilingual one. Bark is as well, but its quality is not there.
@dthSinthoras 6 months ago
Thank you!
@user-pk4hn1uz1k 3 months ago
Is there any tool that doesn't require downloading a 70 GB model?
@ALAN-lv1zj 4 months ago
Are all these free for commercial use?
@ruudygh 4 months ago
What is that Audiobook? How does it magically turn bad audio into good audio?
@HOWDO7 1 month ago
Is there any software or TTS tool that has Caribbean accents?
@Powerlevelover9000 6 months ago
Does anyone here know a good database for TTS voice models? RVC has lots of voice models available, but I can't seem to find a good database, be it a website, Discord, etc., for Tortoise TTS.
@Jarods_Journey 6 months ago ⁺¹
Tortoise TTS is not too widely adopted, unfortunately. Haven't seen anything pop up.
@Mowgi 6 months ago ⁺³
Tortoise wins out, but I'm very interested in seeing more from Eleven's. From these examples, Coqui definitely seems to get the closest to your voice out of the box, but the actual quality of the audio sounds very low. Is there a way to set the bitrate?
@PROJECTSSourceEngineLessons 6 months ago
If we talk about XTTS, the initial quality is 22 kHz, but there is a resample function to 44 kHz. Of course, artifacts may appear.
@blender_wiki 6 months ago ⁺¹
It's not about bit rate; you get the file as WAV. The issue is that the XTTS model works with a temporal resolution of 22.05 kHz, so you must have a very good recording and EQ your voice sample taking this into consideration, especially if you have certain harmonics that are "flanging".
The voice at the beginning of the video is generated with XTTS: czcams.com/users/shorts8YyHxD42k-A
Still not perfect, but better than the example shown here, just because the sample provided to the model is recorded and prepared in a better way.
We are trying to train a fresh model with a 44.1 kHz dataset.
@Jarods_Journey 6 months ago
I think the commenters have got you answered here, but not to my knowledge. That's why running it through RVC is kind of an "audio upscaler", as I am using the 48k models in RVC.
@Arveee 1 month ago
Thank you!
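A small sketch of why resampling alone cannot add detail: audio sampled at a given rate can only represent frequencies up to half that rate (the Nyquist limit), so 22.05 kHz output tops out at about 11 kHz of content. The upsampler below is a deliberately crude linear interpolation used only for illustration. It is not the resample function mentioned in the thread, and real resamplers use proper filtering.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2.0


def upsample_2x_linear(samples: list[float]) -> list[float]:
    """Double the sample count by inserting the midpoint between neighbours."""
    out: list[float] = []
    for current, following in zip(samples, samples[1:]):
        out.append(current)
        out.append((current + following) / 2.0)
    if samples:
        out.append(samples[-1])
        out.append(samples[-1])  # hold the last value so the length is exactly 2x
    return out


print(nyquist_hz(22050))                    # prints 11025.0
print(nyquist_hz(44100))                    # prints 22050.0
print(upsample_2x_linear([0.0, 1.0, 0.0]))  # prints [0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
```

The upsampled signal has twice as many samples, but every new sample is derived from its neighbours, so nothing above the original 11.025 kHz limit is recovered. That is consistent with the thread's point that a model trained on 44.1 kHz data, or a second model such as RVC at 48k, is needed to get genuinely fuller output.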
@sergialbert97 6 months ago
Jarods, just for my benefit: when you apply RVC to some of these, do you first do voice cloning with, for example, XTTS and then apply RVC, or do you directly use a default voice that has a similar tone and apply RVC? Or maybe it's better to apply fine-tuning and then RVC? Thanks for your videos mate!
@Jarods_Journey 6 months ago ⁺¹
The workflow is: train a fine-tuned model in XTTS and then train a fine-tuned model in RVC.
You then use the audio from XTTS and run it through the RVC trained voice model.
@justindaniels5923 2 months ago
@@Jarods_Journey Do you have a guide/tutorial for setting this up? Just starting to dive deeper into this stuff, but I've been loving your content. Thanks man!
@vikramr60 6 months ago ⁺²⁸
Coqui TTS is not open source, meaning it can't be used for commercial purposes, only for research and education.
@motionmix2523 4 months ago
It says commercial use now.
@lukasnesvarbu1485 4 months ago ⁺¹
@@motionmix2523 I think it's because it died.
@maikelm20 3 months ago
Coqui XTTS is not for commercial use.
It does have other models which are open source.
@willmedrano98 1 month ago ⁺¹
Not sure if CoquiTTS is open source or not, but open source does not mean that you can use it for commercial purposes.
@DihelsonMendonca 2 days ago
There are two different categories, tts-1 which is free and hd voices, which are not for commercial use.
@YannMetalhead 3 months ago
Good video!
@ElmorenohWTF 6 months ago
Please make a tutorial on how to train, using Google Colab, the AI that you think gives the best multilanguage result.
@spiritual_audiobooks 5 months ago
An open source, local, fast neural text-to-speech system that sounds great is Piper TTS.
@luigivitofrancesco6221 6 months ago
Which is the fastest to install? Like wokada, so just one click to install everything.
@Jarods_Journey 6 months ago ⁺¹
Probably Tortoise TTS, as I have an installable package for that one on CZcams.
@SyamsQbattar 10 days ago
Do those local AI voices support the Indonesian language?
@Bigjuergo 6 months ago
Can you explain No. 5 in more detail, please?
How does the audiobook maker work?
@Jarods_Journey 6 months ago
Probably wanna check out this video! czcams.com/video/xbheTi1YjnM/video.html
@christophermoreira6198 1 month ago
What about PiperTTS?
@yusufcan1304 4 months ago
Thanks man
@EmpowerMuse 6 months ago
Have you tried GPT SoVITS tts?
@Jarods_Journey 6 months ago
Still trying it out, but the audio isn't too bad. It's not completely finished and the process is difficult to follow as there are areas I'm running into difficulties with, so I'm still waiting a bit on it.
@makiroll6815 6 months ago
Which one would you use for Vivy for speed?
@Jarods_Journey 6 months ago ⁺¹
Tortoise TTS. It's still my go-to and can process voices fast enough with deepspeed+hifigan.
@makiroll6815 6 months ago
@@Jarods_Journey cool thanks
@aruncanra2084 6 months ago
What are the languages supported by Tortoise TTS?
@poly06033 6 months ago
Only English
@aruncanra2084 6 months ago
@@poly06033 Are there any other TTS that support Hindi or Urdu and are as good as Tortoise?
@nielsieboy19 6 months ago
You can train your own language. I have a Dutch model on Hugging Face, for example.
@breakmillions2347 3 months ago
Not gonna show how to run it locally?
@aiart21 4 months ago
Can I ask which local TTS AI is useful for making a TTS MP3 file of a very big book like Moby Dick or the Bible? I have an RTX 4090. I want to make a TTS video of Moby Dick and the Bible. Thank you for your great video.
@aiart21 4 months ago
All at once, one click, no stopping.
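Whichever engine is used, a whole book usually has to be fed to it in small pieces, since TTS models tend to work on short passages. A sketch of that chunking step is below. The 200-character default is an arbitrary assumption, not a documented limit of any tool in the video, and the sentence splitting is a simple regular expression rather than a full tokenizer.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "Call me Ishmael. Some years ago I went to sea. It was cold."
print(split_into_chunks(sample, max_chars=50))
# prints ['Call me Ishmael. Some years ago I went to sea.', 'It was cold.']
```

Each chunk can then be synthesized in a loop and the resulting audio files joined in order, which gives the "one click, no stopping" behaviour at the script level.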
@nielsieboy19 6 months ago ⁺²
From what I've seen StyleTTS does a much better job of cloning a voice, it's also an order of magnitude faster than Tortoise. Only thing holding it back are the absolutely mental VRAM requirements for training and multilingual models (which are being worked on by the community).
@Jarods_Journey 6 months ago
The samples I've heard have been pretty awesome and I agree on the speed as well.
@blender_wiki 6 months ago
A short comment just to wish you a happy new year and improve your YT engagement score.
I think you should check your recording workflow and final sample quality, because all your voices sound too robotic compared to what I am used to with these tools. Maybe you record with your mouth too close to the microphone, so the low frequencies of your voice are too present, and these models don't like too much low-frequency content. Maybe an EQ cut of -3 dB under 80 Hz is enough.
@Jarods_Journey 6 months ago
Happy New Year to you too! It's funny you mention that, because the dataset on my voice is a pretty crappy one. I just grabbed it from a CZcams video, and the EQ I do on my CZcams videos is generally bass-enhancing 😅. I'd say models trained on my voice are not as good as others that I've done with other voices.
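To put the suggested -3 dB cut in perspective, decibels convert to linear ratios with the standard formulas: amplitude ratio = 10^(dB/20) and power ratio = 10^(dB/10). The helper names below are made up for this illustration.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Convert a gain in decibels to a linear power multiplier."""
    return 10 ** (db / 10)


print(round(db_to_amplitude_ratio(-3.0), 3))  # prints 0.708
print(round(db_to_power_ratio(-3.0), 3))      # prints 0.501
```

So a -3 dB cut below 80 Hz leaves the low end at roughly 71% of its amplitude, or about half its power. It is a gentle correction rather than a removal of the bass.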
@everybodyguitar5271 3 months ago
Bark is really slow when doing training on a Mac.
@billyindrajaya 6 months ago
Hi Jarod... for your subscribers, why didn't you give us Google Colab links?
@Jarods_Journey 6 months ago
Any of the repos that have Colabs will have them on their GitHubs.
@RobertJene 6 months ago
you forgot to put the link for Tortoise TTS in the descriptables
@Jarods_Journey 6 months ago ⁺¹
Oops 🤫😅
@CooloSolo 6 months ago
No. 1, Suno, is awesome
@0chiel 6 months ago ⁺¹
Dumb questions (novice):
- Are these free to use commercially?
- Can a standard M1/M2-level Mac run them locally?
Thank you
@psalmy26 6 months ago ⁺¹
Find their repos and look at their licenses. Anything with an MIT license is; other things get a bit more nuanced.
@Jarods_Journey 6 months ago ⁺¹
I'm not a lawyer, but the ones with MIT licensing are Tortoise TTS, Bark, and VALL-E X. XTTS has a non-commercial licence for their free stuff, and StyleTTS has a unique one where I think you have to have a disclaimer about it... That is, unless you train up your own base models.
@nufh 6 months ago
How about emotion, like laughing, giggling, mad etc.?
@Jarods_Journey 6 months ago
The only one that I've seen that's done this has been Bark... But nothing new I'm afraid.
@nufh 6 months ago
@@Jarods_Journey That is what I want the most actually. Having the expression will make it more alive.
@enriquemontero74 6 months ago
I also need to add open voice here
@qodeninja 4 months ago
Do you have videos on setting this up? oh yes look at that you do
@RobertJene 6 months ago
intro be like "why are you bri-ish"
@keithmorse9716 4 months ago ⁺¹
It seems like you have this targeted at a broader audience than the one I belong to. I'm dyslexic, so I'm trying to find something to help me stay interested in material while having difficulty reading consistently.
@phil5583 6 months ago
Tortoise sounds so good! I really hope there will be a multilingual model for it in the future.
@Jarods_Journey 6 months ago
XTTS is multilingual! Using it with RVC you could get similar results, I believe.
@ahmetttt10 6 months ago
@@Jarods_Journey can you try?
@subured 5 months ago
@@Jarods_Journey Does it support the Tamil language?
@Disastorm 6 months ago ⁺¹
If anyone wants to hear more examples of StyleTTS2, the AI character in my most recent video ( Mirai: Ai Plays Streets Of Rage ) uses it to talk ( pretrained checkpoint, no finetuning, using zero-shot cloning of a fake voice, goal isn't to clone an existing voice but make a good sounding ai. To be clear it still ends up pretty monotone though, but realtime performance is near instant. ).
@mikeg9b 2 months ago
It's 2024 and I still use espeak.
@videozontherapy1440 6 months ago
What is the current top live ai voice changer?
@danzai 1 month ago
Nice, but I'm not interested in cloning, just a realistic voice. Would be nice if you made a tutorial on how people can just set these up.
@trilogen 11 days ago
I don't like how the word Open Source is used loosely. Open Source means it can be used for commercial purposes and none of these give you that type of license.
@gabrielv.4358 1 month ago
Very good recommendations, but they need to be installed, which I don't like... And you probably need a $1000 GPU also.
@arekopo 6 months ago
✨🤸⭐⭐⭐
@LucidFirAI 6 months ago ⁺¹
I would love for you to update czcams.com/video/zRjLFFU3INg/video.html text to sing...
@noonesbiznass5389 6 months ago
It's too bad Bark doesn't seem to be maintained or improved in the last year or so. It's the only one that has any form of truly convincing inflections, granted at a very poor quality and lots of hallucinating.
@THEKL7773 1 day ago
They all feel like lazy code people threw out there; none of them even has a proper UI or can even be considered a proper program. You literally have to do all the work rather than just getting something you can throw text into and have it work. Imagine actually working on a full book with this; it'd be a nightmare. Why are you all just cool with this level of shit?
@DihelsonMendonca 2 days ago
💥 You chose some really weird voices, bro. Looks like a horror movie. These Japanese voices suck. Some of these. I use Coqui and it has fantastic voices. Also, you didn't even mention the best one: ELEVEN LABS. Unmatched! 🙏👍💥
@viviviontheway 5 months ago
Well, I think they all sound bad? We still need to wait a bit.
@PLAGUEDOCTOR2006 3 months ago
How do I have 2 views and 129 comments? I think you guys are bots.
@spacemule1 3 months ago
suno straight gross
@PLAGUEDOCTOR2006 3 months ago
Are these comments even real?
@finalblast3825 3 months ago ⁺¹
Most of them are bots, and the same can be said for twatter. From crypto shilling to onlyfans and tons of other crap, as per the design.
@expandablevictor7858 3 months ago
Suno Bark is a scam, try it out, it is miles away from the advertisement. It doesn't laugh like that by the way.
@JohnMcclaned 6 months ago ⁺²
'softwares' isn't a thing. 'software' is plural
@noonesbiznass5389 6 months ago
Dude, he's not a native US speaker... chill... what a stupid thing to bother commenting on.... sigh....
@Jarods_Journey 6 months ago
🤯
@williamwallace9826 1 month ago
Why did you bother making this video? It is NOT helpful. Text-to-speech is about reading typed text, it is NOT about recording and synthesizing your voice.
