Open Source Multimodal LLM for Speech - SpeechGPT

  • Added 6 Sep 2024
  • SpeechGPT - github.com/0nu...
    Examples - 0nutation.gith...
    Hardware for my PC:
    Graphics Card - amzn.to/3pcREux
    CPU - amzn.to/43O66Ir
    Cooler - amzn.to/3p98TwX
    RAM - amzn.to/3NBAsIq
    SSD Storage - amzn.to/42NgMFR
    Power Supply (PSU) - amzn.to/430bIhy
    PC Case - amzn.to/447499T
    Motherboard - amzn.to/3CziMXI
    Alternative prebuilds to my PC:
    Corsair Vengeance i7400 - amzn.to/3p64r22
    MSI MPG Velox - amzn.to/42MnJHl
    Cheapest recommended PC:
    Cyberpower 3060 - amzn.to/3XjtZoP
    Come join The Learning Journey!
    Discord - / discord
    Github - github.com/Jar...
    TikTok - / jarodsjourney
    If you found anything helpful, please consider supporting me and the content I am trying to produce!
    www.buymeacoff...

Comments • 21

  • @bomar920 6 months ago +6

    We are eagerly waiting to see how you prepared your dataset in your previous video.

  • @miladmohseni187 6 months ago +1

    Thank you, teacher, for the excellent educational videos.
    Please tell me the name of the most powerful voice-changing AI that you have tested so far 🙏

  • @Amandeep-yq7ew 6 months ago +1

    Can you make a video on the installation process?

  • @seifuishiguro 6 months ago

    Hey Jarod. I've recently come across your channel and learned quite a bit. I love this content. I would like to try out Tortoise + RVC sometime in the near future, when I can afford a GPU.
    At the moment I am trying out ElevenLabs. Their v2 model is pretty dang good, but I can't get it to clone special voices like Luffy and Usopp from One Piece (dub), even with high quality recordings from sound resource. Usopp is somewhat close, but Luffy is far off.
    Anyway, I'm very curious to see if Tortoise + RVC can do a better job. I saw your short where you compared the two models with Melina's voice, but that was quite a few months ago. Any chance you can compare them again soon?

  • @thekinoreview1515 6 months ago

    Jarod, have you seen or tried aero (slp-rl/aero on GitHub) at all? I am impressed with it for audio super-resolution. It did a good job with dialogue from 12 kHz -> 48 kHz for me. I think it could be used in a TTS/voice conversion pipeline either as dataset pre-processing (to get 48 kHz samples for RVC, which are hard to come by) or applied to the output, so you can train on lower-quality data but end up with high-quality results.

    • @Jarods_Journey 6 months ago

      I haven't seen that one, but I believe voicefixer is an option on Tortoise (not sure if it's active or deactivated), which is an "upscaler" for audio as well. I might have to check it out.

    • @thekinoreview1515 6 months ago

      @Jarods_Journey Thanks, I will check out voicefixer also.
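A minimal sketch of the dataset check implied by this exchange: before running any upscaler, list which clips fall below the 48 kHz target mentioned above. It uses only Python's standard `wave` module, so it assumes uncompressed PCM `.wav` files. The function name `find_low_rate_wavs` and the folder name in the usage note are hypothetical, and the upscaling step itself (aero or voicefixer) is deliberately left out because neither tool's interface is described here.

```python
import wave
from pathlib import Path


def find_low_rate_wavs(folder, target_hz=48_000):
    """Return (file name, sample rate) for every .wav below target_hz.

    find_low_rate_wavs is a hypothetical helper, not part of any tool
    named in the thread.
    """
    low = []
    for path in sorted(Path(folder).glob("*.wav")):
        # The wave module only reads uncompressed PCM WAV headers.
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate < target_hz:
            low.append((path.name, rate))
    return low
```

Calling `find_low_rate_wavs("dataset")` on a folder of clips returns the files that would be candidates for upscaling before training.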

  • @gotrixf3088 6 months ago

    Hello bro, after watching your videos I became very interested in learning more about programming and understanding how to build systems like yours. Could you give a quick guide on where to start?

    • @Jarods_Journey 6 months ago

      You can watch some beginner Python courses on YT, but I mainly developed my skills through doing projects. When ChatGPT came along, it sped up the process even more.
      My suggestion is to browse around on YouTube and find relevant tutorials or guides on topics you're interested in, and then start tinkering with it yourself.

  • @WorldLie 6 months ago

    Jarod, can you please do a video on how to continue training Tortoise TTS if there's a blackout during training? I would really appreciate it if you made a quick video on it.

    • @Jarods_Journey 6 months ago

      You might wanna check the latest Japanese tortoise video, I show how to continue training in that one

  • @tylerchambliss8379 6 months ago

    Hey Jarod, I'm wondering what I'm doing wrong: my Tortoise models are repeating, but the audio sounds fine as far as output goes. I manually split my dataset with my audio editor instead of letting Whisper do it, because Whisper left breaths and noise at the end of my clips. While that has helped get rid of the artifacts and improve my audio, it's still skipping stuff, repeating, and making gibberish on occasion. I have text LR and mel ratios all the way up, learning rate at 0.01, and between 10 and 50 epochs depending on dataset length (5 minutes to about an hour). My losses were around 0.6-something at the lowest and 1.3-something at the highest. Pause and repeat penalty are both set to 8 on inference.

    • @Jarods_Journey 6 months ago

      I think your LR is a little too high. I would try a lower LR, closer to 0.0001 or 0.00001, but repeating words and artifacts are a known issue on Tortoise. Better datasets and longer training might help to mitigate this, but it really seems to depend on the voice.

    • @tylerchambliss8379 6 months ago

      @Jarods_Journey So now Tortoise is complaining about my text length being too long. It was only about 6,000 characters, and I've put 22,000-odd characters through it on the default autoregressive model and it went fine. What's going on here?
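The thread does not say what triggers the length error or how to fix it. One common workaround for text-length limits in TTS tools in general is to split the input into short chunks at sentence boundaries and synthesize them one at a time. The sketch below shows only that splitting step. The helper name `split_text` and the `max_chars=200` default are assumptions made for illustration, not a documented Tortoise limit or API.

```python
import re


def split_text(text, max_chars=200):
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    max_chars=200 is an illustrative value, not a documented Tortoise limit.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits in the chunk being built.
            current = candidate
        else:
            # Close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to the synthesis step separately and the resulting audio clips joined afterwards. Whether this resolves the specific error reported above is not confirmed anywhere in the thread.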

  • @Hury209 6 months ago

    Does anyone know the easiest way to install a Mistral LLM to chat with PDFs on Windows, possibly with a web UI?

  • @vinchenzovarela8039 6 months ago

    I'm currently learning some languages, and it would be very interesting to see this type of model implemented in some sort of tutoring app. I'm not sure if it has the capability of differentiating languages in the same .wav file, though.

    • @vinchenzovarela8039 6 months ago

      You are awesome bro, keep up the good work. I'm using your Tortoise TTS repo for a project now.

    • @Jarods_Journey 6 months ago

      Not as of now, it seems to be English only. But maybe in the future, definitely.

  • @yuyutsurao 6 months ago

    How can I contact you?

  • @GraveUypo 6 months ago

    That's cool, but I need it to be better than this. Also, it might sound better with an RVC pass 😬