0:00 - Demo (What We're Building) 1:10 - High Level Walkthrough / Discussion 5:02 - Gradio User Interface (Microphone Recording) 9:07 - OpenAI Whisper API (Speech to Text) 11:34 - ChatGPT API (Chat Completion) 21:00 - Making OSX Talk 22:06 - Jay-Z Edition (Rapping Therapist) If you've been enjoying the AI content, I am starting a spinoff channel this year focused on AI in music, gaming, and design at youtube.com/@parttimeai Source Code: github.com/hackingthemarkets/chatgpt-api-whisper-api-voice-assistant Twitter: twitter.com/parttimelarry Buy Me a Drank: www.buymeacoffee.com/parttimelarry
Here I am. Watching this video at 3 am. Mind blown. Will build my Jarvis. Push the responses in a database. Make them searchable. Langchain. Too much coffee. I need sleep. Thanks Larry.
Just wanted to say how much I appreciate this channel. It is the first time I find a no-bullshit channel, that goes straight to the point and gives you the step-by-step recipe for doing exactly what the title advertises. You are amazing! Keep doing the good work! Thank you!
Woah, when I checked my emails this morning, I was stoked to see the OpenAI notification! I was so pumped to get started that I made a similar use case over breakfast. It's awesome! Could you do me a solid and make a video tutorial on voice cloning? It would be super cool if you could show it off using your own voice. That'd be rad! Anyways, thanks for the video and congrats on your awesome start! For a while I thought Larry is in real trouble :D
Thanks for the demo! Made myself a french tutor this evening. Doesn’t work as real time as this vid (30 second delay before response for some reason). But it is working. For anyone on windows, you can use the gTTS library to generate the voice, rather than the ‘say’ function which is limited to Mac.
Can you have a headphone that does noise cancellation and translates what you are saying in real time to another language. Then the other person can respond and you headphone does the same thing. Basically like watching a dubbed movie. Each person can take a headphone and put it in their ear and start taking to each other in any language and be able to understand each other.
I have been thinking about this for years. It would not only be “fun” it would help break down power structures in society. Language can be massive barriers for people. I can’t wait for it to become reality.
I’ve looking for something that does this for years but a legit polished real time me product doesn’t exist yet. It’s a billion dollar idea for whoever accomplishes it first
Actually.. someone have done this few days ago.. he played Fortnite with Japanese people with a wifu voice and even understand them fully .. he basically used 3 AI api to created it ..i don't recall his channel name but it end with "weeb"
@@parttimelarry sure i replaced subprocess.call(["say", system_message['content']]) with word = system_message['content'] talk = f'(New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak("{word}");' subprocess.call(['PowerShell', 'Add-Type -AssemblyName System.Speech;', talk], shell=True) had to use variables to get around quotes inside of quotes also had to manually install ffmpeg and add the path to my system variables but then it worked pretty the same as your demo
Awesome, this looks like a a great solution. I think I have Parallels somewhere where I can try this out and I'll share this snippet on the Github project. Thank you!
@@drnotebook I couldn’t find a way through cmd. There’s probably some programs you can load similar to say that would add the functionality. But that was the only native way I could find.
RuntimeError: Cannot load audio from file: `ffprobe` not found. Please install `ffmpeg` in your system to use non-WAV audio file formats and make sure `ffprobe` is in your PATH. still getting this error even after installing the necessary modules
Dude, this is amazing! I just had an audible outburst when it worked!!! (it took several times, so there was quite the anticipation - a few roadblocks along the way, as a non-dev the whole environment management was a steep curve for me) But thank you for putting this out there, and the idea with the therapist script was really helpful as well, understanding the concepts of embedding purpose-built personas and roles. Thanks!!
Hi, you just got a new subscriber as you have hit exactly what I had mapped out ... but you did it in an afternoon ... hats off - I look forward to checking out the rest of your content and once I had put my brain back together I noticed that you are use a very simple IDE on you Mac and would be interested to know what it is?
I have been watching your videos for quite a long time, and you are the best. No advertisements, no craps, simple and straightforward forward, and purely understandable with a step-by-step guide. I am trying to compile all those videos and build a Stock Analysis platform to slice and dice data and build my own Trading BOT, but I do not see everything in one piece. Either the visual is missing, or the backend is. How can I use all your videos to make one piece of code to do the best trading? Can you please guide me?
Thanks for this great tutorial. I wonder if gradio or some other listening tool can be triggered by a prompt word just like "Alexa" instead of clicking the buttons to "Stop recording", then "Submit"?
I just used the voice built into Mac OS X, but there are many great voice cloning / synthesis packages now. Maybe will discuss some of them in future videos.
Does ChatGpt assistant role consume tokens? For example if I talk with it for an hour it will cumulate all responses and send them in every query? About video -Good stuff man!
I am trying to replicate this with Google Colab and I can't see the debug output like you see in VSCode, for example, at 11:15, you were able to see the output was a JSON array with "text". When I run in Colab, it doesn't show any of that, just the pretty Gradio web UI automatically loaded. Is that something I need to turn on?
Dude. I really enjoy yr videos. But for this video I still don't get why you need to use gradio. Maybe it's just there for UI, because all other stuff is done by whisper and GPT3.5-turbo. I see in your later videos you appear to move away from gradio, but maybe mistaken. Why I like your videos is that you have done all the research of current APIs for us to use. And this research is the part that takes the most time. I thank you for saving us all this research effort 😊
I got the below error: openai.error.InvalidRequestError: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg'] do you have any idea to solve the problem
Awesome content! Looking forward to your voice cloning video! Do you think it would be possible to just have an avatar you speak to instead of clicking on submit every time you say something?
I'm wondering about that too. There's probably some way to use a push to talk setup. That would be easier than setting up a noise-activated mic, I think. Then all you need is to route the text output through a text to speech API. For a visual, I don't know. I've never used those things. But there's probably tons of options for that I have windows, so I can't use the text-to-speech shown in this video. Let me know if you figure something out
I'm going to try using the text to speech api from google. You just need to set up a google cloud account, then a service account, and download your API key json file. Then you define a variable to the path of that file on your system. So I'm a few steps closer But I'm going to have to learn more about gradio if I want it to work. Ideally I want to include a field on the gradio page where I can also input text, for technical information. And the text-to-speech should have a toggle option. I think I will want to turn it off in some cases
New subscriber here Larry. Thanks for such a good educative video. I'm still fairly new in coding but would Streamlit be another good application to create an interface?
Help with windows. Still unable to use voice on windows. Get error output. Even the part for "transcription goes here". Cannot return output when i make an input. Not entering an input returns the "transcript goes here". I believe some problem in this area is stopping me from getting further.
What platform are you running the code on ? I keep trying it on Jupyter but the "gpt-3.5-turbo" model doesn't seem to work. Instead it keeps asking to switch to "davinci" The error message suggests that you I am trying to use a chat model with the v1/completions endpoint, which is not supported. And that I should use the v1/chat/completions endpoint instead.
I think enhancing the code could be achieved by incorporating built-in text-to-speech and speech-to-text functionality, eliminating the need to manually record your input.
How can I do on a server in python to create the last part text to speech? I am using pyttsx3 and the command runandwait doesn't stop the loop of the AI. Amazing video!
I get this error. I have tried "pip install ffprobe" and "pip install ffmpeg". Still get the error, pls help! RuntimeError: Cannot load audio from file: `ffprobe` not found. Please install `ffmpeg` in your system to use non-WAV audio file formats and make sure `ffprobe` is in your PATH.
I got the same error trying it on Windows. It just can't find the file but I don't have the patience to start debugging that. I saw that someone mentioned using pyaudio so maybe I will try that instead of struggling with gradio. Let me know if you figure it out.
Thanks for sharing. I'm new to your YT channel, and still learning python. I encountered some error installing FFMPEG. btw i used Windows10. Thank you. RuntimeError: Cannot load audio from file: `ffprobe` not found. Please install `ffmpeg` in your system to use non-WAV audio file formats and make sure `ffprobe` is in your PATH.
Next level as usual Larry!👊I got an error that I have not seen before involving the audio file: RuntimeError: Cannot load audio from file: `ffprobe` not found. Please install `ffmpeg` in your system to use non-WAV audio file formats and make sure `ffprobe` is in your PATH...say what?
Hi Larry, wondering if you could do a video on getting / processing different time intervals at once (say a 5 candles and daily charts). So you want your strategy to run on a 5 min chart, but you also need to get the previous daily close which isn't available in the 5 min chart dataset
All i got up to now is ChatGPT to work inside my windows cmd. Getting there, but slowly. I'm 100% in this to get it working on windows and will share the code if I do so.
@@saadehsan894 If you get it to work I'd appreciate the help. I have code written above in the comments. Got ChatGPT working inside CMD. Slowly getting there
0:00 - Demo (What We're Building)
1:10 - High Level Walkthrough / Discussion
5:02 - Gradio User Interface (Microphone Recording)
9:07 - OpenAI Whisper API (Speech to Text)
11:34 - ChatGPT API (Chat Completion)
21:00 - Making OSX Talk
22:06 - Jay-Z Edition (Rapping Therapist)
If you've been enjoying the AI content, I am starting a spinoff channel this year focused on AI in music, gaming, and design at youtube.com/@parttimeai
Source Code: github.com/hackingthemarkets/chatgpt-api-whisper-api-voice-assistant
Twitter: twitter.com/parttimelarry
Buy Me a Drank: www.buymeacoffee.com/parttimelarry
Here I am. Watching this video at 3 am. Mind blown. Will build my Jarvis. Push the responses in a database. Make them searchable. Langchain. Too much coffee. I need sleep.
Thanks Larry.
The future is here. We live in the future. My jaw in on the damn floor.
i used this tutorial to help me make it into a flask app and deployed it to a website. Thank u for making this!
Just wanted to say how much I appreciate this channel. It is the first time I find a no-bullshit channel, that goes straight to the point and gives you the step-by-step recipe for doing exactly what the title advertises. You are amazing! Keep doing the good work! Thank you!
Coffee on me Larry. You are awesome for sharing!
Thank you very much! Drinking a flat white now :)
Pretty sick! Especially the rap part! 🤣🤣🤣 Good job!
yessss larry is on
Hilarious intro Larry, Great video all around. ;)
So glad I found this channel. I have an idea for a product...this may help bring it to market.
Ty
I have been looking for this sort of example and explanation for a long time. Your simple and easy approach is fantastic; thank you.
dude, this is amazing kick for a lazy bum, old worn out like me. I am gng to try exactly this first .. 👒 off
You are on top of this game. Thank you.
upvoted for the maccas hat / binance shirt combo
This was a project i was thinking of doing. Guess i'll do the tutorial now rather than get chat GPT to tutor me through it. Thanks!
Woah, when I checked my emails this morning, I was stoked to see the OpenAI notification! I was so pumped to get started that I made a similar use case over breakfast. It's awesome! Could you do me a solid and make a video tutorial on voice cloning? It would be super cool if you could show it off using your own voice. That'd be rad! Anyways, thanks for the video and congrats on your awesome start! For a while I thought Larry is in real trouble :D
Great stuff
Brilliant Intro - That's a great hook. Nice video!
Thanks! I've been enjoying following your channel as well, I recommended your channel on my Tech I'm learning in 2023 video
@@parttimelarry Very kind of you!
Still looking forward to the front end tutorial on the financial advisor QA. The video was great. Thank you.
Amazing stuff. This is coming up so fast, Dan will soon have more lovers than Samantha had in the movie (641 if I recall correctly).
Thanks for the demo! Made myself a french tutor this evening. Doesn’t work as real time as this vid (30 second delay before response for some reason). But it is working.
For anyone on windows, you can use the gTTS library to generate the voice, rather than the ‘say’ function which is limited to Mac.
Can you have a headphone that does noise cancellation and translates what you are saying in real time to another language. Then the other person can respond and you headphone does the same thing. Basically like watching a dubbed movie. Each person can take a headphone and put it in their ear and start taking to each other in any language and be able to understand each other.
This sounds very doable and would be super fun!
I have been thinking about this for years. It would not only be “fun” it would help break down power structures in society. Language can be massive barriers for people. I can’t wait for it to become reality.
I’ve looking for something that does this for years but a legit polished real time me product doesn’t exist yet. It’s a billion dollar idea for whoever accomplishes it first
@@NickWindham anyone who understand this can create this in 20 minutes.
Actually.. someone have done this few days ago.. he played Fortnite with Japanese people with a wifu voice and even understand them fully .. he basically used 3 AI api to created it ..i don't recall his channel name but it end with "weeb"
Very Good!!!
Thanks for an awesome demo!
You are great. Thanks
You're a genius, kudos for all what you're teaching us. Thank you
Perfect timing for the release of chatgpt api! Thanks for this great video!!
Thx man this was real cool. It took me a while to use a PowerShell command on windows for the voice output, but now the fun really begins
Could you share what you did on Windows? I don't have a Windows machine set up right now, but a lot of people are asking about this.
@@parttimelarry
sure i replaced
subprocess.call(["say", system_message['content']])
with
word = system_message['content']
talk = f'(New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak("{word}");'
subprocess.call(['PowerShell', 'Add-Type -AssemblyName System.Speech;', talk], shell=True)
had to use variables to get around quotes inside of quotes
also had to manually install ffmpeg and add the path to my system variables but then it worked pretty the same as your demo
Awesome, this looks like a a great solution. I think I have Parallels somewhere where I can try this out and I'll share this snippet on the Github project. Thank you!
@@drnotebook I couldn’t find a way through cmd. There’s probably some programs you can load similar to say that would add the functionality. But that was the only native way I could find.
@@christophermorris486 Perfect, worked straight away substituting that code on Windows. Thanks mate
This will reduce overpaid therapists lol.Great vid
Yo yo yo... it's very funny listening to a TTS app talking like this 🤣
you crack me up. Thank Larry
Wow!! Congratulations!! Best video on AI ever so far!! 🎉🎉
awesome
Insane value in a short video! Thanks 👏
Super cool and funny 2. thanks for sharing👏👏
I can’t thank you enough for all the videos you make! Ps: it says buy me a “drank” in video description haha
RuntimeError: Cannot load audio from file: `ffprobe` not found. Please install `ffmpeg` in your system to use non-WAV audio file formats and make sure `ffprobe` is in your PATH.
still getting this error even after installing the necessary modules
great video, perfect fun exemple combining those new tools.
Ol dependable Larry.
Dude, this is amazing! I just had an audible outburst when it worked!!! (it took several times, so there was quite the anticipation - a few roadblocks along the way, as a non-dev the whole environment management was a steep curve for me)
But thank you for putting this out there, and the idea with the therapist script was really helpful as well, understanding the concepts of embedding purpose-built personas and roles. Thanks!!
Hi, you just got a new subscriber as you have hit exactly what I had mapped out ... but you did it in an afternoon ... hats off - I look forward to checking out the rest of your content and once I had put my brain back together I noticed that you are use a very simple IDE on you Mac and would be interested to know what it is?
Duh... yes it's VSC - that's what you get from a lifetime of Notepad++ lol
Larry you rock 😎 cannot wait for weekend and trying this out 🎉
Larry, you're my hero
I have been watching your videos for quite a long time, and you are the best. No advertisements, no craps, simple and straightforward forward, and purely understandable with a step-by-step guide.
I am trying to compile all those videos and build a Stock Analysis platform to slice and dice data and build my own Trading BOT, but I do not see everything in one piece. Either the visual is missing, or the backend is. How can I use all your videos to make one piece of code to do the best trading? Can you please guide me?
Right on time Larry!
What a great video, this looks so much fun. Thank you
I tried having GPT-4 mock me up something in Javascript but I have basically no coding experience. It was rough. I appreciate the walkthrough.
This is pretty awesome from the second minute and deserves a like NOW :). Hopefully, I'll not change my mind by the end of the video :)
Thank you, the video is excellent.
Super helpful, thanks for sharing!
Thank you very much! Cheers.
This is incredible, thanks for sharing !
Thank you very much!! It's amazing what we learn with all your videos!!!
Larry Legend 🐐
I am super excited to try this I've had an idea and this is close to what I need to make that idea a reality.Thanks!!!
Thanks for this great tutorial. I wonder if gradio or some other listening tool can be triggered by a prompt word just like "Alexa" instead of clicking the buttons to "Stop recording", then "Submit"?
Thanks a lot Larry, high quality content as usuel. Thank you so much
So cool! Thanks!
lmao that intro and examples 😂
You are awesome! Thanks for sharing this!
Absolute genius content as usual
Boom...loved i!
The A.I and the voice recognition is good but the text-to-speech is a bit robotic
I just used the voice built into Mac OS X, but there are many great voice cloning / synthesis packages now. Maybe will discuss some of them in future videos.
That was hilarious and brilliant
this is amazing guys, Im just wondering how many new business can be builded in less than 25 minutes.
Noobie here. What software are you using at 5:40? Kudos for the great content!
Looks like visualstudio code.
hi Larry, your videos are very insightful thank you.. can you make a video of the trading bots you're currently running for your own accounts?? Thanks
wow. great video as always.
Does ChatGpt assistant role consume tokens? For example if I talk with it for an hour it will cumulate all responses and send them in every query? About video -Good stuff man!
Great job! Thanks a lot!
Excellent vid
Only if the AI could see your Mcdonalds' cap?! :))
gold intro bro 🤣🤣🤣
How about a trigger that recognizes when the user stops talking, then posts the recording.
Also, let's make chatgpt interrupt people.
I knew you gonna put out something like this today! Thank you Sir.
I am trying to replicate this with Google Colab and I can't see the debug output like you see in VSCode, for example, at 11:15, you were able to see the output was a JSON array with "text". When I run in Colab, it doesn't show any of that, just the pretty Gradio web UI automatically loaded. Is that something I need to turn on?
I figured it out. launch(debug=True)
Thank you!
Dude. I really enjoy yr videos. But for this video I still don't get why you need to use gradio. Maybe it's just there for UI, because all other stuff is done by whisper and GPT3.5-turbo. I see in your later videos you appear to move away from gradio, but maybe mistaken. Why I like your videos is that you have done all the research of current APIs for us to use. And this research is the part that takes the most time. I thank you for saving us all this research effort 😊
Bruh where’s the ui for the chatbot?? So pumped!!
I got the below error:
openai.error.InvalidRequestError: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']
do you have any idea to solve the problem
Same Problem for me, find no fix :( Nobody has an idea ??
Just add this one before calling Whisper:
`audio = Path(audio).rename(Path(audio).with_suffix('.wav'))`
Awesome content! Looking forward to your voice cloning video! Do you think it would be possible to just have an avatar you speak to instead of clicking on submit every time you say something?
I'm wondering about that too. There's probably some way to use a push to talk setup. That would be easier than setting up a noise-activated mic, I think. Then all you need is to route the text output through a text to speech API. For a visual, I don't know. I've never used those things. But there's probably tons of options for that
I have windows, so I can't use the text-to-speech shown in this video.
Let me know if you figure something out
I'm going to try using the text to speech api from google. You just need to set up a google cloud account, then a service account, and download your API key json file. Then you define a variable to the path of that file on your system.
So I'm a few steps closer
But I'm going to have to learn more about gradio if I want it to work. Ideally I want to include a field on the gradio page where I can also input text, for technical information. And the text-to-speech should have a toggle option. I think I will want to turn it off in some cases
Have you tried streamlit (instead of gradio)? Great video as always!
Holy. Shit.
New subscriber here Larry. Thanks for such a good educative video. I'm still fairly new in coding but would Streamlit be another good application to create an interface?
Streamlit is great, I made a video on it where I made a financial dashboard
@@parttimelarry OK, thanks for the help 🙏🏾
please finish the 2nd part of the previous video!!!!!!!!!!!!
It's in the works, I got excited about the shiny new thing and wanted to be one of the first to make a video on it lol
man this is so awesome, will def try it for myself, is there a way to store the conversations? also what program are you using for coding?
Push it to a database
@@bobsaydahmat5060 oh thanks!! how can i do that for this specific project?
Help with windows. Still unable to use voice on windows. Get error output. Even the part for "transcription goes here". Cannot return output when i make an input. Not entering an input returns the "transcript goes here". I believe some problem in this area is stopping me from getting further.
What platform are you running the code on ?
I keep trying it on Jupyter but the "gpt-3.5-turbo" model doesn't seem to work.
Instead it keeps asking to switch to "davinci"
The error message suggests that you I am trying to use a chat model with the v1/completions endpoint, which is not supported. And that I should use the v1/chat/completions endpoint instead.
I think enhancing the code could be achieved by incorporating built-in text-to-speech and speech-to-text functionality, eliminating the need to manually record your input.
How can I do on a server in python to create the last part text to speech? I am using pyttsx3 and the command runandwait doesn't stop the loop of the AI. Amazing video!
running into the same issue lol - and sadly ChatGPT hasn't been helpful in solving it either.
can u add speaker recognition functionality and a log, so a meeting style script is recorded with which speaker is speaking annotated?
I get this error. I have tried "pip install ffprobe" and "pip install ffmpeg". Still get the error, pls help! RuntimeError: Cannot load audio from file: `ffprobe` not found. Please install `ffmpeg` in your system to use non-WAV audio file formats and make sure `ffprobe` is in your PATH.
I got the same error trying it on Windows. It just can't find the file but I don't have the patience to start debugging that. I saw that someone mentioned using pyaudio so maybe I will try that instead of struggling with gradio. Let me know if you figure it out.
Hi Larry, I'm thinking about is there have chance make your chatgpt and whisper API to as windows system voice engine?
Super helpful. Does python have any libraries for us to choose different kinds of custom voices?
There are some cool libraries for voice cloning where you just need to provide some samples... I may talk about that soon
Thanks for sharing. I'm new to your YT channel, and still learning python. I encountered some error installing FFMPEG. btw i used Windows10. Thank you.
RuntimeError: Cannot load audio from file: `ffprobe` not found. Please install `ffmpeg` in your system to use non-WAV audio file formats and make sure `ffprobe` is in your PATH.
weird, when i read the release documentations, my head went to HER 2013. and were in 2023, pretty coincidence
That's cool. You mentioned you do web development? Obviously you have doing other stuff besides web development
honestly, if you coded this using your Maschine instead of a keyboard, i wouldn’t not even be surprised.
Next level as usual Larry!👊I got an error that I have not seen before involving the audio file: RuntimeError: Cannot load audio from file: `ffprobe` not found. Please install `ffmpeg` in your system to use non-WAV audio file formats and make sure `ffprobe` is in your PATH...say what?
same error here
@@hectorvillafuerte8539 hey i am getting the same ive changed the code all different ways. If its fixed please post
here i will do the same
Still have not been able to fix this error.
@@AP-hv5dh theres a response to one of my comments with a code that ay work, but not been tested. we are currently testing. will update soon
@@DanielSallery Thanks Daniel! I'll keep an eye out. Really appreciate the heads up!Been trying al sorts of workarounds 🧐🧐
Hi Larry, wondering if you could do a video on getting / processing different time intervals at once (say a 5 candles and daily charts). So you want your strategy to run on a 5 min chart, but you also need to get the previous daily close which isn't available in the 5 min chart dataset
can you help people who are windows users?
All i got up to now is ChatGPT to work inside my windows cmd. Getting there, but slowly. I'm 100% in this to get it working on windows and will share the code if I do so.
Superb I look forward to it...thank you
@@saadehsan894 If you get it to work I'd appreciate the help. I have code written above in the comments. Got ChatGPT working inside CMD. Slowly getting there
How do you make a similar app connected to a MySQL database, so that chatgpt can answer your questions about any info contained in that database?