Build a Containerized Transcription API using Whisper Model and FastAPI
- Published 28 Jun 2024
- In this exciting tutorial, I'll guide you step by step on how to create your very own Containerized Transcription API using the powerful Whisper AI model and FastAPI as the backend framework.
We'll start by setting up a development environment and configuring FastAPI to build a robust web application. Then, we'll seamlessly integrate the open-source Whisper AI model, a cutting-edge solution for Speech-to-Text (STT) transcription, to enable accurate and efficient audio-to-text conversion.
But that's not all! We'll take it a step further by containerizing our application using Docker, ensuring that it runs consistently and efficiently in any environment. This approach not only simplifies deployment but also allows for scalability and easy management.
By the end of this tutorial, you'll have a fully functional, containerized Transcription API that can effortlessly convert audio files into text. Whether you want to automate your transcription tasks, enhance accessibility for your content, or explore the world of AI-powered applications, this project has you covered.
Don't forget to hit that "Like" button if you find this tutorial helpful, leave your questions and thoughts in the comments section below, and be sure to subscribe for more exciting AI and development tutorials.
GitHub Repo: github.com/AIAnytime
Whisper GitHub: github.com/openai/whisper
#openai #ai #python - Science & Technology
Absolute quality content. So informative and I love how every step is explained in great detail.
Glad you liked it!
Thank you!
It's working perfectly
Thanks for the demo and info, very informative and precise. I truly appreciate it. Easy to deploy. Have a great day.
Glad it was helpful!
Judging solely from the title, this is exactly what I need. I hope it works as I expect :D Gonna keep watching.
Thanks 👍
Great video, thanks very much! I'm looking to deploy Whisper for an app I'm working on which will require multiple transcriptions of small audio chunks to take place concurrently. If I were to deploy your solution on EC2, what sort of specs would I need?
Great explanations, thank you so much for the tutorial!
You're very welcome!
Outstanding!
You are incredible. Can we get more end-to-end projects involving Docker?
Thanks... you can watch this as well. czcams.com/video/7CeAJ0EbzDA/video.html
Hey, you made a video on a text-to-image API in the past. Could we create an API that uses checkpoints from Civitai, i.e. one that can load multiple checkpoints/models and be called as an API? Is that possible?
Do you know if speaker diarization (breaking up the transcription by speaker) can be built into this?
What's the best way to deploy this container? AWS EC2 is kind of expensive... it needs a lot of RAM.
Thank you so much! One question: in the first version of Whisper you couldn't translate from English to Spanish. You could only .transcribe one language or the other, but not translate between them. Do you know if Whisper v3 can now translate from English to Spanish? Or any updated WhisperX, or other options? Honestly, where I'd most like to use it is translating your videos, since the YouTube translator is very bad and it's hard to follow you. If possible, could you make a video? ;)
How can I use async with the code line: result = model.transcribe(temp.name)
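One common pattern (a sketch, not from the video): model.transcribe is a blocking, compute-bound call, so you can push it onto a worker thread with asyncio.to_thread so the FastAPI event loop stays responsive while it runs. The blocking_transcribe stub below stands in for the real Whisper call.

```python
import asyncio
import time

def blocking_transcribe(path):
    # stand-in for model.transcribe(path); assumes Whisper's blocking API
    time.sleep(0.1)  # simulate the slow, blocking transcription work
    return {"text": f"transcript of {path}"}

async def transcribe_async(path):
    # run the blocking call in a worker thread so the event loop stays free
    return await asyncio.to_thread(blocking_transcribe, path)

result = asyncio.run(transcribe_async("audio.wav"))
print(result["text"])
```

Inside an async FastAPI endpoint you would simply write `result = await asyncio.to_thread(model.transcribe, temp.name)`.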
When I run it in Postman, I set Content-Type: multipart/form-data in the headers, and in the Body I set the key to "files" and upload the .wav file as the value. For some reason I get files: undefined.
Maybe on Mac I'm supposed to do something different?
I got the same error. It's because I had used the key "Files", but the form-data key has to match the parameter name declared in the function (per the FastAPI documentation), which here is "file":
file: UploadFile
Then you can access the underlying file object via file.file.
I notice you're pushing the audio file via an HTTP POST. Is there any way to pull the file from a given location instead, e.g. from an AWS S3 bucket, the file system, etc.?
Hi 👋,
Can you do one for Whisper JAX? 😉
Does anyone know how to handle multiple requests and run them on different GPUs?
I have four GPUs in the server, but the model and FastAPI only use one GPU (number 0).
What happens when I pass an 8 GB file?
The requirements file is incomplete. It's not working with the whisper library that I am using from PyPI.
You don't have to install Whisper from PyPI via requirements.txt. The Dockerfile takes care of it, as it builds Whisper directly from Git.
@AIAnytime I finally figured it out.
There were some issues in the newer version of the openai-whisper package.
fastapi==0.78.0
uvicorn[standard]==0.23.2
aiofiles==23.2.1
python-multipart==0.0.6
torch==2.0.1
openai-whisper==20230314
tiktoken==0.3.1
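For reference, the "build from Git" approach mentioned above usually looks something like this Dockerfile excerpt (illustrative, not the exact file from the repo; ffmpeg is a runtime dependency of Whisper):

```dockerfile
# illustrative excerpt: Whisper installed from the Git repo, not from PyPI
RUN apt-get update && apt-get install -y ffmpeg git
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir git+https://github.com/openai/whisper.git
```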
hero
How can I run this with a GPU?
Currently, when I run the container, the line DEVICE = "cuda" if torch.cuda.is_available() else "cpu" sets DEVICE to "cpu", even though my computer has a GPU.
Thanks.
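Regarding GPU visibility inside a container: by default Docker does not expose the host GPU, so torch.cuda.is_available() returns False. A common fix, assuming the host has NVIDIA drivers plus the NVIDIA Container Toolkit installed and the image contains a CUDA-enabled build of torch, is to pass the GPUs through at run time (the image name whisper-api is a placeholder):

```shell
# requires the NVIDIA Container Toolkit on the host
docker run --gpus all -p 8000:8000 whisper-api
```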