Videos: 242 · Total views: 7,958,084
Prompt Engineering
United States
Joined 14 Jan 2023
Ph.D., Artificial Intelligence & Coding.
Building cool stuff!
▶️ Subscribe: www.youtube.com/@engineerprompt?sub_confirmation=1
Want to discuss your next AI project with me? BOOK NOW:
calendly.com/engineerprompt/consulting-call
For business inquiries email: engineerprompt@gmail.com
Why Cartesia-AI's Voice Tech is a Game-Changer You Can't Ignore!
In this video, I'm excited to introduce Cartesia AI's revolutionary real-time text-to-speech system, Sonic, which offers 135ms model latency and lifelike generative voice capabilities. I'll demonstrate how this versatile API can be integrated into your projects, including a step-by-step guide on obtaining and using the API key. With a variety of voices to choose from, including options for emotion customization, this platform stands out for its quality and speed. I'll also cover setting up a voice-to-voice chat assistant and how you can configure the voices for your needs. Stay tuned for more on voice cloning and advanced setups in upcoming videos!
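The "stream as you go" pattern the description hints at can be sketched without any vendor SDK. In this sketch, `synthesize` is a stand-in for a real TTS call (the actual Cartesia client interface may differ); the point is to send text sentence by sentence so audio can start playing before the full reply is generated:

```python
import re

def split_sentences(text):
    """Naively split text on sentence-ending punctuation so each piece
    can be handed to the TTS API as soon as it is ready."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def synthesize(sentence):
    """Stand-in for a real TTS request; returns fake audio bytes."""
    return b"\x00" * len(sentence)

def stream_tts(text):
    """Yield one audio chunk per sentence instead of one blob at the end."""
    for sentence in split_sentences(text):
        yield synthesize(sentence)

chunks = list(stream_tts("Hello there. How are you today? I am fine."))
print(len(chunks))  # 3 chunks, one per sentence
```

With a real client, each `synthesize` call would hit the API (ideally with streaming enabled), and playback of chunk 1 overlaps generation of chunk 2, which is how the low perceived latency in the demo is achieved.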
#tts #aivoice #voicechat
🦾 Discord: discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: ko-fi.com/promptengineering
🔴 Patreon: www.patreon.com/PromptEngineering
💼 Consulting: calendly.com/engineerprompt/consulting-call
📧 Business Contact: engineerprompt@gmail.com
Become Member: tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Signup for Advanced RAG:
tally.so/r/3y9bb0a
LINKS:
play.cartesia.ai/
Verbi Github: github.com/PromtEngineer/Verbi
TIMESTAMPS:
00:00 Introduction to Cartesia AI's Text-to-Speech System
00:51 Demonstrating Voice Generation Speed and Quality
01:20 Exploring Different Voice Profiles
03:03 Setting Up Your Account and API Key
03:40 Customizing Voice Parameters
05:22 Implementing the Text-to-Speech System
05:53 Running the Standalone Example
10:36 Voice-to-Voice Chat Assistant Project
13:03 Conclusion and Future Plans
All Interesting Videos:
Everything LangChain: czcams.com/play/PLVEEucA9MYhOu89CX8H3MBZqayTbcCTMr.html
Everything LLM: czcams.com/play/PLVEEucA9MYhNF5-zeb4Iw2Nl1OKTH-Txw.html
Everything Midjourney: czcams.com/play/PLVEEucA9MYhMdrdHZtFeEebl20LPkaSmw.html
AI Image Generation: czcams.com/play/PLVEEucA9MYhPVgYazU5hx6emMXtargd4z.html
Views: 8,280
Video
Marker: This Open-Source Tool will make your PDFs LLM Ready
17K views · 4 hours ago
In this video, I discuss the challenges of working with PDFs for LLM applications and introduce you to an open-source tool called Marker. Marker simplifies the conversion of complex PDF files into structured Markdown, making data extraction much easier. I compare Marker with Nougat, showing its superior performance in preserving document structure accurately. Additionally, I give a detailed tuto...
Master Fine-Tuning Mistral AI Models with Official Mistral-FineTune Package
5K views · 14 hours ago
In this video, I walk you through the official Mistral AI fine-tuning guide using their new Mistral FineTune package. This lightweight code base enables memory-efficient and high-performance fine-tuning of Mistral models. I delve into the detailed data preparation process and explain how to format your datasets correctly in JSONL format to get the best results. We'll also set up an example trai...
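As a rough illustration of the JSONL data preparation the description mentions, here is a minimal round-trip sketch. The chat-style `messages` schema shown is an assumption based on common fine-tuning conventions; check the mistral-finetune repo for the exact fields it expects:

```python
import json

# Hypothetical chat-style training examples (schema is an assumption,
# not the verified mistral-finetune spec).
examples = [
    {"messages": [
        {"role": "user", "content": "What is JSONL?"},
        {"role": "assistant", "content": "One JSON object per line."},
    ]},
    {"messages": [
        {"role": "user", "content": "Why use it for fine-tuning?"},
        {"role": "assistant", "content": "It streams line by line without loading everything."},
    ]},
]

def to_jsonl(records):
    # One compact JSON object per line: no wrapping array, no trailing commas.
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

def from_jsonl(text):
    # Parse each non-empty line independently.
    return [json.loads(line) for line in text.splitlines() if line.strip()]

jsonl = to_jsonl(examples)
assert from_jsonl(jsonl) == examples  # lossless round trip
```

Writing this out with `open("train.jsonl", "w")` and one `f.write(line + "\n")` per record gives a file most fine-tuning toolchains can consume directly.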
Advanced Function Calling with Mistral-7B - Multi function and Nested Tool Usage
4.8K views · 19 hours ago
Testing multi and nested function calls with Mistral 7B. In this video, I explore the advanced function calling capabilities of the Mistral 7B v0.3 model, including multi-function and nested function calls. Using a Google Colab notebook by Uncle Code, I demonstrate how to set up, install the Mistral inference package, and log into the Hugging Face hub. Practical examples of the model handling multiple...
ChatGPT Desktop App: First Impressions and What's Missing!!!
3.8K views · 21 hours ago
Official ChatGPT desktop app for macOS: early access review and features. In this video, I explore and review the newly released official ChatGPT desktop app for macOS. After downloading it from the ChatGPT website, I walk you through the installation process, launching the app, logging in, and using its various features such as text input, uploading files, and voice conversations. I also...
NEW MISTRAL: Uncensored and Powerful with Function Calling
7K views · 1 day ago
In this video, I explore the new Mistral 7B-v0.3 model, now available on Hugging Face. I'll show you how to install the Mistral inference package, download the model, and run initial queries. We also test its performance and highlight its new features like uncensored responses and function calling. Stay tuned for future videos on fine-tuning this model! #mistral #functioncalling #llm
INSANELY FAST Talking AI: Powered by Groq & Deepgram
7K views · 1 day ago
Fastest voice chat inference with Groq and Deepgram. In this video, I show how to achieve the fastest voice chat inference using the Groq and Deepgram APIs. I compare their speeds to OpenAI's Whisper and demonstrate how to set up and code the process. Learn about handling rate limits, buffering issues, and how to get started with these services. Stay tuned for future videos on local model implementa...
Creating JARVIS - Your Voice Assistant with Memory
6K views · 14 days ago
In this video, you will see a demo of a voice assistant that can remember past conversations. We use external APIs like OpenAI's Whisper for audio transcription, GPT-4 for generating responses, and OpenAI's voice engine for text-to-speech conversion. The main focus is on using modular code and OpenAI's tools to construct a conversational assistant with a memory feature.
Creating J.A.R.V.I.S.
3.5K views · 14 days ago
A sneak peek of the voice-to-voice chat assistant.
First Impressions of Gemini Flash 1.5 - The Fastest 1 Million Token Model
7K views · 14 days ago
Just checked out Google's new Gemini Flash at Google I/O. It's a super-fast AI model designed for handling big tasks: think processing video, audio, or huge codebases, all while keeping costs low. I put it through its paces against giants like GPT-3.5 and GPT-4o, looking at performance, costs, and how it handles real-world tasks. I even tried confusing it with tricky questions and coding ch...
Google IO: Agents is The Future - Demos
3.4K views · 14 days ago
Google I/O was all about agents. Here are some of the example demos shown.
Getting Started with GPT-4o API, Image Understanding, Function Calling and MORE
8K views · 14 days ago
Getting started with GPT-4o: a comprehensive tutorial. This video guides you through the basics of getting started with the GPT-4o API, including comparisons with GPT-4 Turbo, exploring capabilities like text generation, image understanding, and function calling.
GPT-4o: OpenAI's NEW OMNI-MODEL Can DO it ALL
4.3K views · 14 days ago
In this video we look at GPT-4 OmniModel, a groundbreaking AI model capable of processing and responding to audio, vision, and text in real-time. Demonstrating its versatility, the video showcases various scenarios including customer support, language translation, and educational tutoring, highlighting the OmniModel's ability to understand and interact in near-human response times.
Yi-1.5: True Apache 2.0 Competitor to LLAMA-3
6K views · 14 days ago
In this video, we will look at the Yi-1.5 series models, which were just released by 01-AI. This update includes 3 different models with sizes ranging from 6 billion to 34 billion parameters, trained on up to 4.1 trillion tokens. All models are released under the Apache 2.0 license.
NVIDIA ChatRTX: Private Chatbot for Your Files, Image Search via Voice | How to get started
8K views · 21 days ago
This video provides an in-depth review and tutorial of NVIDIA's ChatRTX, a new tool designed for users with RTX GPUs on Windows PCs. The tool leverages Retrieval-Augmented Generation technology and TensorRT-LLM alongside RTX acceleration to chat with documents and use voice interaction. It now supports local photo and image search with improvements in its features. The application requires spe...
Free LOCAL Copilot to Take Your Coding to the NEXT LEVEL
6K views · 21 days ago
Free Copilot to Take Your Coding to the NEXT LEVEL
13K views · 21 days ago
Llama-3 🦙 with LocalGPT: Chat with YOUR Documents in Private
9K views · 28 days ago
Extending Llama-3 to 1M+ Tokens - Does it Impact the Performance?
11K views · 1 month ago
Get your own custom Phi-3-mini for your use cases
11K views · 1 month ago
How Good is LLAMA-3 for RAG, Routing, and Function Calling
8K views · 1 month ago
How Good is Phi-3-Mini for RAG, Routing, Agents
10K views · 1 month ago
Does Size Matter? Phi-3-Mini Punching Above its Size on "BENCHMARKS"
5K views · 1 month ago
MIXTRAL 8x22B: The BEST MoE Just got Better | RAG and Function Calling
4.1K views · 1 month ago
Insanely Fast LLAMA-3 on Groq Playground and API for FREE
24K views · 1 month ago
LLAMA-3 🦙: EASIEST WAY To FINE-TUNE ON YOUR DATA 🙌
45K views · 1 month ago
LLAMA 3 Released - All You Need to Know
11K views · 1 month ago
WizardLM 2 - First Open Model Outperforming GPT-4
16K views · 1 month ago
Create Financial Agents with Vision 👀 - Powered by Claude 3 Haiku & Opus
6K views · 1 month ago
I'm all in. Better price than ElevenLabs.
I was working on a project where I need to use my local language, but I'm having issues with the Coqui AI TTS library. Any other alternative that would be helpful and easy to use? Thank you.
Try meloTTS
Thank you, I will try it
is it faster than Deepgram?
Yes, on the playground. The Cartesia team recommends streaming. I am going to test that and report.
This is not completely open source, so don't report it as such without a clarification midway in the vid.
What software do you use for those super smooth zooms?
It's called screen studio. It's only for mac
Hey , Can we use this offline?
Yes
Thank you for sharing these excellent tools!
Why do they censor these models? AI should remain unbiased and present facts when asked, not give you reasons why it cannot answer a question just because the truth may offend. Facts don't care about feelings. Glad they have overcome censorship.
Ok, as usual, the lack of a GUI destroys it for me... 😢
Grow out of that and a world will open up
I am uncertain about Marker. It is for scientific use, but it says it removes footers, and that is where you normally put your sources and appendix links... so?!
This sounds very interesting, but I will need to learn some Python environment basics before I can put this to the test. A solution like this could help me a lot!
Thanks for posting Verbi. I wanted to get it to speak more than just English. I couldn't find any Cartesia models that were anything other than English or American, but ElevenLabs has great multilingual support. The following change in text_to_speach() enables ElevenLabs to speak quite a few languages:

elif model == 'elevenlabs':
    ELEVENLABS_VOICE_ID = "Rachel"
    client = ElevenLabs(api_key=api_key)
    audio = client.generate(
        text=text,
        voice=ELEVENLABS_VOICE_ID,
        output_format="mp3_22050_32",
        model="eleven_multilingual_v2",
    )
    elevenlabs.save(audio, output_file_path)
Hi, do you know how much RAM is required for this application? I tried, but it said that it was out of memory. My laptop has 16 GB RAM w/o Nvidia GPU. Thanks a lot
Meh, I thought it was a better local TTS... oh well.
Nobody would pay for services when we can do it on our own PC locally.
No thanks for advertising.
Oh boy, it's three times more expensive than Google's premium voices and only includes English. Skipped.
Thanks for another great video!! Can you please make a video or at least share the material on fine-tuning a quantized mistral v0.3 model
In general, you want to load the model in 4-bit. Look at my finetuning videos using unsloth.
They still have natural cadence issues, which is a hard problem to solve.
Yes, I think this is just the alpha version so hopefully will get better over time.
thanks
I'm interested in open source only... can't finish watching. Thumbs down, sorry.
Dude, I wanna deploy this on Hugging Face as an API. Make a tutorial on this.
The deployment series is coming soon; it will give you an idea of how to do this.
Brilliant vid, it is a godsend. OCRing a PDF is just not workable, period. I gave up on attempting to parse PDFs. This new information is amazing and I am once again excited.
Glad it was helpful!
If you wanted to plug this into a chatbot, the pricing does not add up. I've done some crunching, and it won't even get you far with a basic smallish customer doing, say, 1,000-3,000 chats a month, which isn't a lot. Most engines price an audio sequence every 15s or 1m. More good engines are emerging. For our low-end customers, we usually see 3 to 5 concurrency anyway, and that's like the smallest model. Currently we have done hundreds of millions of chats, hundreds of millions of live chats too. So getting into the billions. The market is competitive. Some of the new Google Studio voices are comparable, Deepgram too. Sure, these are nice voices, but for a streaming API, at cost and competitive, sorry but no! Unless the pricing model radically improves. It's early days, so hopefully there will be new models, new options, and a realization. I suggest you take, say, 5,000, 10,000, 30,000, and 100,000 chats, work out the average transcript text size on the bot side, and average out the characters. You will see my point!
that's a valid argument. Hopefully they will be able to reduce their price as they scale.
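The commenter's back-of-the-envelope math can be made concrete. All numbers below are illustrative assumptions (chats per month, replies per chat, characters per reply, characters per dollar), not any provider's actual pricing:

```python
def monthly_characters(chats_per_month, avg_chars_per_reply, replies_per_chat=5):
    """Estimate TTS characters consumed per month if every bot reply is spoken."""
    return chats_per_month * replies_per_chat * avg_chars_per_reply

def cost_usd(chars, chars_per_dollar=20_000):
    # Assumes a plan giving 100,000 characters for $5, i.e. 20,000 chars/dollar.
    return chars / chars_per_dollar

# Hypothetical "smallish customer": 3,000 chats/month, ~300 chars per spoken reply.
chars = monthly_characters(3_000, 300)
print(f"{chars:,} chars -> ${cost_usd(chars):.2f}/month")
```

At these assumed figures the bill lands in the low hundreds of dollars per month, which is the commenter's point: per-character TTS pricing scales linearly with chat volume.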
I wonder how far away we are from arbitrarily high accuracy on tasks like this.
To be honest, when it comes to voice models, open source models are lagging behind!
If you do make a video about scraping data, please go over content that requires JavaScript to load. It's been difficult to find a clear guide specifically for capturing this data for LLM usage. I loved this video, thank you!
I haven't looked into it before, so let me see what I can come up with.
They do not sound good at all....
Thanks for the feature! Super excited to keep building here. For the best experience with the API, I recommend using `stream=True` to get the first audio back super fast. Audio will come back in chunks. We'll add more info about how to use this to our docs.
Thanks for pointing it out. I do feel the docs need more work; I am going to explore it further. Thanks for putting it together.
Has anyone done a demo of a single Cartesia voice outputting something like podcast length? 20 to 30 minutes? The human quality on short text is stunning but I worry that over longer text it will fall into repetitive cadence. The fact that voices are cloned on just a 20 second sample reinforces my concern. Have you tested that?
Interesting point, I will do a test and report back. It will be a fun experiment.
Anything to convert to epub? Getting rid of headers and footers
Markdowns for PDF for LLM😁
:)
Thanks but I'm Brazilian and didn't find portuguese in it
At the moment, its only English.
@@engineerprompt Hi, ok. Thank you for your attention and answer
Hey Prompt Engineer, if you don't mind, could I also be a contributor to your project? I have some wonderful features which could help you make your Verbi AI better and a perfect voice assistant 🥹 It's a request to add me to the group. I wouldn't disappoint you 😼
Yes, would love contributions. Please open a PR. We have a dedicated channel on the discord server. Feel free to join the discussion there.
Reminds me of ElevenLabs' early days. I think they use stream mode in their playground, measuring the time it takes to generate the first audio segment. That's why it seems very fast. What do you think?
That's exactly how they are doing it. Their cofounder pointed it out and suggested enabling streaming via the API as well. On Discord, a contributor to project Verbi said it's possible to get about 200-400ms with streaming. I might redo this.
This... incredible... awesome, NICE WORK !!
Appreciate your efforts, but why the heck would you need an API call to get the ID of the voice you want to use, or other seemingly static parameters? Also, the API latency is terrible compared to their playground. Either you're still doing something unnecessary, or their infrastructure is poor, which defeats the purpose of their supposedly low latency. Further, the text-to-speech piece should be chunked into sentences and streamed to the TTS service instead of waiting for the full response. This is OK for one- or two-sentence responses, but if latency increases linearly then it's no good. Is there endpointing? Interruption?
Thanks for the feedback @avi7278.
1. You can get the voice_id straight from the playground! We'll have support very soon for passing that in directly.
2. For the best experience with the API, I recommend using `stream=True` to get the first audio back super fast 🚀. Audio will come back in chunks. We'll add more info about this to our docs.
3. You can definitely send text chunks over the wire; we'll have more native support for text streaming soon.
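The `stream=True` advice above can be illustrated with a stand-in generator (the real client's streaming interface may differ). The point is that perceived latency is governed by the arrival of the first chunk, not by the length of the whole clip:

```python
def fake_stream():
    """Stand-in for a streaming TTS response that yields audio in chunks."""
    for chunk in (b"abc", b"def", b"gh"):
        yield chunk

def play_as_it_arrives(stream):
    """Begin 'playback' on the first chunk instead of waiting for the full clip.

    Returns the assembled audio plus the size of the first chunk, which is
    what the user actually waits for before hearing anything.
    """
    audio = bytearray()
    first_chunk_len = None
    for chunk in stream:
        if first_chunk_len is None:
            first_chunk_len = len(chunk)  # time-to-first-audio ends here
        audio.extend(chunk)               # a real app would play, not buffer
    return bytes(audio), first_chunk_len

audio, first = play_as_it_arrives(fake_stream())
assert audio == b"abcdefgh" and first == 3
```

With a blocking call, the wait scales with total audio length; with chunked streaming it scales only with the first chunk, which is why the playground feels so fast.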
The free plan is 10,000 characters, while the lowest $5-per-month tier gets you 100,000 characters per month. I re-read that again; it's in characters, not words. Am I dreaming? So one letter is one character, right? Is that correct? Isn't that super expensive?
It's cheap compared to the other paid voice service (ElevenLabs), which gives only 30K characters for 5 dollars; the same 100K characters cost over 20 dollars on ElevenLabs, 4x more expensive. But yeah, compared to other AI services where you pay once and get almost unlimited usage, like Infermatic for text AI, it's expensive.
And with characters, they can count every space as a character.
Yup, that's only characters. On average, 1,000 characters is about 1 min of audio IIRC, so the free tier is 10 min of audio. For the same price ($5), the starter pack of ElevenLabs is only 30,000 characters per month, so only half an hour.
@@BackTiVi I'll stay with Coqui.
I had another question, are you also on Udemy?
I am not on Udemy but just launching my RAG course here: prompt-s-site.thinkific.com/courses/rag
Thank you very much!!! I was looking for something like this for a long time. I work for a large bank but with a very small budget for my project. Due to the budget crunch we cannot afford to buy third-party tools; this sounds like a perfect fit, but since there is a limit of $5MN we may not qualify to use this for free. Would you suggest going with Nougat, or do you have a better alternative for my use case? Really appreciate your content!
Nougat can be an option, or look into unstructured-io. I would also recommend looking into Claude or GPT-4o with vision if data privacy is not a big issue. Some of these proprietary tools have good data privacy based on their TOS.
@@engineerprompt Thanks for the prompt response!!
I'm always so impressed by models like this. But where are all the open-source solutions on this topic? Research is crazy!
Multi-language?
No
Coming soon 🚁
The more models I use the less I want to pay for the apis
Yeah, we really need this stuff free and open source. The only real limiting factor is the affordability of the GPU(s) needed to run this locally. There's stuff out there, but local open source audio stuff is behind text and image based models sadly, but maybe soon that'll change.
@@14supersonic actually you can use stable diffusion locally with a mid range 12 gb commercial gpu for image generation, audio models as well, also quantized llm models are very good for simpler tasks like summarizations
@Zale370 I know, that's why I said audio based AI models are behind text and image based solutions. When you compare something like local Llama 3 or SD3 to local audio based AI models, there's no audio modality comparable to them yet in terms of local usage.
Indeed, there are no optimal and swift text-to-speech (TTS) solutions for local LLM inference. I personally believe this is not solely due to GPU memory constraints but also driven by security considerations.
Yeah, I'm a security consultant and the risks inherent in this are just insane. I won't ever say open source should slow down but I appreciate the time we are getting to communicate what's coming. Amazingly, the EU AI legislation classifies voice cloner AI as lower risk. I don't think they've ever got a phone call from their doctor asking them to stop a particular medication or their wife saying they're being held hostage and they're demanding all the money. It gets darker from there.
Liking without watching coz I know this is gonna be amazing
This is very useful! Now, in the next video, show how to fine-tune a model (with a long context length like "Phi-3-mini-128k-instruct") with this Markdown data 😍😍
let me see what i can do :)
If you are interested in learning more about how to build robust RAG applications, check out this course: prompt-s-site.thinkific.com/courses/rag
Around 7 minutes in, having installed a conda environment, you select pip, not conda, when installing PyTorch. Any reason why? If there's a working conda option, doesn't it make sense to keep using conda and only use pip when you absolutely have to? Just wondering. (Thanks for the video, btw. I had just been wondering about effective ways of making content reliably available to RAG, and the video is super useful.)
I usually use pip because it has most of the Python packages available; conda is somewhat limited in available Python packages. conda would also work in this case, but it's more of my own habit at this point :)
On my M1 Mac I have tried this out, installing

dependencies = [
    "torch>=2.3.0",
    "torchvision>=0.18.0",
    "torchaudio>=2.3.0",
    "marker-pdf>=0.2.13",
]

Then when I try out just a single PDF it fails on a simple Python import:

marker_single 26572517.pdf OUTPUT --max_pages 2 --langs English

Traceback (most recent call last):
  File "marker/.venv/bin/marker_single", line 5, in <module>
    from convert_single import main
  File "marker/.venv/lib/python3.12/site-packages/convert_single.py", line 5, in <module>
    from marker.convert import convert_single_pdf
ModuleNotFoundError: No module named 'marker.convert'

Anyone getting the same? Tried with Python 3.10 and 3.12.
Are you using a virtual environment? Use this command: python -m pip install marker-pdf. This will ensure it's installing the package into the current virtual env.
Using rye, and yes it is there in my virtual env.
The marker scripts are there to be called.
@@engineerprompt Ok, the problem seems to be with the way Rye handled the imports, sorry about that. Creating the virtual env normally, I can run the commands. Thanks for the video; I have been looking for how to do this for a long time.
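The ModuleNotFoundError in this thread is typically an interpreter mismatch: pip installed the package into one environment while the script ran under another. A quick, tool-agnostic check (the `marker` lookup simply reports None if marker-pdf isn't importable by the current interpreter):

```python
import importlib.util
import sys

def where_is(module_name):
    """Report where THIS interpreter would import a module from, or None."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None

# `pip install` must target the same interpreter shown here
# (which is exactly what `python -m pip install ...` guarantees).
print("interpreter:", sys.executable)
print("json from:  ", where_is("json"))    # stdlib, always importable
print("marker from:", where_is("marker"))  # None unless marker-pdf is installed
```

If `where_is("marker")` is None inside an activated environment where `pip show marker-pdf` succeeds, pip and python are pointing at different environments.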
What I want is for LLMs to cook my next meal.
Docker version please