Super Easy Way To Parse PDF | LlamaParse From LlamaIndex | LlamaCloud


Comments • 24

  • @SantK1208
    @SantK1208 6 months ago +3

    You always have the latest, high-quality content; I always wait for your videos. Thanks ❤❤

  • @THE-AI_INSIDER
    @THE-AI_INSIDER 6 months ago +7

    Please do a video on Unstructured.io, much needed. Please do it with local Ollama.
    Also, if we use this method, will LlamaCloud have access to our private documents, since we are parsing those documents with LlamaParse?

    • @datasciencebasics
      @datasciencebasics 6 months ago

      Sure, I will take that into account. You are sending data to an API, so yes, it is stored somewhere in the cloud. If you have sensitive information, it is better to ask them how it is handled.

  • @saeednsp1486
    @saeednsp1486 6 months ago +2

    I would really like to see a great local RAG setup.
    Right now I think PrivateGPT + Ollama (Mixtral) + reranker + Unstructured.io + OCR + Qdrant is a good combination. As you said, garbage in, garbage out! Preprocessing the PDF files, especially complex PDFs with tables, pictures, diagrams, and all sorts of other content, is the key to getting correct answers from a RAG system.
    Can you please build the most accurate local RAG platform for complex PDF files as of 2024? I think we all need a video like this; please make it.

    • @datasciencebasics
      @datasciencebasics 6 months ago

      That's a good combination of tools. But again, running locally (as the models need to be quantized), I still see some hiccups. Let's see, though; it might change.

    • @vap0rtranz
      @vap0rtranz 12 days ago

      @saeednsp1486 Look at AutoRAG. It's the only LlamaIndex-based "all-in-one" app I've seen that can do all inference locally (via an Ollama backend): pip install and it's up and running. Its author actually built it to evaluate RAG pipelines, so it's still a bit programmer-heavy; it takes some Python to get working. But a by-product of that goal is that AutoRAG has everything together on local LlamaIndex. The author's material is around on YouTube, Medium, and Reddit. I don't think it includes LlamaParse, so maybe a feature request/PR?

  • @NiKogane
    @NiKogane 6 months ago

    Your channel is a gold mine and your videos are gems!
    Thank you for the great work!
    By the way, what do you use to highlight webpages, like on the LlamaParse page?
    Keep up the great work!

    • @datasciencebasics
      @datasciencebasics 6 months ago

      You are welcome; glad the videos are helpful. The highlighter I am using is Weava Highlighter.

  • @TooyAshy-100
    @TooyAshy-100 6 months ago

    Dear Sir,
    Could you please make a video on integrating multiple cutting-edge technologies into a single system? Specifically, I am interested in combining the following components:
    - RAG (Retrieval-Augmented Generation)
    - Nomic embedding model
    - Ollama language model
    - Groq hardware accelerator
    - Chainlit
    Additionally, please specify which language models should be used as the base for the system. Two potential options could be:
    #model_name='llama2-70b-4096'
    #model_name='mixtral-8x7b-32768'
    etc.
    Thank you for your time and consideration. I greatly enjoyed your recent video and look forward to future content.👍

    • @datasciencebasics
      @datasciencebasics 6 months ago

      Hello, you should check the video before this one 😎
      Crazy FAST RAG | Ollama | Nomic Embedding Model | Groq API
      czcams.com/video/TMaQt8rN5bE/video.html

  • @nicolassuarez2933
    @nicolassuarez2933 4 months ago

    Outstanding! But how to get the metadata? Thanks!

  • @vatsalsharma6384
    @vatsalsharma6384 3 months ago

    I had a naive doubt: beneath the query engine, there's an associated LLM doing the work, right? Otherwise, how are we getting responses without using an LLM?
    If yes, then where is the model specified, i.e., which LLM are we using?
    If no, how is such a well-framed answer produced without an LLM? As far as I know, it is the LLM that takes the relevant pieces of context and stitches them together into an answer in natural language.

    • @datasciencebasics
      @datasciencebasics 3 months ago +1

      Yes, LlamaParse of course uses something behind the scenes, which is not revealed since it's their service 🙂 It now supports the GPT-4o model for this, which is more expensive but better.

    • @vatsalsharma6384
      @vatsalsharma6384 3 months ago

      @@datasciencebasics Yes, I too studied the documentation and found that unless a specific model is given, LlamaIndex uses OpenAI's GPT-3.5 Turbo model.
      Just another quick question: are there any downsides to LlamaParse? For me it works well at parsing and extracting data from both text and tables in a pretty satisfactory manner.
      Why are people then using pypdf, Apache PDF extraction tools, or even OCR engines like PaddleOCR for text extraction, and not simply this library?
      Additionally, LlamaParse can be integrated with LangChain chains as well, which means it is not restricted to LlamaIndex only, so why the other frameworks?
      Please clarify this doubt; I am new to this field.
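As the replies in this thread note, LlamaIndex falls back to OpenAI's GPT-3.5 Turbo when no LLM is specified. A minimal sketch of pinning the model explicitly, assuming llama-index >= 0.10 with the llama-index-llms-openai package installed and an OpenAI key in the environment (model names and the example document are illustrative):

```python
from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.llms.openai import OpenAI

# Global default: every query engine built afterwards uses this model.
Settings.llm = OpenAI(model="gpt-4o")

documents = [Document(text="Example text parsed from a PDF.")]
index = VectorStoreIndex.from_documents(documents)  # embeddings also call OpenAI

# Per-engine override, e.g. a cheaper model for simple queries:
query_engine = index.as_query_engine(llm=OpenAI(model="gpt-3.5-turbo"))
```

Note that the LLM answering queries is separate from the model LlamaParse itself uses during parsing, which is internal to their service.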

  • @THE-AI_INSIDER
    @THE-AI_INSIDER 6 months ago

    Can I replace the llama2 embedding with nomic-embed-text and the Ollama model with Mistral? Will it work? Actually, I tried and it didn't; am I missing something?
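One common cause of failure with a swap like the one asked about above is replacing only one of the two defaults: both the embedding model and the LLM must be overridden, or indexing still tries to call OpenAI. A sketch assuming llama-index >= 0.10 with the llama-index-llms-ollama and llama-index-embeddings-ollama packages, a running Ollama server, and that `ollama pull mistral` and `ollama pull nomic-embed-text` have already been run:

```python
from llama_index.core import Settings
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# Replace BOTH defaults; leaving either one falls back to OpenAI.
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
Settings.llm = Ollama(model="mistral", request_timeout=120.0)
```

With both set, `VectorStoreIndex.from_documents(...)` and the query engine run fully locally against Ollama.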

  • @jaivalani4609
    @jaivalani4609 3 months ago

    Very nice video, thanks. One thing I found: there are additional steps after the Markdown output. Could you help me understand what this does post-Markdown?
    "node_parser = MarkdownElementNodeParser(
        llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=8,
    )
    nodes = node_parser.get_nodes_from_documents(documents)"
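A rough sketch of what those lines do, based on the LlamaIndex documentation: MarkdownElementNodeParser splits the Markdown that LlamaParse returns into plain-text nodes plus "element" nodes for embedded tables, using the given LLM to summarize each table so it can be retrieved; the text nodes and table objects are then indexed together. Assuming llama-index >= 0.10 with the OpenAI integration packages (the stub document stands in for real LlamaParse output):

```python
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_index.llms.openai import OpenAI

# Stand-in for documents = LlamaParse(...).load_data("file.pdf")
documents = [Document(text="| city | pop |\n|---|---|\n| Oslo | 0.7M |")]

node_parser = MarkdownElementNodeParser(
    llm=OpenAI(model="gpt-3.5-turbo-0125"),  # summarizes extracted tables
    num_workers=8,                           # parallel summarization calls
)
nodes = node_parser.get_nodes_from_documents(documents)

# Separate plain-text nodes from table/object nodes, then index both.
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
index = VectorStoreIndex(nodes=base_nodes + objects)
```

The table summaries let the retriever match a query against a table's meaning rather than its raw Markdown cells.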

  • @ashokpandey57
    @ashokpandey57 5 months ago

    great job!!

  • @VenkatesanVenkat-fd4hg
    @VenkatesanVenkat-fd4hg 6 months ago

    Thanks, and waiting for your valuable videos. I would like to process .docx files and get text and page-number details. I think no proper library is available to get page-number details from .docx...

    • @datasciencebasics
      @datasciencebasics 6 months ago +1

      You are welcome. I hope LlamaParse will handle that soon. If I find something useful, I will make a video on it.

  • @SantK1208
    @SantK1208 6 months ago

    Using LlamaParse means our data is exposed to an external API, right?

    • @datasciencebasics
      @datasciencebasics 6 months ago

      Yes, it is. Before sending any sensitive information, I suggest you contact them about how it is handled.

  • @Kani_Srini
    @Kani_Srini 3 months ago

    Hi. Thanks for the video and the great explanation. I tried your code and am getting the error "Retrying llama_index.embeddings.openai.base.get_embeddings in 0.97 seconds as it raised APIConnectionError: Connection error..". Do you know how to resolve this? The error happens on the line "index = VectorStoreIndex.from_documents(documents)".
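That APIConnectionError comes from the OpenAI embedding calls that `VectorStoreIndex.from_documents` makes under the hood, so it usually means the API key is missing from the environment or the network path to api.openai.com is blocked. A small stdlib-only checklist helper (the function name is made up for illustration):

```python
import os

def diagnose_openai_connection() -> list[str]:
    """Return likely local causes for an APIConnectionError raised by the
    OpenAI embedding calls made while building a VectorStoreIndex."""
    problems = []
    if not os.getenv("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set in this process")
    for var in ("HTTPS_PROXY", "https_proxy", "ALL_PROXY"):
        if os.getenv(var):
            problems.append(f"traffic is routed through a proxy ({var})")
    return problems

print(diagnose_openai_connection())
```

If the list comes back empty, the next suspects are a VPN or firewall blocking api.openai.com, or outdated `openai` / `llama-index` packages.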