Super Easy Way To Parse PDF | LlamaParse From LlamaIndex | LlamaCloud


Comments • 24

  • @SantK1208
    @SantK1208 6 months ago +3

    You always have the latest, high-quality content; I always wait for your videos. Thanks ❤❤

  • @THE-AI_INSIDER
    @THE-AI_INSIDER 6 months ago +7

    Please do a video on Unstructured.io, much needed. Please do it with local Ollama.
    Also, if we use this method, will LlamaCloud have access to our private documents, since we are parsing those documents with LlamaParse?

    • @datasciencebasics
      @datasciencebasics 6 months ago

      Sure, I will take that into account. You are sending data to an API, so yes, it is stored somewhere in the cloud. If you have sensitive information, it is better to ask them how it is handled.

  • @saeednsp1486
    @saeednsp1486 6 months ago +2

    I would really like to see a great local RAG setup.
    Right now I think PrivateGPT + Ollama (Mixtral) + reranker + Unstructured.io + OCR + Qdrant is a good combination. As you said, garbage in, garbage out! Preprocessing the PDF files, especially complex PDFs with tables, pictures, diagrams, and all sorts of other content, is the key to getting correct answers from a RAG system.
    Can you please build the most accurate local RAG platform for complex PDF files as of 2024? I think we all need a video like this; please make it.

    • @datasciencebasics
      @datasciencebasics 6 months ago

      That's a good combination of tools. But again, running locally (as the models need to be quantized), I still see some hiccups. Let's see, though; it might change.

    • @vap0rtranz
      @vap0rtranz 12 days ago

      @saeednsp1486 Look at AutoRAG. It's the only LlamaIndex-based "all-in-one" app I've seen that can do all inference locally (via an Ollama backend): pip install and it's up and running. Its author actually built it to evaluate RAG pipelines, so it's still a bit programmer-heavy; it takes some Python to get working. But a by-product of that goal is that AutoRAG has everything together on local LlamaIndex. The author's material is around on YouTube, Medium, and Reddit. I don't think it includes LlamaParse, so maybe a feature request/PR?

  • @NiKogane
    @NiKogane 6 months ago

    Your channel is a gold mine and your videos are gems!
    Thank you for the great work!
    By the way, what do you use to highlight webpages, like on the LlamaParse page?
    Keep up the great work!

    • @datasciencebasics
      @datasciencebasics 6 months ago

      You are welcome; glad the videos are helpful. The highlighter I am using is Weava Highlighter.

  • @TooyAshy-100
    @TooyAshy-100 6 months ago

    Dear Sir,
    Could you please make a video on integrating multiple cutting-edge technologies into a single system? Specifically, I am interested in combining the following components:
    - RAG (Retrieval-Augmented Generation)
    - Nomic embedding model
    - Ollama language model
    - Groq hardware accelerator
    - Chainlit
    Additionally, please specify which language models should be used as the base for the system. Two potential options could be:
    #model_name='llama2-70b-4096'
    #model_name='mixtral-8x7b-32768'
    etc.
    Thank you for your time and consideration. I greatly enjoyed your recent video and look forward to future content.👍

    • @datasciencebasics
      @datasciencebasics 6 months ago

      Hello, you should check the video before this one 😎
      Crazy FAST RAG | Ollama | Nomic Embedding Model | Groq API
      czcams.com/video/TMaQt8rN5bE/video.html

  • @nicolassuarez2933
    @nicolassuarez2933 4 months ago

    Outstanding! But how to get the metadata? Thanks!

  • @vatsalsharma6384
    @vatsalsharma6384 3 months ago

    I had a naive doubt: beneath the query engine, there's an associated LLM doing the work, right? Otherwise, how are we getting responses without using an LLM?
    If yes, then where is the model specified, i.e., which LLM are we using?
    If no, how is such a well-framed answer produced without an LLM? As far as I know, it is the LLM that takes the relevant pieces of context and stitches them together into an answer in natural language.

    • @datasciencebasics
      @datasciencebasics 3 months ago +1

      Yes, LlamaParse of course uses something behind the scenes, which is not revealed since it's their service 🙂 It now supports the GPT-4o model for this, which is more expensive but better.

    • @vatsalsharma6384
      @vatsalsharma6384 3 months ago

      @@datasciencebasics Yes, I too studied the documentation and found that unless a specific model is given, LlamaIndex uses OpenAI's GPT-3.5 Turbo model.
      Just another quick question: are there any downsides to LlamaParse? For me it works well at parsing and extracting data from both text and tables in a pretty satisfactory manner.
      Why are people then using pypdf, Apache PDF extraction tools, or even OCR engines like PaddleOCR for text extraction, and not simply this library?
      Additionally, LlamaParse can be integrated with LangChain chains as well, which means it is not restricted to LlamaIndex only, so why the other frameworks?
      Please clarify this doubt; I am new to this field.
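As the replies in this thread note, LlamaIndex falls back to OpenAI's GPT-3.5 Turbo when no LLM is specified. A minimal sketch of pinning the model explicitly, assuming llama-index >= 0.10 with the llama-index-llms-openai package installed and an OpenAI key in the environment (model names and the example document are illustrative):

```python
from llama_index.core import Document, Settings, VectorStoreIndex
from llama_index.llms.openai import OpenAI

# Global default: every query engine built afterwards uses this model.
Settings.llm = OpenAI(model="gpt-4o")

documents = [Document(text="Example text parsed from a PDF.")]
index = VectorStoreIndex.from_documents(documents)  # embeddings also call OpenAI

# Per-engine override, e.g. a cheaper model for simple queries:
query_engine = index.as_query_engine(llm=OpenAI(model="gpt-3.5-turbo"))
```

Note that the LLM answering queries is separate from the model LlamaParse itself uses during parsing, which is internal to their service.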

  • @THE-AI_INSIDER
    @THE-AI_INSIDER 6 months ago

    Can I replace the llama2 embedding with nomic-embed-text and the Ollama model with Mistral? Will it work? Actually, I tried and it didn't; am I missing something?
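One common cause of failure with a swap like the one asked about above is replacing only one of the two defaults: both the embedding model and the LLM must be overridden, or indexing still tries to call OpenAI. A sketch assuming llama-index >= 0.10 with the llama-index-llms-ollama and llama-index-embeddings-ollama packages, a running Ollama server, and that `ollama pull mistral` and `ollama pull nomic-embed-text` have already been run:

```python
from llama_index.core import Settings
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# Replace BOTH defaults; leaving either one falls back to OpenAI.
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")
Settings.llm = Ollama(model="mistral", request_timeout=120.0)
```

With both set, `VectorStoreIndex.from_documents(...)` and the query engine run fully locally against Ollama.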

  • @jaivalani4609
    @jaivalani4609 3 months ago

    Very nice video, thanks. One thing I found: there are additional steps after the Markdown output. Could you help me understand what this does post-Markdown?
    "node_parser = MarkdownElementNodeParser(
        llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=8,
    )
    nodes = node_parser.get_nodes_from_documents(documents)"
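A rough sketch of what those lines do, based on the LlamaIndex documentation: MarkdownElementNodeParser splits the Markdown that LlamaParse returns into plain-text nodes plus "element" nodes for embedded tables, using the given LLM to summarize each table so it can be retrieved; the text nodes and table objects are then indexed together. Assuming llama-index >= 0.10 with the OpenAI integration packages (the stub document stands in for real LlamaParse output):

```python
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_index.llms.openai import OpenAI

# Stand-in for documents = LlamaParse(...).load_data("file.pdf")
documents = [Document(text="| city | pop |\n|---|---|\n| Oslo | 0.7M |")]

node_parser = MarkdownElementNodeParser(
    llm=OpenAI(model="gpt-3.5-turbo-0125"),  # summarizes extracted tables
    num_workers=8,                           # parallel summarization calls
)
nodes = node_parser.get_nodes_from_documents(documents)

# Separate plain-text nodes from table/object nodes, then index both.
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
index = VectorStoreIndex(nodes=base_nodes + objects)
```

The table summaries let the retriever match a query against a table's meaning rather than its raw Markdown cells.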

  • @ashokpandey57
    @ashokpandey57 5 months ago

    great job!!

  • @VenkatesanVenkat-fd4hg
    @VenkatesanVenkat-fd4hg 6 months ago

    Thanks, and waiting for your valuable videos. I would like to process .docx files and get text and page-number details. I think no proper library is available to get page-number details from .docx...

    • @datasciencebasics
      @datasciencebasics 6 months ago +1

      You are welcome. I hope LlamaParse will handle that soon. If I find something useful, I will make a video on it.

  • @SantK1208
    @SantK1208 6 months ago

    Using LlamaParse means our data is exposed to an external API, right?

    • @datasciencebasics
      @datasciencebasics 6 months ago

      Yes, it is. Before sending any sensitive information, I suggest you contact them about how it is handled.

  • @Kani_Srini
    @Kani_Srini 3 months ago

    Hi. Thanks for the video and the great explanation. I tried your code and am getting the error "Retrying llama_index.embeddings.openai.base.get_embeddings in 0.97 seconds as it raised APIConnectionError: Connection error..". Do you know how to resolve this? The error happens on the line "index = VectorStoreIndex.from_documents(documents)".
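That APIConnectionError comes from the OpenAI embedding calls that `VectorStoreIndex.from_documents` makes under the hood, so it usually means the API key is missing from the environment or the network path to api.openai.com is blocked. A small stdlib-only checklist helper (the function name is made up for illustration):

```python
import os

def diagnose_openai_connection() -> list[str]:
    """Return likely local causes for an APIConnectionError raised by the
    OpenAI embedding calls made while building a VectorStoreIndex."""
    problems = []
    if not os.getenv("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set in this process")
    for var in ("HTTPS_PROXY", "https_proxy", "ALL_PROXY"):
        if os.getenv(var):
            problems.append(f"traffic is routed through a proxy ({var})")
    return problems

print(diagnose_openai_connection())
```

If the list comes back empty, the next suspects are a VPN or firewall blocking api.openai.com, or outdated `openai` / `llama-index` packages.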