NEW AI Framework - Steerable Chatbots with Semantic Router

  • Added 12 Sep 2024

Comments • 164

  • @Darthus
    @Darthus 8 months ago +18

    This may have been within your suggestions, but I can also see this for routing queries to different RAG models. For example, if someone is asking about the rules of a game versus asking to play the game, you could use separate models more specifically tuned to information retrieval vs creativity.

    • @jamesbriggs
      @jamesbriggs  8 months ago +10

      should be doable with the current version of the library - putting together some examples over the coming weeks and will include something like this, thanks!

    • @kavian4249
      @kavian4249 2 months ago

      @jamesbriggs Any updates on this?

  • @broomva
    @broomva 8 months ago +4

    This is great for creating 'fuzzy' if/else statements, where the statement refers to a cloud of options - really cool. It's like filtering based on the embedding space.
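The "fuzzy if/else" framing above can be sketched in a few lines. This is a conceptual illustration, not the semantic-router API: the toy `embed` function (character-bigram counts) is a stand-in for a real embedding model, and a route wins only if the query lands close enough to any utterance in its "cloud", otherwise we fall through to the else branch.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in for a real embedding model: character-bigram counts.
    vec = np.zeros(26 * 26)
    letters = [c for c in text.lower() if c.isalpha()]
    for a, b in zip(letters, letters[1:]):
        vec[(ord(a) - 97) * 26 + (ord(b) - 97)] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

ROUTES = {
    "chitchat": ["how are you", "lovely weather today"],
    "politics": ["isn't politics the best thing ever",
                 "what do you think of the election"],
}

def route(query: str, threshold: float = 0.3):
    # The "fuzzy if": score the query against every utterance cloud...
    q = embed(query)
    scores = {
        name: max(float(q @ embed(u)) for u in utterances)
        for name, utterances in ROUTES.items()
    }
    best = max(scores, key=scores.get)
    # ...and the "else": below threshold, no route fires.
    return best if scores[best] >= threshold else None
```

The threshold is the tunable "fuzziness" - raise it and the if/else branches fire only on near-verbatim matches.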

  • @benjaminrigby877
    @benjaminrigby877 8 months ago +10

    This is, of course, intent detection under a different name. The complexity it brings is that now you've got to monitor and update two different systems - so somebody starts asking for a "route" in a slightly different way, or with a different accent that gets transcribed differently, and you've got to be able to catch it and fix it across both the generative and intent systems.

    • @thebozbloxbla2020
      @thebozbloxbla2020 8 months ago +2

      I guess one way to solve that would be to have all routes defined in a tuple/dict, and then have another LLM check asynchronously whether the correct semantic route was chosen. If no useful results come back from, say, a vector search in the category the route selector chose, we can fall back to the route suggested by the LLM (which could be done async).
      In the end we can compare results and send whichever is better as the system response.
      Obviously I haven't looked too deeply into time optimisation, but I have a feeling it could still be faster.
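The async fallback idea in this comment could look roughly like the sketch below. All names here are hypothetical stand-ins (`fast_route` for the semantic match, `llm_suggest_route` for the LLM check), not part of any library:

```python
import asyncio

async def fast_route(query: str):
    # Stand-in for the near-instant semantic route selector.
    return "game_rules" if "rule" in query else None

async def llm_suggest_route(query: str) -> str:
    # Stand-in for the slower asynchronous LLM route check.
    await asyncio.sleep(0)
    return "general"

async def choose(query: str) -> str:
    # Kick off the LLM suggestion concurrently with the fast match.
    llm_task = asyncio.create_task(llm_suggest_route(query))
    fast = await fast_route(query)
    if fast is not None:
        llm_task.cancel()      # fast path gave a useful result
        return fast
    return await llm_task      # fall back to the LLM's suggestion
```

Because the LLM task starts immediately, falling back costs only the remaining LLM latency rather than a fresh sequential call.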

    • @larsbell1569
      @larsbell1569 6 months ago +1

      Ya but won’t this be an order of magnitude cheaper and faster than an intent system?

  • @itsjustmeemman
    @itsjustmeemman 8 months ago +3

    I have a production LLM endpoint and I realized that 70% of the processing time goes to the different guardrails, intent identification and other classifiers in the sequence for a simple RAG 🥲. Thank you for this video - I'll implement this and hopefully give some feedback or share what I've learned.

  • @truehighs7845
    @truehighs7845 8 months ago +1

    This is pretty much exactly what Langchain was missing, especially combined with actions and API calls! Well done!

  • @tonyrungeetech
    @tonyrungeetech 8 months ago +8

    Really looking forward to seeing it applied to RAG! I've been thinking about some sort of process where we search the top K results, then do a simple semantic evaluation of 'yes I found it' or 'not enough info' to trigger a more in-depth search - is it something along those lines?

    • @megamehdi89
      @megamehdi89 8 months ago +1

      Great idea

    • @jamesbriggs
      @jamesbriggs  8 months ago +2

      yeah it feels like it would be a similar process when using the default RouteLayer + static Routes

  • @dusanbosnjakovic6588
    @dusanbosnjakovic6588 8 months ago +2

    I love your work and this video. However I have implemented something like this in heavy production and faced a lot of issues. Chat history, complex multi intent queries and just overall accuracy prevented us from moving forward. Surprised you think this is such a slam dunk.

  • @Sarah_ai_student
    @Sarah_ai_student 2 months ago

    Your videos are fantastic, offering excellent content. They are probably the best on YouTube for beginners in AI and generative AI.
    I am a student and my school hasn't quite caught up to speed on this subject yet. Could you create a video on how to develop a full-stack chatbot application using Python with Django or another Python framework, incorporating a vector database like Milvus/other, a retrieval-augmented generation (RAG) approach, and a locally hosted language model such as Mistral/other?
    A Q&A style format would be greatly appreciated.

  • @albertgao7256
    @albertgao7256 2 months ago

    simple idea but it solves real problems, and i would really love to see more about the semantic usage. LLMs have been studied so much, but the underlying embedding models could be used for far more than just providing search for RAG. love your videos mate, always high quality!

  • @RichardGetzPhotography
    @RichardGetzPhotography 8 months ago +2

    Building colabs to make James laugh during filming..... priceless!!

  • @jdray
    @jdray 8 months ago

    Watching this with interest. A few months ago I identified a hole in the general field of AI stacks that I conceptually filled with a tool called BRAD (which I've forgotten by now what it stood for). In essence the idea was the same as here, except with a tiny, fast LLM deciding on the route rather than routing logic like you describe here. So thank you for (sort of) implementing my vision. Now I can take this item off my list of things to work on. 😁

    • @jamesbriggs
      @jamesbriggs  8 months ago +1

      haha I'm glad we were able to accidentally help out with your vision of BRAD!

  • @nathancanbereached
    @nathancanbereached 4 months ago

    Have you tested precision in situations where there are multiple routing options that have multiple criteria and are mostly similar? I wonder if there would be benefit to layering 2 or more routing layers in a branching decision-tree format. Would need to test - but theoretically you could have the quality of tree-of-thought decisioning at lower cost and higher speed, and each decision point could be human-reviewed before pushing to production. I'm glad I found your channel again - definitely worth looking into.

  • @concretec0w
    @concretec0w 8 months ago +1

    Sooooo soooo coool :) This is going straight into my voice assistant so that i don't need multiple keyboard shortcuts to handle different tasks :D

    • @jamesbriggs
      @jamesbriggs  8 months ago

      haha sounds awesome, hope it goes well :)

  • @deemo16
    @deemo16 8 months ago

    Awesome work, and great contribution! I have a clunky solution using langchain LLMRouter (which gets me there, but as you pointed out, slow, and a bit awkward). I look forward to implementing this (I can already tell it will be a much cleaner solution than the convoluted logic I've been fumbling with)! Very cool project 🙂

  • @carterjames199
    @carterjames199 8 months ago

    I love how similar it is to guardrails - the semantic search based on preset utterances was amazing when I first saw it in your video last year

    • @jamesbriggs
      @jamesbriggs  8 months ago

      yeah I do love guardrails, it's a great library and of course it inspired what we're building here

  • @gfertt13
    @gfertt13 8 months ago +3

    I have been using nemo guardrails for work, but exclusively for the "input rails", which seem to be very similar to this library, with the important difference being that it includes an LLM call to make the final decision of classifying into a route/"canonical form". I'm curious, have you run any tests against a benchmark similar to the way the nemo guardrails paper shows tests against NLU++ benchmark?

    • @gfertt13
      @gfertt13 8 months ago

      From reading through some of the code it seems like you handle the KNN search with plain numpy and linear search. Any reason for not using a package like FAISS here?

  • @awakenwithoutcoffee
    @awakenwithoutcoffee 2 months ago

    this is incredible James. I have implemented a similar (but less powerful) system inside Botpress and was looking for LangChain intent classification/routes, which this seems to cover very well. Are you still using this in production or has LangChain come out with an alternative that you prefer? keep going my man.

  • @plashless3406
    @plashless3406 8 months ago +1

    This is really interesting. Will definitely try it out. Great job James and team.

  • @berdeter
    @berdeter 8 months ago +2

    Very interesting. A few questions: how does it compare to NeMo Guardrails, which you've covered earlier? Have you considered the case where a user's question would trigger several semantic routes?
    Let's say you make a bot for candidates going to a job search event at a big company.
    User says "I love AI. Can you propose me something?"
    Route 1: go to the IT desk
    Route 2: attend the conference about digital transformation at 11am
    Route 3: check this job offer that is made for you
    So the bot should be able to answer with all 3 suggestions.

    • @jamesbriggs
      @jamesbriggs  8 months ago +1

      we want to support multiple routes, but it isn't implemented yet - there's nothing in the current methodology that would prevent this though

    • @berdeter
      @berdeter 8 months ago

      @jamesbriggs I have implemented multiple semantic routes on a project. I couldn't find a good framework so it's all hand-made.
      I work with a list of intentions such as "I want to attend a conference about digital transformation".
      I perform a semantic search and I have 3 thresholds:
      Never more than 3 intentions selected.
      Never select an intention under a cosine similarity of 82%.
      If I find an intention with a cosine similarity above 91%, I keep only that one and disregard the rest.
      The percentages come from experimentation.
      I think that would be a good starting point to implement something generic in your framework.
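The three thresholds described above translate directly into a small selection function. The 82%/91% values are the commenter's experimental numbers, and `select_intents` is a hypothetical helper, not a semantic-router API:

```python
def select_intents(scored: dict[str, float],
                   floor: float = 0.82,      # never select below this
                   dominant: float = 0.91,   # one clear winner above this
                   max_intents: int = 3) -> list[str]:
    # Rank intents by cosine similarity, highest first.
    ranked = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
    # Drop anything under the floor threshold.
    ranked = [(name, s) for name, s in ranked if s >= floor]
    # A dominant match suppresses all the others.
    if ranked and ranked[0][1] > dominant:
        return [ranked[0][0]]
    # Otherwise keep at most max_intents.
    return [name for name, _ in ranked[:max_intents]]
```

With the job-fair example, four intents scoring 0.88/0.85/0.84/0.83 would yield the top three, while a single 0.95 match would return only that one.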

  • @ahmadzaimhilmi
    @ahmadzaimhilmi 8 months ago

    This has a good use case in research papers if it can be restructured to categorize sentences into types such as results, challenges, methods, etc.

  • @BradleyKieser
    @BradleyKieser 8 months ago

    Very interesting idea, I have been using a similar idea but with a light weight fast locally hosted LLM for the routing decision (your "RAG" technology adds a very good layer of speed and precise control). Weird hearing "route" ("root") pronounced as "rowt". Very Game of Thrones. Guess it's for American viewers. Excellent presentation, clear explanation and very well thought out overview.

  • @yoshkebab
    @yoshkebab 8 months ago +3

    Very interesting. How do you handle history and context? A lot of the time a single prompt can't be categorized on its own, and routing will change according to context. I'm curious about your approach to this.

    • @jamesbriggs
      @jamesbriggs  8 months ago +1

      for now when using the suggestive method I demo in the video the LLM makes the final decision on what to do, so we get around the issue of implementing support for chat history and just use a single query
      In any case, I 100% agree and adding support for chat history is our next priority, we have already built the methodology for how we will handle it, which you can see here github.com/aurelio-labs/cookbook/blob/main/semantic-analysis/semantic-topic-change.ipynb

  • @user-fs5lb3ce3b
    @user-fs5lb3ce3b 8 months ago

    oh, I've been dreaming about this possibility; a kind of dynamic workflow engine that allows the LLM to choose amongst plausible decision paths... will watch

  • @nicolaithomsen7005
    @nicolaithomsen7005 7 months ago

    As always, mind-blowing. Great video, James. My favorite GenAI educator!

  • @dusanbosnjakovic6588
    @dusanbosnjakovic6588 8 months ago +3

    Do you have any stats on accuracy differences between this and an LLM? Especially on longer queries.

  • @nikosterizakis
    @nikosterizakis 6 months ago

    That is a great piece of development James. Certainly a very useful 'brick in the wall' of the LLM ecosystem and something we will be using in future projects. Out of interest, are you guys going to branch out to the other two gen AI areas, namely video and audio?

    • @jamesbriggs
      @jamesbriggs  6 months ago +1

      For sure, we're already taking some steps into multi-modal, for example using semantic router we're already doing:
      - Image detection czcams.com/video/EqKjaLrpeI4/video.html
      - Video processing github.com/aurelio-labs/semantic-router/blob/main/docs/examples/video-splitter.ipynb

    • @nikosterizakis
      @nikosterizakis 6 months ago

      Great stuff, with permission I am going to test drive that python code!

  • @mooksha
    @mooksha 8 months ago +1

    Very useful, looking forward to more videos on this! Will explore the repo as well. How much work has gone into optimising testing for a large number of semantic routes? Is it still fast if there are 200 routes with 5 utterances each? Also how does it fare if routes have many utterances, like 100 each?

  • @scharlesworth93
    @scharlesworth93 8 months ago +1

    Cool I gotta wait two days to watch

    • @jamesbriggs
      @jamesbriggs  8 months ago

      I need some time to get everything ready 😅 will not make you wait for following videos :)

  • @KarlJuhl
    @KarlJuhl 8 months ago

    Great development, thanks for sharing.
    Personally I have been using LLMs for an intent detection step, to route a user query to a given prompt that has the relevant context to answer a user question.
    I'm interested to see the latency of this in the same setup. Makes total sense to use vector space to cluster a user query and route to the most similar intent in the space.

  • @MidtownAI-qi1yq
    @MidtownAI-qi1yq 8 months ago +1

    With ReAct prompting, you can trigger a sequence of actions. How would semantic router handle the same processing? It appears to me that everything needs to be programmed in and anticipated, instead of relying on a "reasoning engine" which can in theory build any workflow based on the available tools. Just thinking about the limitations of this approach. Very creative though!

    • @jamesbriggs
      @jamesbriggs  8 months ago

      yes it would require a lot of logic built around it to support multiple steps. I think it would be doable, especially if you have differing sets of routes that could enter the "decision space", but it would be nontrivial with the current lib - I think we could build some interesting tooling to attempt to solve for this use-case

  • @storytimewithme2
    @storytimewithme2 5 months ago

    Getting Thailand vibez off this guy

  • @alivecoding4995
    @alivecoding4995 5 months ago

    I love your work and content! Thanks so much, James. :)

  • @mustafadut8430
    @mustafadut8430 7 months ago

    Man in nice shirt. I can't wait to see an example of it collaborating with a Langchain chatbot.

  • @lorenzospataro26
    @lorenzospataro26 5 months ago

    Have you thought about using SetFit as an alternative to pure semantic matching? If you have a small dataset to train the intent classifier, it would probably perform better for some use cases (at the cost of being a bit slower, but still faster than an LLM)

  • @jakobkristensen2390
    @jakobkristensen2390 28 days ago

    Super cool project

  • @marcomorales9417
    @marcomorales9417 6 months ago

    Hey! I've found that using rules within the initial system message works great for filtering questions about topics unrelated to the domain the RAG chatbot should answer, so maybe the win with this approach is reducing cost and time? Also I've been playing around with having one chatbot that can have different phases within a conversation, set by different system messages. For example, stage 1 is to extract customer information to obtain a lead and then help them with the question; this information can be output by the LLM through a JSON response. Quite interesting.

  • @GiovanneAfonso
    @GiovanneAfonso 8 months ago

    Incredible work! Thank you for sharing, I'm really excited right now, I'll give it a try.
    Could you answer some questions?
    - Is it possible to have 1000 different routes? (I'm just curious)
    - Is it possible to cache / reuse embeddings in multiple services?
    - Is it using an AI model under the hood for choosing the right route?

    • @jamesbriggs
      @jamesbriggs  8 months ago +3

      Great to hear!
      - yes it could have 1000 different routes in theory, I have not tested yet
      - we don’t cache the embedding yet but we are adding it to the RouteLayer.to_file method
      - the default RouteLayer uses vector space and embeddings to create what is essentially an inherent classification model

    • @GiovanneAfonso
      @GiovanneAfonso 8 months ago

      @jamesbriggs you guys are making the future more shiny. Thanks

  • @danielvalentine132
    @danielvalentine132 8 months ago

    Brilliant. Simple and elegant solution.

  • @avg_ape
    @avg_ape 5 months ago

    Fantastic contribution. Thank you.

  • @johnny017
    @johnny017 8 months ago

    Very cool! I built something similar for a project, but much simpler. I also agree that, for now, it doesn't make much sense to use agent reasoning in production. It takes too much time to output, and it consumes too many tokens. I will keep an eye on the repo and try to contribute!

    • @jamesbriggs
      @jamesbriggs  8 months ago

      I think there are times for agent reasoning, but it is certainly overused - and I believe we will improve semantic-router to replace more of those more expensive+slow reasoning steps with more efficient methods
      We'd love to have you contribute!

  • @narutocole
    @narutocole 8 months ago

    Super excited to try this!

    • @jamesbriggs
      @jamesbriggs  8 months ago

      hey Jordan! Let me know how it goes!

  • @SimonMariusGalyan
    @SimonMariusGalyan 7 months ago

    Great work which can be integrated into apps speeding up data processing and reducing hallucinations… 🎉

  • @tajwarakmal
    @tajwarakmal 8 months ago

    This is fantastic! already have a few use cases in mind.

  • @avatarelemental
    @avatarelemental 8 months ago

    This sounds great ! I will give it a try

  • @socalledtwin
    @socalledtwin 8 months ago

    For now, it could be useful for selecting the best RAG pipeline or collection to search through, though it could be the case that future models like GPT-5 are so good at determining the correct function to call or collection to search, it won't be needed. Either way, nice work.

  • @megamehdi89
    @megamehdi89 8 months ago

    Key Insights by TubeOnAI:
    1. Semantic Router Introduction: The Semantic Router is presented as a crucial layer for achieving control and determinacy in AI dialogue. It is described as a fast decision-making layer for natural language processing, enabling instant triggering of specific responses based on predefined queries. The speaker emphasizes its significance in refining the behavior of AI assistants and chatbots, stressing its necessity for deploying such systems.
    2. Library Setup and Integration: The video provides a step-by-step guide on setting up and using the Semantic Router library. It introduces the installation process and demonstrates how to define routes and test their interaction. The integration with an AI assistant is showcased, illustrating how the Semantic Router augments user queries and influences the agent's responses based on predefined routes.
    3. Enhancing AI Dialogue and Control: The speaker highlights the capabilities of the Semantic Router in influencing AI behavior, protecting against unwanted queries, and suggesting specific actions or information to the AI assistant. The framework is portrayed as a tool for not only steering dialogue but also enhancing the AI's decision-making process and overall functionality.
    4. Future Developments and Open Source Collaboration: The video concludes with an outlook on future developments, expressing the speaker's excitement about the potential of the Semantic Router framework. The open-source nature of the project is emphasized, encouraging contributions and promising further insights into advanced features such as dynamic routing and the hybrid layer.

  • @musifmuzammir354
    @musifmuzammir354 8 months ago +3

    Isn't this how RASA framework works?

    • @jamesbriggs
      @jamesbriggs  8 months ago +1

      RASA does intent detection, and that is to some degree what we're doing here - to be honest I have not used RASA for years so I don't know where they are now, but it began to get quite dated - I unfortunately don't know enough to compare their recent versions to this, I will investigate though

    • @musifmuzammir354
      @musifmuzammir354 8 months ago

      @jamesbriggs It still does intent classification only. But this seems like a faster approach, finding the correct intent using embedding search.

    • @familyaccount-eb7cb
      @familyaccount-eb7cb 4 months ago

      Rasa started moving away from intent-based routing because it doesn't handle follow-up inputs well (input that needs the context of previous messages). I do wonder how semantic router handles these follow-up inputs.

  • @wolpumba4099
    @wolpumba4099 8 months ago

    *Summary*
    *Introduction to Semantic Router*
    - 0:00 - Introduction to the concept of a semantic router as a key component in building AI assistants.
    - 0:24 - Definition and purpose of a semantic router in AI dialogue systems.
    *Working Mechanism of Semantic Router*
    - 0:42 - Semantic router acts as a fast decision-making layer for language models.
    - 1:00 - The deterministic setup of the semantic router through query-response mapping.
    - 1:33 - Personal experience of using semantic routers in chatbots and agents.
    *Setting Up the Semantic Router*
    - 2:04 - Guide to accessing and installing the semantic router library.
    - 2:32 - Explanation of the library installation process and version details.
    - 3:00 - Steps to restart session post-installation in Google Colab.
    - 3:11 - Creating and testing sample routes for the semantic router.
    *Practical Examples and Usage*
    - 4:07 - Demonstration of initializing embedding models for the router.
    - 4:52 - Introduction to different types of route layers in the library.
    - 5:46 - Testing and interpreting the output of the semantic router with various queries.
    - 7:00 - Example of using the semantic router to control dialogue topics (e.g., politics).
    *Integration with AI Agents*
    - 7:42 - Demonstrating the integration of the semantic router with an AI agent.
    - 8:03 - Enhancing agent responses using semantic router augmented queries.
    - 10:07 - Various applications of the semantic router in customizing agent interactions.
    *Conclusion and Future Developments*
    - 12:43 - Reflections on the implementation and effectiveness of the semantic router in projects.
    - 13:01 - Acknowledging the early stage of the semantic router but emphasizing its effectiveness.
    - 13:30 - Invitation for community involvement and future instructional content on advanced features.
    *Closing Remarks*
    - 14:12 - Concluding thoughts and anticipation for future developments and community engagement.

  • @cuburtrivera1167
    @cuburtrivera1167 7 months ago

    i think this can be replicated by just using an output parser, but this abstraction layer makes it easier

  • @mrchongnoi
    @mrchongnoi 8 months ago

    Thank you for the video. Very useful

  • @gabrieleguo
    @gabrieleguo 8 months ago

    Very interesting framework. What if the utterances are in another language? Does it still work well? I guess the language should match what the encoder supports, is that correct?

  • @carterjames199
    @carterjames199 8 months ago

    This is really cool - does the semantic router support open-source embedding models, or do you have plans to allow them, maybe local MiniLM models?

    • @jamesbriggs
      @jamesbriggs  8 months ago

      yes you can already, we added it this weekend - see here github.com/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb
      it works incredibly well, using mistral 7b we get better performance on the few tests I did than gpt-3.5

  • @mathiasschrooten903
    @mathiasschrooten903 2 months ago

    Could this be integrated with LangGraph? It seems like the perfect hybrid solution!

  • @dawid_dahl
    @dawid_dahl 8 months ago

    Just so I can understand... do you mean that just by 1) providing some examples of those various sentences ("isn't politics the best thing ever", etc.) and 2) the user's embedded query - it will give back a route, without ever going to an actual LLM to make the routing decision?

  • @DemetrioFilocamo
    @DemetrioFilocamo 8 months ago

    Great project and thanks for open sourcing it! What’s the difference with Nemo Guardrails?

  • @marktucker8537
    @marktucker8537 6 months ago +1

    How does Semantic Router work under the hood?

  • @robcz3926
    @robcz3926 3 months ago

    man this is so much easier than langgraph routing, and it works with smaller models too

    • @jamesbriggs
      @jamesbriggs  3 months ago +1

      yeah we use both in projects, but I'm planning to try building agents that rely wholly on semantic routes soon

    • @robcz3926
      @robcz3926 3 months ago

      @jamesbriggs looking forward to that mate🤘

  • @dgroechel
    @dgroechel 8 months ago

    Great video. With function calling, how do you generate the arguments and keep the speed?

  • @micbab-vg2mu
    @micbab-vg2mu 8 months ago

    Great video - thank you. I will try it.

  • @franciscocaruso4458
    @franciscocaruso4458 8 months ago

    Hey, great video, and a very interesting tool!! This looks very similar to NeMo Guardrails. What is the difference in this case?

  •  8 months ago

    Thanks James, this is a very good solution for us moving forward to using agents instead of chains.
    How do you see this fitting with more advanced RAGs that have query preprocessing, reranking, etc.?

  • @JanVansteenlandt
    @JanVansteenlandt 5 months ago

    What I'm wondering is how to best use this when dealing with chat history. For example if you ask a political question followed up by "please elaborate". The message itself does not mean anything, however taking the previous question into account does... is it as simple as just concatenating the last X user questions and using that as a basis for the routing input? Giving the routing input a limited "memory" but memory nonetheless.
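The "limited memory" idea from this comment - concatenating the last X user turns into the routing input so "please elaborate" inherits the topic of the preceding question - can be sketched like this (illustrative only; `RoutingMemory` is not a library class):

```python
from collections import deque

class RoutingMemory:
    """Keep the last few user turns and route on their concatenation."""

    def __init__(self, max_turns: int = 3):
        # deque with maxlen silently drops the oldest turn when full.
        self.turns: deque = deque(maxlen=max_turns)

    def routing_input(self, message: str) -> str:
        self.turns.append(message)
        # The router sees the recent context, not just the bare message.
        return " ".join(self.turns)
```

Here a follow-up like "please elaborate" is routed on "what do you think about the election? please elaborate", which a politics route can still catch.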

  • @thebozbloxbla2020
    @thebozbloxbla2020 8 months ago

    hey man, really weird question but i hope you respond. i was wondering if the routes we create get saved in some sort of database, aka all the utterances and whatnot, or are you just vectorising the utterances each time you open a new instance?
    can i define a route for the long term?

  • @codingcrashcourses8533
    @codingcrashcourses8533 8 months ago

    I don't really like the LangChain integration - accessing attributes and overwriting prompts like this: agent.agent.llm_chain.prompt = new_prompt :(. I still hope for built-in LangChain functionality for something like this. Thanks for the demonstration. It is still a small and young project.

  • @llaaoopp
    @llaaoopp 8 months ago +1

    How does this differ from NeMo Guardrails? I thought this pre-check of your query against a database of semantically embedded utterances is at the heart of how Guardrails functions.

    • @jamesbriggs
      @jamesbriggs  8 months ago +4

      Yes, I'm very familiar with the lib - it is what we used before semantic router and I and others from aurelioAI made a few contributions to make the lib easier to develop with and deploy in projects, but ultimately it was limiting + overly complex.
      To add things like dynamic routing, hybrid layers, etc (I will talk about these soon) was too difficult. We also have other upcoming features such as a topics-based conversation splitter that we believe will be key to getting the next level of performance from this type of approach. Again, implementing in guardrails would have been more complex than developing it independently, and possibly even out of scope for what they're building.
      Although I think guardrails is awesome, and it served as the starting point for what we're building here, I ultimately felt it better to move away from the library, that may change if nvidia decide to put more resources into it, but I haven't seen this happen yet.

    • @llaaoopp
      @llaaoopp 8 months ago

      @jamesbriggs
      I completely agree. Guardrails felt a bit clunky if you wanted to actually code around it, and it honestly felt like it was not "built for the job", more like a PoC. I love the direction you guys are going with this and can't wait to see what other concepts you come up with around semantic routing!
      Cheers :)

  • @plashless3406
    @plashless3406 8 months ago +1

    I have one question though: in order to make the most out of a route, do we need many utterances to cover the whole use case of the route? I mean, what if a user query belongs to a route but it returns no matching route?

    • @jamesbriggs
      @jamesbriggs  8 months ago

      You need to cover the routes well. If I see a query miss a route, I usually just add it directly to the route, iteratively adding queries that were missed.
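The iterative workflow described in this reply - add the missed query back as a new utterance - can be sketched as follows. This is a plain-Python stand-in, not the semantic-router API; in practice you would also re-embed the new utterance and refresh the index:

```python
# Routes as route-name -> example utterances (toy data).
routes = {
    "billing": ["how much does it cost", "cancel my subscription"],
}

def record_missed_query(route_name: str, query: str) -> None:
    # A real query slipped past its route: add it as a new utterance
    # so future similar queries match. (Re-embed + re-index in practice.)
    if query not in routes[route_name]:
        routes[route_name].append(query)

record_missed_query("billing", "why was my card charged twice")
```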

    • @plashless3406
      @plashless3406 8 months ago

      @jamesbriggs this really is interesting and promising.

  • @TommyJefferson1801
    @TommyJefferson1801 8 months ago

    So basically, this avoids additional time during inference, if I'm not wrong.
    Also, why not finetune an LLM and use function calling? I mean, yes, it can take some time, but how well does this approach compare to that in production scenarios? Do we have benchmarks on this?

  • @Truzian
    @Truzian 8 months ago +1

    Would this ever be supported in TS/JS?

    • @jamesbriggs
      @jamesbriggs  8 months ago

      would love to build it, we do have some great TS/JS devs, so it's quite likely

  • @luisliz
    @luisliz 8 months ago

    this is awesome! is this similar to guidance-ai? sorry if i'm confusing concepts

  • @parchamgupta8417
    @parchamgupta8417 8 months ago

    Hi, just had a thought - don't you think Dialogflow already does this kind of task, although at a much more basic level?

  • @scharlesworth93
    @scharlesworth93 8 months ago

    This reminds me of that NeMo rails thing you were talking about a while back - is this the preferred tool?

    • @jamesbriggs
      @jamesbriggs  8 months ago

      for me yes, we built this due to limitations we were seeing with nemo (although, nemo is still a great tool)

  • @crotonium
    @crotonium 8 months ago

    Isn't this just calculating the cosine similarity between the input query and the mean of each route class to categorize which route it belongs to?
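A toy version of the mean-of-class reading in this comment (whether the library averages utterance embeddings or scores them individually is an implementation detail; this only illustrates classification by cosine similarity against a route centroid, with made-up 2-D vectors standing in for embeddings):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def centroid_route(query_vec: np.ndarray, route_vecs: dict) -> str:
    # Mean of each route's utterance embeddings = that route's centroid.
    centroids = {name: vecs.mean(axis=0) for name, vecs in route_vecs.items()}
    # Pick the route whose centroid is most similar to the query.
    return max(centroids, key=lambda name: cosine(query_vec, centroids[name]))

toy_routes = {
    "chitchat": np.array([[1.0, 0.1], [0.9, 0.2]]),
    "politics": np.array([[0.1, 1.0], [0.2, 0.9]]),
}
```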

  • @thebozbloxbla2020
    @thebozbloxbla2020 8 months ago

    was wondering if we could take the semantic route titles and use them for index searching vector databases, to return similarity results more related to the topic...

  • @onufriienko
    @onufriienko 8 months ago

    Thanks for sharing James,
    Are there any examples with LlamaIndex? Thanks 😊

    • @jamesbriggs
      @jamesbriggs  8 months ago

      not yet, will be working on creating examples w/ different libs over the coming weeks

  • @naromsky
    @naromsky 8 months ago

    The name is a banger.

    • @jamesbriggs
      @jamesbriggs  8 months ago

      I wish I could take credit, but it's all the aurelioAI team, my ideas were terrible and fortunately not chosen 😅

  • @eyemazed
    @eyemazed 7 months ago

    interesting. how do you determine a threshold for whether a user query belongs to a certain route?

    • @jamesbriggs
      @jamesbriggs  Před 7 měsíci +1

      A default value is set based on the encoder being used. At the moment I want to add route-specific thresholds and auto-optimization of those soon

  • @JulianHarris
    @JulianHarris Před 8 měsíci

    It’d be quite fun to integrate this with the ollama web ui project.

  • @caiyu538
    @caiyu538 Před 8 měsíci

    Great great. Great

  • @sangyeonlee5417
    @sangyeonlee5417 Před 8 měsíci

    Wow, super excited! How about the case of using a different language, like CJK?

    • @jamesbriggs
      @jamesbriggs  Před 8 měsíci +1

      we support Cohere embedding models, and they do have multilingual support, so you would initialize our CohereEncoder using `CohereEncoder(name="embed-multilingual-v3.0")` and that comes with support for CJK languages as far as I know :)

    • @sangyeonlee5417
      @sangyeonlee5417 Před 8 měsíci

      Thanks a lot, I will try it. @jamesbriggs

  • @areebtariq6755
    @areebtariq6755 Před 5 měsíci

    If we are using embeddings for this, where does the system store them?
    Or are they generated at runtime?

    • @jamesbriggs
      @jamesbriggs  Před 5 měsíci

      Using the local index, it will be rebuilt with each session. However, we support Pinecone and Qdrant indexes, which will maintain the embeddings; video on that here czcams.com/video/qjRrMxT20T0/video.html

  • @georgegowers4037
    @georgegowers4037 Před 8 měsíci

    Is this the same as the planner in Semantic Kernel SDK?

  • @andydataguy
    @andydataguy Před 8 měsíci

    BigAI p100 whey protein 😂 love to see the sense of humor!

  • @adsk2050
    @adsk2050 Před měsícem

    How is this different from langgraph?

  • @Kalebryee
    @Kalebryee Před 8 měsíci

    How is this different from guard-rail

  • @roberth8737
    @roberth8737 Před 8 měsíci

    Passing in just the query could often lack context - requiring the usual "create a standalone question.. etc etc" so that statements are correctly interpreted. Although this could be done with a quick 3.5 query, could we somehow combine the past X queries semantically without resorting to LLMs to solve for those cases?

    • @jamesbriggs
      @jamesbriggs  Před 8 měsíci

      We’re working on it, expecting to have a v1 feature for identifying most relevant messages required for a query by next week

  • @merefield2585
    @merefield2585 Před 7 měsíci

    OK, but isn't this theoretically equivalent to feeding function prompts to catch certain user behaviour? I've already written an escalation function on my chatbot that responds to anger or frustration and is written as a "pure" function prompt. The advantage of that approach is I don't have to maintain another layer of logic. Presumably Open AI is using semantic similarity thresholds to trigger functions, so the approach here is not a lot different? Could you elaborate on the advantages and any disadvantages of doing this locally? On the politics issue, why not just add to the system prompt something like "you must NEVER discuss politics and decline to do so politely. "?

  • @Bubbalubagus
    @Bubbalubagus Před 8 měsíci

    How does this differ from NLU inference architecture?

  • @eyemazed
    @eyemazed Před 8 měsíci

    I built a RAG for our in-house project management system. It works fine when users write a prompt like "when was X topic opened and who started it", for example, because it performs a vector search on "X" and then creates a context around it. However, it does not work for a prompt like "who's the newest registered user?" because that cannot be retrieved by vector search. It can, however, be retrieved via the database. Anyone know how to solve this? Given that an LLM can in fact write DB queries... I'm thinking some sort of routing should be implemented for this as well.

    • @towards_agi
      @towards_agi Před 7 měsíci

      Built something similar, but was using a Formula 1 dataset. Used a graph DB to do multi-hop queries.
      E.g. I asked it when a particular driver won their first race. Based on the graph schema, the LLM would generate a query that linked the driver and all their races, then filtered to races where the driver got 1st and returned them in ascending order.
      Let me know if you need assistance.
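The routing idea raised in this thread (dispatch a query to either vector search or a structured DB query) can be sketched as below. The keyword classifier is a crude stand-in for a real semantic router, and the function names (`classify`, `handle`) are purely illustrative:

```python
# Sketch: dispatch a user query either to vector search (RAG) or to a
# structured DB query, based on a trivial stand-in classifier.
def classify(query: str) -> str:
    # In practice a semantic router (embedding similarity) would decide this;
    # a keyword heuristic stands in here.
    aggregate_words = {"newest", "latest", "count", "how many", "first", "last"}
    return "sql" if any(w in query.lower() for w in aggregate_words) else "vector"

def handle(query: str) -> str:
    if classify(query) == "sql":
        # hand the query to an LLM that writes SQL against the real schema
        return f"SELECT ... ;  -- generated for: {query}"
    # otherwise fall back to the usual embed-and-retrieve RAG path
    return f"vector_search({query!r})"

print(handle("who's the newest registered user?"))     # takes the SQL branch
print(handle("when was topic X opened and by whom?"))  # takes the vector branch
```

The point is that the router only picks the tool; each branch can then use whatever retrieval method actually fits the data.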

  • @shaunpx1
    @shaunpx1 Před 7 měsíci

    Can this work with triggering function calls when some response, or a similar response (utterance), is detected?

    • @jamesbriggs
      @jamesbriggs  Před 7 měsíci

      yeah 100%, I do exactly that here czcams.com/video/NGCtBFjzndc/video.html

  • @pascalshehata7648
    @pascalshehata7648 Před 8 měsíci

    Hey, isn't it just another name for intents?

  • @avidlearner8117
    @avidlearner8117 Před 8 měsíci

    Isn't that close to what Constitutional AI, by Anthropic, does? I can see your solution has layers of functionality that the other method doesn't have...

  • @smoq20
    @smoq20 Před 8 měsíci

    Is there a way to make it 100% local without relying on OpenAI or Cohere? ChromaDB?

    • @jamesbriggs
      @jamesbriggs  Před 8 měsíci +1

      Fully local support in progress, doesn’t need chroma

  • @landerstandaert6649
    @landerstandaert6649 Před 8 měsíci

    Similar to Microsoft Copilot Studio

  • @Mr_Arun_Raj
    @Mr_Arun_Raj Před 8 měsíci

    Can I use hugging face embeddings?

    • @jamesbriggs
      @jamesbriggs  Před 8 měsíci

      yes they were added a couple days ago github.com/aurelio-labs/semantic-router/pull/90

    • @jamesbriggs
      @jamesbriggs  Před 8 měsíci

      can see an example notebook here github.com/aurelio-labs/semantic-router/blob/main/docs/encoders/huggingface.ipynb

  • @larsbell1569
    @larsbell1569 Před 8 měsíci

    What if it happens to contain both?
    dl("The rainy weather we are having today reminds me of the Prime Minister who is a damp dull man.")

  • @altered.thought
    @altered.thought Před 7 měsíci

    Are there plans to support JavaScript in the future? 🙃

  • @seel1823
    @seel1823 Před 8 měsíci

    Does it work with GPT-4? - I know I'm asking before it even starts lol

    • @jamesbriggs
      @jamesbriggs  Před 8 měsíci +1

      yes it can haha, it's a separate layer, so it works with anything you want it to work with :)

    • @seel1823
      @seel1823 Před 8 měsíci +1

      One last question @jamesbriggs - can you use it with RAG?

  • @whiskeycalculus
    @whiskeycalculus Před 8 měsíci

    Accelerate

  • @chillydoog
    @chillydoog Před 8 měsíci

    No offense, but I don't really see the point of this semantic routing technique? I also don't think I fully understand what it is. Why wouldn't you just use a ChatGPT assistant? It seems like it's doing the same thing, but probably better. IDK, feel free to correct me or help me understand what the value of this is.