TechViz - The Data Science Guy
India
Joined: 27 July 2020
It is aptly said, “The best part of learning is sharing what you know”.
Hello there! This channel is my passion project. Here I plan to touch upon various topics in the AI/ML space with a major focus on Natural Language Processing (NLP), Graph Machine Learning, and General Machine Learning/Deep Learning concepts. You can find quite a few interesting formats for people with different attention spans, such as Research Paper Walkthroughs, YT Shorts, and so on.
Make sure you Subscribe to the channel and Share out the content with your friends :)
Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer.
*********************************************
If you want to support me financially which is totally optional and voluntary :) ❤️
You can consider buying me chai ( because I don't drink coffee :) ) at www.buymeacoffee.com/TechvizCoffee
Support using Paypal - www.paypal.com/paypalme/TechVizDataScience
What is a Masked Language Model (MLM)?
#bert #naturallanguageprocessing #transformers
A Masked Language Model (MLM) is a neural network-based language model that's trained to predict missing words in a text. It's a pre-training technique used in natural language processing (NLP) to help AI models understand language structure and context.
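The "predict missing words" setup can be sketched by building one training example by hand. This is a minimal illustration, not the recipe of any particular model: the `[MASK]` string, the `mask_tokens` helper, and the default masking probability of 0.15 are assumptions chosen for the example.

```python
import random

MASK = "[MASK]"  # placeholder token; the exact string is an assumption


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide a random subset of tokens and remember what was hidden.

    Returns (masked_tokens, labels). labels[i] holds the original token
    where position i was masked, and None everywhere else, so a model
    would only be scored on the hidden positions.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # the model sees the placeholder
            labels.append(tok)    # and must recover the original token
        else:
            masked.append(tok)
            labels.append(None)   # position not used for the loss
    return masked, labels


if __name__ == "__main__":
    tokens = "the cat sat on the mat".split()
    masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=0)
    print(masked)
    print(labels)
```

During pre-training, the model would receive `masked` as input and be trained to output the non-`None` entries of `labels` from the surrounding context.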
⏩ IMPORTANT LINKS
Research Paper Summaries: czcams.com/video/ykClwtoLER8/video.html
Enjoy reading articles? Then consider subscribing to a Medium membership; it is just $5 a month for unlimited access to all free/paid content.
Subscribe now - prakhar-mishra.medium.com/membership
*********************************************
⏩ CZcams - czcams.com/channels/oz8NrwgL7U9535VNc0mRPA.html
⏩ LinkedIn - linkedin.com/in/prakhar21
⏩ Medium - medium.com/@prakhar.mishra
⏩ GitHub - github.com/prakhar21
*********************************************
⏩ Please feel free to share out the content and subscribe to my channel - czcams.com/channels/oz8NrwgL7U9535VNc0mRPA.html?sub_confirmation=1
Tools I use for making videos :)
⏩ iPad - tinyurl.com/y39p6pwc
⏩ Apple Pencil - tinyurl.com/y5rk8txn
⏩ GoodNotes - tinyurl.com/y627cfsa
#techviz #datascienceguy #deeplearning #ai #openai #chatgpt #machinelearning #recommendersystems #CustomerServiceTechnicalSupport #EfficientlyResolvingCustomerInquiries #RetrievalAugmentedGeneration #LargeLanguageModels #IssueTrackingTickets
#CustomerServiceQuestionAnswering #KnowledgeGraphRetrieval
About Me:
I am Prakhar Mishra and this channel is my passion project. I am currently pursuing my MS (by research) in Data Science. I have 4+ years of industry experience in Data Science and Machine Learning, with a particular focus on Natural Language Processing (NLP).
Views: 85
Video
SimCSE - Unsupervised Sentence Embeddings
84 views, 14 days ago
#sentencetrasformers #unsupervisedlearning #naturallanguageprocessing In this video, we discuss the paper SimCSE - An unsupervised learning method for learning Sentence Embeddings. SimCSE is a contrastive learning framework for generating sentence embeddings. It utilizes an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective, with only standard drop...
TSDAE - Unsupervised Training of Sentence Embeddings
138 views, 28 days ago
#sentencetrasformers #unsupervisedlearning #naturallanguageprocessing In this video, we discuss the paper TSDAE - An unsupervised learning method for learning Sentence Embeddings or representations.
Understanding Precision@K and Recall@K Metrics
246 views, a month ago
#recommendations #machinelearning #evaluation Precision at k (P@k) and Recall at k (R@k) are metrics used in information retrieval and machine learning to evaluate the performance of a ranking model or system. These metrics help you understand how relevant the top k results are.
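A small sketch of both metrics, assuming the ranked results and the set of relevant items are given as document ids. The function names and sample ids are made up for illustration, and dividing precision by k even when fewer than k results are returned is one common convention rather than the only one.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # nothing to recall; returning 0.0 is a convention
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3", "d4", "d5"]   # system output, best first
    relevant = {"d1", "d3", "d6"}             # ground-truth relevant ids
    # Top 3 contains d1 and d3: 2 hits out of k=3, and 2 of 3 relevant items.
    print(precision_at_k(ranked, relevant, 3))
    print(recall_at_k(ranked, relevant, 3))
```

Note that "d6" is never retrieved, so recall cannot reach 1.0 for this ranking no matter how large k gets.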
Proposition-Based Retrieval Explained!
263 views, a month ago
#rag #genai #llms Dense retrieval has become a prominent method to obtain relevant context or world knowledge in open-domain NLP tasks. When we use a learned dense retriever on a retrieval corpus at inference time, an often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence. We discover that the retrieval unit choice significantly ...
Why do we need Positional Encoding in Transformers?
233 views, a month ago
#transformers #positionalencodings #naturallanguageprocessing Unlike LSTMs, Transformers do not inherently account for the order of tokens in a sequence. Positional encodings give the model a way to understand the position of each token, which is crucial for many natural language processing tasks. In this video, we try to answer the Why, What, and Where of Positional Encodings in ...
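One widely used scheme is the fixed sinusoidal encoding from the original Transformer paper. Whether the video covers this exact variant is an assumption; the sketch below only shows how each position gets a distinct vector that can be added to the token embeddings. The function name is made up for the example.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model list of sinusoidal position vectors.

    Even dimensions use sin and odd dimensions use cos, with wavelengths
    that grow geometrically across the embedding dimensions.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even for this sketch")
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)      # even index
            pe[pos][i + 1] = math.cos(angle)  # odd index
    return pe


if __name__ == "__main__":
    for row in sinusoidal_positional_encoding(seq_len=4, d_model=6):
        print([round(v, 3) for v in row])
```

Because every row differs, two identical tokens at different positions end up with different inputs to the attention layers, which is what lets the model tell them apart.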
Retrieval-Augmented Generation with Knowledge Graphs for Customer Support Q/A (Paper Summary)
436 views, a month ago
#rag #knowledgegraph #customersupport #machinelearning #llms #naturallanguageprocessing In customer service technical support, swiftly and accurately retrieving relevant past issues is critical for efficiently resolving customer inquiries. The conventional retrieval methods in retrieval augmented generation (RAG) for large language models (LLMs) treat a large corpus of past issue tracking ticke...
Improving your RAG system with Self Querying Retrieval
663 views, 2 months ago
#genai #rag #machinelearning A self-query retriever converts an unstructured query into a structured query and then applies that structured query to a vector store to get relevant results. This method can help you improve your RAG performance and build a high-performing RAG pipeline.
Improving Document Re-ranking with Zero-Shot Question Generation (Paper Summary)
531 views, 2 months ago
#rag #llms #informationretrieval We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of a...
Understanding Reciprocal Rank Fusion in Hybrid Search [Advanced RAG]
856 views, 2 months ago
#ragsystem #hybridsearch #genai #llm Reciprocal Rank Fusion (RRF) is an effective and simple method for combining multiple ranked lists of search results into a single, more accurate ranked list. The main idea behind RRF is to give higher ranks to documents that appear near the top of any of the input lists, thus rewarding consensus among the different lists.
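The idea can be sketched in a few lines. Each document earns 1 / (k + rank) from every list it appears in, and the contributions are summed. The function name, the sample ids, and the smoothing constant k = 60 (a commonly quoted default) are assumptions for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists (best first) into one ranked list.

    A document's fused score is the sum of 1 / (k + rank) over every
    list that contains it, with ranks starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]   # e.g. from a keyword search
    vector_hits = ["b", "c", "d"]    # e.g. from a vector search
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
    # -> ['b', 'c', 'a', 'd']
```

Documents "b" and "c" appear in both lists, so they outrank "a" even though "a" is first in one list. That is the consensus effect described above.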
API Design - 3 common Pagination Strategies
267 views, 6 months ago
#apidevelopment #softwaredevelopment #api In API design, pagination is a technique used to manage and limit the amount of data returned by an API endpoint. When dealing with large sets of data, it's not efficient or practical to return the entire dataset in a single response. Pagination allows the API to provide a subset or "page" of the data, making it more manageable for clients to retrieve a...
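As a rough illustration of returning one "page" at a time, here is a sketch of two widely used strategies, offset-based and cursor-based pagination, over an in-memory list. Whether these match the three strategies covered in the video is an assumption, and the function names and response fields (`next_offset`, `next_cursor`) are made up for the example.

```python
def paginate_offset(items, offset=0, limit=10):
    """Offset pagination: skip `offset` items, return the next `limit`."""
    page = items[offset:offset + limit]
    has_more = offset + limit < len(items)
    return {"data": page, "next_offset": offset + limit if has_more else None}


def paginate_cursor(items, after_id=None, limit=10):
    """Cursor pagination: return items whose id comes after `after_id`.

    Assumes `items` is sorted by a unique, ascending "id" field.
    """
    start = 0
    if after_id is not None:
        # Index of the first item past the cursor (or the end of the list).
        start = next(
            (i for i, item in enumerate(items) if item["id"] > after_id),
            len(items),
        )
    page = items[start:start + limit]
    has_more = start + limit < len(items)
    return {"data": page, "next_cursor": page[-1]["id"] if page and has_more else None}


if __name__ == "__main__":
    rows = [{"id": i} for i in range(1, 8)]
    print(paginate_offset(rows, offset=0, limit=3))
    print(paginate_cursor(rows, after_id=3, limit=3))
```

Offset pagination is simple but can skip or repeat rows when data changes between requests. A cursor tied to the last seen id avoids that, at the cost of not being able to jump to an arbitrary page.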
Blueprint for Designing Machine Learning Systems
876 views, 6 months ago
LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking
2.6K views, 7 months ago
Large Language Models (LLMs) for Recommendations (Paper Walkthrough)
4.5K views, 7 months ago
Document Re-ranking using LLMs - Advanced RAG
6K views, 7 months ago
Introducing PandasAI: Generative AI Python Library
7K views, 7 months ago
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves
392 views, 8 months ago
Self-Improving for Zero-Shot Named Entity Recognition with Large Language Models
1.5K views, 8 months ago
MemPrompt: Memory-assisted prompt editing to improve GPT-3 after deployment
236 views, 8 months ago
Zero-Shot Next-Item Recommendation using Large Pretrained Language Models
1.7K views, 8 months ago
Advanced RAG Concept: Improving RAG with Multi-stage Document Reranking
5K views, 8 months ago
Improving Language Model Reasoning with Contrastive Chain-of-Thought Prompting
750 views, 9 months ago
Improving RAG performance with Parent Context Retriever
599 views, 9 months ago
Understanding RAG: Basics, Challenges, and Improvements
1.5K views, 9 months ago
PDF Document Question Answering with ChatGPT #demo
9K views, a year ago
GPT-3 Fine-Tuning Made Easy: No Coding Required!
2.7K views, a year ago
DialogLM: Pre-trained Model for Long Dialogue Understanding and Summarisation (Paper Summary)
972 views, a year ago
Frustratingly Easy Model Ensemble for Abstractive Summarization (Research Paper Walkthrough)
728 views, a year ago
GPTZero: Hero or Zero in Detecting AI Generated Text?
3.1K views, a year ago
I built GPT-3 powered Excel for analysing Amazon Reviews
1.2K views, a year ago
Great explanation.
Thank you Ravi :)
GPTZero sucks. It says things I typed myself are AI. It's a scam.
You are really doing a good job! This is going to help a lot of people.
Thank you :)
Thank you for such a lucid explanation.
Thank you :)
To @everyone: You have to watch this video, it's pretty cool 😄 such a nice job explaining even the minor details. Well done!
Thank you 🤩
Sir, can you suggest some of the best ranking or RAG algorithms for building an AI freelancer matcher, in which I will rank gigs based on their description, ratings, and number of reviews? How can I approach this?
You are a true TechViz man...love the simplicity that you explain with.
Thank you. I am glad 😁
Really appreciated the effort, please continue the paper deep dives!
Thank you so much. Lot more already scheduled 👏
Please share the pdf link of this paper
Hey, in the description ⬆️
List of unsupervised sentence embedding methods: github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning
Well explained, Thanks !
Thank you!
Thanks. Can Donut be used for detecting text regions such as captions, page numbers, and serial numbers, and for classifying them?
You could train it to extract specifically the required entries from a PDF, so that it simulates region-specific extraction.
%AI improve and develop our contextual strand adequacy interpretation 2024 to %AI improve and develop our contextual strand adequacy interpretation 2024
GPT generated ?
Awesome, man! Keep doing this stuff. Subscribed!!
Thanks man! 👏
Great video, thanks for making it.
Thanks Ravi !
NDCG Metric: czcams.com/video/xHhLOQ7Pnb4/video.html Mean Reciprocal Rank Metric: czcams.com/video/6dDvfGrxFns/video.html
Your explanation of T2 becoming 0 with sigmoid is incorrect.
Oh okay 🤔 I might have missed something. I don't clearly remember the content of this one now. It would be really helpful if you could point out the error with a timestamp and an alternate explanation; it will really benefit anyone who sees it from here on. I'll also pin the comment to prioritise its visibility. 🙏 Thank you
Thanks for sharing this, as always! Interesting approach, but based on their results, it does not seem to provide significant improvement (besides SimCSE and Contriever). One must keep in mind the cost associated with having additional chunks, especially in popular retrievers like DPR.
Yes, 💯
You should also mention that a value head is getting trained, which is placed on the LLM. You can refer to the TRL PPO example and check its source code.
Thanks for sharing this. Awesome work ✌🏻
Thank you! 👏
🌟 Check out more Research Paper Summaries at czcams.com/video/ykClwtoLER8/video.html
Can you code this paper with sample ticket data?
Aah.. that looks very difficult with my current schedule :/ but I am sure someone has done, or will do, an open-source implementation of it 🔜
Awesome.... Thank you so much
You’re welcome!
Interesting.... Thank you for sharing....🤝
👏 👏
Thanks for the intuition ! :)
Can you share the source code?
Hey, that’s not my app. Source link is there in the description.
Amazing paper. Thanks for making the video.
Thank you!
Thanks a lot... Very much needed.
👍👏
> Interested in consuming byte-sized AI/ML content? Then feel free to check www.youtube.com/@TechVizTheDataScienceGuy/shorts > If you're looking for more research paper summaries, then check czcams.com/play/PLsAqq9lZFOtWUz1WEoJ3GXw197LD7BxMc.html
Thanks for sharing the insights. I was exploring for similar usecase.
Great to hear that. Thank you!
Thanks for this. You helped me solve a problem.
Thanks for the clear explanation! Is there any self query retrieval cookbook available for reference?
Hi, thank you :) There's an implementation in LangChain that you might want to check out - python.langchain.com/v0.1/docs/modules/data_connection/retrievers/self_query/
Amazing 🙌
Thanks :)
⭐ Interested in consuming byte-sized AI/ML content? Then feel free to check www.youtube.com/@TechVizTheDataScienceGuy/shorts ⭐ If you're looking for research paper summaries, then check czcams.com/play/PLsAqq9lZFOtWUz1WEoJ3GXw197LD7BxMc.html
Which model is used in the re-ranker? And how do we get the likelihood probability using that model? Is there any notebook for a demo?
Hey, any LLM that can give out next-word probabilities works here. The authors specifically tried GPT versions like Neo and J, and then T5 variants. The likelihood of the question is nothing but the product of the probabilities of every question word given the previously generated words, conditioned on the retrieved passage. Here's the official repo - github.com/DevSinghSachan/unsupervised-passage-reranking Hope this helps!
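The scoring step described in the reply can be sketched without a real model. Assume some language model has already produced, for each passage, the probability of every question token given the passage and the preceding question tokens. The probabilities, passage ids, and function names below are hypothetical. Summing log-probabilities is equivalent to taking the product of the probabilities but avoids numerical underflow.

```python
import math


def question_log_likelihood(token_probs):
    """Log of the product of per-token probabilities of the question."""
    return sum(math.log(p) for p in token_probs)


def rerank_by_question_likelihood(passage_token_probs):
    """Order passage ids by how likely the question is given each passage.

    passage_token_probs maps a passage id to the list of probabilities a
    language model assigned to each question token, conditioned on that
    passage (hypothetical inputs, computed elsewhere).
    """
    scores = {
        pid: question_log_likelihood(probs)
        for pid, probs in passage_token_probs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    probs = {
        "passage_1": [0.5, 0.5],   # question is fairly unlikely here
        "passage_2": [0.9, 0.8],   # question is much more likely here
    }
    print(rerank_by_question_likelihood(probs))  # ['passage_2', 'passage_1']
```

In a real pipeline the per-token probabilities would come from the model's output distribution. The linked repository is the place to check how the authors compute and normalise them.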
Crisp and clear! ❤
⭐ Interested in consuming byte-sized AI/ML content? Then feel free to check www.youtube.com/@TechVizTheDataScienceGuy/shorts ⭐ Just like this one, if research paper summaries are your type, then check czcams.com/play/PLsAqq9lZFOtWUz1WEoJ3GXw197LD7BxMc.html
🌟 Blog: prakhar-mishra.medium.com/enhancing-passage-retrieval-with-zero-shot-question-generation-paper-summary-301d34e0278b 🌟 Interested in consuming byte-sized AI/ML content? Then feel free to check www.youtube.com/@TechVizTheDataScienceGuy/shorts 🌟 Just like this one, if research paper summaries are your type, then check czcams.com/play/PLsAqq9lZFOtWUz1WEoJ3GXw197LD7BxMc.html
Nicely explained
This is not your typical BI or reporting engine where you perform drill-down and drill-through on your indices. The index in your vector DB and LLM is good enough for your RAG.
I have been waiting for your explainer videos for the last few months! Thanks for sharing 😊
Awesome man
Very clear explanation. Thank you so much!
It's very clear and crisp. Is it possible to add a practical implementation of RRF?
Found this resource online - safjan.com/implementing-rank-fusion-in-python You can check this out.
Good approach, but isn't it too expensive? It compares each sentence with the next sentence across the whole document. It would work well for smaller datasets but will be slower otherwise.
It would be O(n) time complexity. You can add a few preprocessing steps to do it on smaller document portions, if that is a concern.
Are you familiar with the reasons for conducting the re-ranking step? Specifically, given the premise of extracting relevant document candidates using only DPR, I'm curious about your perspective on why we'd need to conduct re-ranking using a cross-encoder, in addition to extracting relevant document candidates by computing cosine similarity with a bi-encoder.
A bi-encoder doesn't really take into account attention between the query and the document, whereas a cross-encoder does. Ideally one would use a cross-encoder for the first step as well; it's just that it gets really expensive to do so. So get the top k using a fast method, then re-rank them with a better method. Thanks
@TechVizTheDataScienceGuy Thanks for your constructive feedback!
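The "fast method first, better method second" pattern from the reply above can be sketched with pluggable scoring functions. The two scorers here are toy stand-ins chosen so the example runs without any model: word overlap plays the role of the cheap bi-encoder similarity, and word-pair overlap plays the role of the costlier cross-encoder. None of these names come from a real library.

```python
def retrieve_then_rerank(query, docs, fast_score, slow_score, top_k=3):
    """Two-stage ranking: cheap scorer over all docs, costly scorer on top_k."""
    # Stage 1: score every document with the cheap function, keep top_k.
    candidates = sorted(docs, key=lambda d: fast_score(query, d), reverse=True)[:top_k]
    # Stage 2: re-order only the survivors with the expensive function.
    return sorted(candidates, key=lambda d: slow_score(query, d), reverse=True)


def word_overlap(query, doc):
    """Toy stand-in for bi-encoder similarity: count of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def word_pair_overlap(query, doc):
    """Toy stand-in for a cross-encoder: count of shared adjacent word pairs."""
    q, d = query.lower().split(), doc.lower().split()
    return len(set(zip(q, q[1:])) & set(zip(d, d[1:])))


if __name__ == "__main__":
    docs = [
        "rank fusion for hybrid search",
        "search hybrid cars for sale",
        "cooking pasta at home",
        "hybrid search explained",
    ]
    for doc in retrieve_then_rerank("hybrid search", docs, word_overlap, word_pair_overlap):
        print(doc)
```

The expensive scorer runs on only `top_k` documents instead of the whole corpus, which is the cost argument made in the reply. The first stage cannot tell "search hybrid cars for sale" from the truly relevant documents, and the second stage pushes it to the bottom.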
Please reply. I am confused by so many approaches (BART, BERT, T5, PyTorch...) in this domain. How would you develop a doable university-level text summarization (major) project at present? Appreciate the work!
Hey, Didn’t follow the latter part. Can you please come again?