mildlyoverfitted
BentoML SageMaker deployment
In this video, we are going to discuss the basics of BentoML and then go through a hands-on example of taking a scikit-learn model and deploying it on SageMaker with the help of BentoML (a minimal service sketch follows the chapter list below).
The code + sketches from the video can be found here: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/bentoml
00:00 Intro
00:52 [diagram] Ideas behind BentoML
03:07 [diagram] Step by step procedure
03:21 [code] Creating a model
06:50 [code] Creating a bento - service.py
14:31 [code] Creating a bento - bentofile.yaml
16:53 [code] bentoctl init
19:34 [code] Inspecting terraform files
21:10 [code] Containerization + pushing to ECR
23:15 [code] Deployment via terraform
25:13 [code] Sending request and running inference
27:41 [code] Destroying resources
29:05 Outro
views: 1,235
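For orientation, here is a minimal sketch of what a BentoML 1.x service.py for a scikit-learn model roughly looks like; the model and service names are placeholders, not necessarily the ones used in the video.

```python
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

# Assumes the model was saved beforehand with something like:
#   bentoml.sklearn.save_model("my_classifier", clf)
# "my_classifier" and the service name are placeholders.
runner = bentoml.sklearn.get("my_classifier:latest").to_runner()

svc = bentoml.Service("my_classifier_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
    # Inference is delegated to the runner, which BentoML schedules
    # in separate worker processes.
    return runner.predict.run(input_array)
```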

Retrieval augmented generation with OpenSearch and reranking
3.9K views · 9 months ago
In this video, we are going to use OpenSearch and Cohere's Rerank endpoint to implement a minimal retrieval-augmented generation (RAG) system that can perform question answering. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/rag-rerank/mini_tutorials/rag_with_reranking Cohere blogpost: txt.cohere.com/rerank/ 00:00 Intro 00:52 RAG with embeddings (semantic search) 03:16 ...
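A rough sketch of the retrieve-then-rerank flow, assuming a local OpenSearch cluster, a hypothetical index called "documents" with a "text" field, and the Cohere v4 Python SDK:

```python
import cohere
from opensearchpy import OpenSearch

# Placeholders: a local cluster, an index called "documents" with a
# "text" field, and an API key from your environment.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
co = cohere.Client("YOUR_API_KEY")

query = "What is product quantization?"

# 1) Lexical (BM25) retrieval of candidate passages.
hits = client.search(
    index="documents",
    body={"query": {"match": {"text": query}}, "size": 25},
)["hits"]["hits"]
docs = [h["_source"]["text"] for h in hits]

# 2) Rerank the candidates and keep only the most relevant few,
#    which can then be stuffed into the LLM prompt.
response = co.rerank(query=query, documents=docs, top_n=3, model="rerank-english-v2.0")
for result in response.results:
    print(result.relevance_score, docs[result.index][:80])
```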
Named entity recognition (NER) model evaluation
2.4K views · 1 year ago
In this video we are going to talk about different ways to evaluate an NER (named entity recognition) model. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/ner_evaluation github.com/chakki-works/seqeval 00:00 Intro 00:31 Mispredictions 02:31 IOB2 notation 04:03 Evaluation approaches 07:38 [code] HF evaluate seqeval 14:36 [code] Entity-level fro...
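For reference, a minimal example of entity-level evaluation with seqeval (toy tags, not the video's data):

```python
from seqeval.metrics import classification_report, f1_score

# IOB2-tagged sequences: one inner list per sentence.
y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O"]]

# seqeval scores at the entity level: "B-PER I-PER" counts as a single
# entity that is either fully correct or wrong.
print(classification_report(y_true, y_pred))
print("micro F1:", f1_score(y_true, y_pred))
```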
Asynchronous requests and rate limiting (HTTPX and asyncio.Semaphore)
2.5K views · 1 year ago
Today we are going to talk about how to use HTTPX to send requests asynchronously and how to perform rate limiting. Code from the video: github.com/jankrepl/mildlyoverfitted/blob/master/mini_tutorials/httpx_rate_limiting/ 00:00 Intro 01:15 [Code] Implement async requests WITHOUT rate limiting 07:20 [Code] Trying it out 08:48 [Code] Implement async requests WITH rate lim...
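A minimal sketch of the pattern, using asyncio.Semaphore to cap concurrency (the URLs are placeholders):

```python
import asyncio

import httpx

async def fetch(client: httpx.AsyncClient, url: str, sem: asyncio.Semaphore) -> int:
    # The semaphore caps how many requests are in flight at once.
    async with sem:
        response = await client.get(url)
        return response.status_code

async def main() -> None:
    urls = ["https://httpbin.org/get"] * 10  # placeholder URLs
    sem = asyncio.Semaphore(3)  # at most 3 concurrent requests
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(*(fetch(client, u, sem) for u in urls))
    print(results)

asyncio.run(main())
```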
Few-shot text classification with prompts
3.5K views · 1 year ago
In this video, I will talk about a possible way to perform few-shot text classification using prompt engineering and the OpenAI API. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/fewshot_text_classification Inspiration for the video: github.com/explosion/prodigy-openai-recipes/tree/main Chat Completion API from OpenAI: platform.openai.com/docs/guides/g...
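A sketch of the general idea, using the pre-1.0 openai SDK that was current at the time; the labels and few-shot examples are made up and not the video's prompt:

```python
import openai  # pre-1.0 SDK, matching the era of the video

openai.api_key = "YOUR_API_KEY"  # placeholder

# Toy labels and examples; the video's actual prompt differs.
PROMPT = """Classify the text into one of: positive, negative, neutral.

Text: I love this phone. Label: positive
Text: The battery died after a day. Label: negative

Text: {text} Label:"""

def classify(text: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0,  # keep the classification (mostly) deterministic
    )
    return response["choices"][0]["message"]["content"].strip()

print(classify("The screen is okay, nothing special."))
```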
OpenAI function calling
2.9K views · 1 year ago
In this video we will go through the new "Function calling" feature of the OpenAI API (see more info here: openai.com/blog/function-calling-and-other-api-updates). First, I talk about the concepts and then I code up a small example where we implement a "financial analyst" bot. Code from the video: github.com/jankrepl/mildlyoverfitted/blob/master/mini_tutorials/openai_function_calling/example.py...
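A minimal sketch of the mechanism with the pre-1.0 openai SDK; the get_stock_price schema is a toy stand-in for the video's financial-analyst functions:

```python
import json

import openai  # pre-1.0 SDK, matching the era of the video

openai.api_key = "YOUR_API_KEY"  # placeholder

# Toy schema standing in for the video's financial-analyst functions.
functions = [{
    "name": "get_stock_price",
    "description": "Get the latest price for a ticker symbol",
    "parameters": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What is AAPL trading at?"}],
    functions=functions,
    function_call="auto",  # the model decides whether to call a function
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model hands back the arguments as a JSON string.
    args = json.loads(message["function_call"]["arguments"])
    print(message["function_call"]["name"], args)
```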
Deploying machine learning models on Kubernetes
16K views · 1 year ago
In this video, we will go through a simple end-to-end example of how to deploy an ML model on Kubernetes. We will use a pretrained Transformer model on the task of masked language modelling (fill-mask) and turn it into a REST API. Then we will containerize our service and finally deploy it on a Kubernetes cluster. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials...
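The serving part might look roughly like this FastAPI sketch (the video's exact stack and model may differ); run it with e.g. `uvicorn app:app` (assuming the file is app.py) before containerizing:

```python
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
# Model name is a placeholder; any fill-mask checkpoint works.
fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")

@app.get("/predict")
def predict(text: str) -> list:
    # e.g. text = "Paris is the [MASK] of France."
    # Returns the top candidate tokens with their scores.
    return fill_mask(text)
```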
Haiku basics (neural network library from DeepMind)
3.3K views · 1 year ago
In this video, we will go through the basic concepts of Haiku, a deep learning library created by DeepMind. Official repo: github.com/deepmind/dm-haiku Official docs: dm-haiku.readthedocs.io/en/latest/ Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/haiku_basics Chapters: 00:00 Intro 00:35 Cloning the repo and setting things up 01:52 Parameters: hk.transform...
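The core hk.transform idea in a nutshell (a minimal sketch, not code from the video):

```python
import haiku as hk
import jax
import jax.numpy as jnp

def forward(x):
    # hk.Module instances create parameters implicitly; hk.transform
    # turns this impure function into a pure (init, apply) pair.
    return hk.nets.MLP([32, 1])(x)

model = hk.transform(forward)
rng = jax.random.PRNGKey(0)
x = jnp.ones((4, 8))

params = model.init(rng, x)      # creates the parameters
y = model.apply(params, rng, x)  # pure forward pass
print(y.shape)  # (4, 1)
```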
Product quantization in Faiss and from scratch
6K views · 2 years ago
In this video, we talk about a vector compression technique called product quantization. We first explain conceptually what the main ideas are and then show how one can use an existing implementation of it from Faiss (IndexPQ). Finally, we also implement the algorithm from scratch. Last but not least, we run some experiments and compare the different methods. Paper: lear.inrialpes.fr/pubs/2011/JDS...
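A minimal IndexPQ usage sketch on random data, just to show the moving parts:

```python
import faiss
import numpy as np

d, m, nbits = 64, 8, 8  # dim, number of subvectors, bits per code
xb = np.random.rand(10_000, d).astype("float32")
xq = np.random.rand(5, d).astype("float32")

# Each vector is split into m subvectors and each subvector is mapped
# to one of 2**nbits centroids, so a vector is stored in just m bytes.
index = faiss.IndexPQ(d, m, nbits)
index.train(xb)  # learn the per-subspace codebooks via k-means
index.add(xb)
distances, ids = index.search(xq, 5)
print(ids)
```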
GPT in PyTorch
11K views · 2 years ago
In this video, we are going to implement the GPT-2 model from scratch. We are only going to focus on inference and not on the training logic. We will cover concepts like self-attention, decoder blocks and generating new tokens. Paper: openai.com/blog/better-language-models/ Code minGPT: github.com/karpathy/minGPT Code transformers: github.com/huggingface/transformers/blob/0f69b924fbda6a442d7...
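The heart of the decoder is causal self-attention; a single-head sketch without the learned projections and multi-head plumbing the video covers:

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x: torch.Tensor) -> torch.Tensor:
    """Single-head causal self-attention over x of shape (batch, seq, dim)."""
    b, t, d = x.shape
    q, k, v = x, x, x  # GPT uses learned q/k/v projections; omitted here
    scores = q @ k.transpose(-2, -1) / d**0.5
    mask = torch.tril(torch.ones(t, t, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float("-inf"))  # no peeking ahead
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(2, 5, 16)
print(causal_self_attention(x).shape)  # torch.Size([2, 5, 16])
```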
The Lottery Ticket Hypothesis and pruning in PyTorch
8K views · 2 years ago
In this video, we are going to explain how one can do pruning in PyTorch. We will then use this knowledge to implement a paper called "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". The paper states that feedforward neural networks have subnetworks (winning tickets) inside of them that perform as well as (or even better than) the original network. It also proposes a ...
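The PyTorch pruning utilities the video builds on, in a minimal form:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(10, 10)

# Zero out the 30% of weights with the smallest magnitude. Under the
# hood this adds a weight_orig parameter and a weight_mask buffer.
prune.l1_unstructured(layer, name="weight", amount=0.3)
print(float(layer.weight.eq(0).float().mean()))  # ~0.3

# Fold the mask into the weight tensor permanently.
prune.remove(layer, "weight")
```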
The Sensory Neuron as a Transformer in PyTorch
3K views · 2 years ago
In this video, we implement a paper called "The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning" in PyTorch. It proposes a permutation invariant module called the Attention Neuron. Its goal is to independently process local information from the features and then combine the local knowledge into a global picture. Paper: arxiv.org/abs/2109.02869 O...
Integer embeddings in PyTorch
2.3K views · 2 years ago
In this video, we implement a paper called "Learning Mathematical Properties of Integers". Most notably, we use an LSTM network and an Encyclopedia of integer sequences to train custom integer embeddings. At the same time, we also extract integer sequences from already pretrained models - BERT and GloVe. We then compare how good these embeddings are at encoding mathematical properties of intege...
PonderNet in PyTorch
2.2K views · 2 years ago
In this video, we implement PonderNet, which was proposed in the paper "PonderNet: Learning to Ponder". It is a network that dynamically decides on the size of its forward pass. We are going to implement it and experiment with it a little bit on the so-called ParityDataset. Note that the implementation is based on the labml.ai implementation (see link below). I made some modifications though s...
Mixup in PyTorch
3.3K views · 2 years ago
In this video, we implement (input) mixup and manifold mixup. They are regularization techniques proposed in the papers "mixup: Beyond Empirical Risk Minimization" and "Manifold Mixup: Better Representations by Interpolating Hidden States". We investigate how these two schemes compare against more mainstream regularization methods like dropout and weight decay. Paper (Input mixup): arxiv.or...
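Input mixup fits in a few lines; a sketch assuming one-hot labels (the video's implementation details may differ):

```python
import torch

def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Input mixup: convex combination of examples and of their labels.

    x: (batch, ...) inputs, y: (batch, n_classes) one-hot labels.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y + (1 - lam) * y[perm]
    return x_mixed, y_mixed
```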
DINO in PyTorch
13K views · 3 years ago
MLP-Mixer in Flax and PyTorch
4.3K views · 3 years ago
Differentiable augmentation for GANs (using Kornia)
2.6K views · 3 years ago
Growing neural cellular automata in PyTorch
4.6K views · 3 years ago
SIREN in PyTorch
5K views · 3 years ago
Vision Transformer in PyTorch
80K views · 3 years ago
torch.nn.Embedding explained (+ Character-level language model)
34K views · 3 years ago
Gradient with respect to input in PyTorch (FGSM attack + Integrated Gradients)
9K views · 3 years ago
NumPy equality testing: multiple ways to compare arrays
1.8K views · 3 years ago
Custom optimizer in PyTorch
6K views · 3 years ago
Mocking neural networks: unit testing in deep learning
2.3K views · 3 years ago
Visualizing activations with forward hooks (PyTorch)
14K views · 3 years ago

Comments

  • @SunilSamson-w2l
    @SunilSamson-w2l 8 days ago

    The reason you got ". , ?" as the output for [MASK] is that you didn't end your input with a full stop. BERT masked language models should be prompted that way: "my name is [MASK]." should have been your request.
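(The claim is easy to check with the transformers fill-mask pipeline; a quick sketch, with the model name as an assumption:)

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Compare the top predictions with and without the trailing full stop.
for text in ("my name is [MASK]", "my name is [MASK]."):
    print(text, "->", [p["token_str"] for p in fill_mask(text)])
```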

  • @JorgeGarcia-eg5ps
    @JorgeGarcia-eg5ps 22 days ago

    Thank you for sharing this, I was actually looking for results of DINO on smaller compute/data so this is so helpful

  • @krishsharma4507
    @krishsharma4507 1 month ago

    It's printing "Original prediction: 293". How can I check the values or names of this predicted class?

  • @Saevires
    @Saevires 1 month ago

    I am using custom tags, such as InvoiceNumber and GrossTotal. To work on entity level, does seqeval need tags in the format B- and I-?
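(For what it's worth, seqeval does expect scheme prefixes such as B-/I- for entity-level scores; a minimal check using the commenter's tag names:)

```python
from seqeval.metrics import classification_report

# Custom entity types work as long as the tags carry IOB2-style
# B-/I- prefixes; bare "InvoiceNumber" tags would not be grouped.
y_true = [["B-InvoiceNumber", "I-InvoiceNumber", "O", "B-GrossTotal"]]
y_pred = [["B-InvoiceNumber", "I-InvoiceNumber", "O", "O"]]
print(classification_report(y_true, y_pred))
```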

  • @Huawei_Jiang
    @Huawei_Jiang 1 month ago

    Hello authors, thank you for your video. It helped me a lot. However, I have one question about your code. In the original mixup, which is from the link you provided, the author mixed the loss function instead of mixing the label. But I noticed you mixed the label. Could you please explain the reason for this difference in operation? Looking forward to your reply
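(For cross-entropy the two formulations coincide, because the loss is linear in the target distribution; a quick numerical check:)

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
y_a = torch.randint(0, 10, (4,))
y_b = torch.randint(0, 10, (4,))
lam = 0.3

# Mixing the losses...
mixed_loss = lam * F.cross_entropy(logits, y_a) + (1 - lam) * F.cross_entropy(logits, y_b)

# ...equals cross-entropy against mixed (soft) labels.
soft = lam * F.one_hot(y_a, 10).float() + (1 - lam) * F.one_hot(y_b, 10).float()
mixed_label_loss = -(soft * F.log_softmax(logits, dim=-1)).sum(-1).mean()

print(torch.allclose(mixed_loss, mixed_label_loss))  # True
```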

  • @shivendrasingh9759
    @shivendrasingh9759 2 months ago

    Really helpful for a foundation in MLOps.

  • @larrymckuydee5058
    @larrymckuydee5058 2 months ago

    Is this method good if we want to search for a list of products rather than a chat-like response?

    • @mildlyoverfitted
      @mildlyoverfitted 2 months ago

      Sure:) If you have text descriptions of the products then Elasticsearch/Opensearch + reranking is definitely a great option:)

  • @user-lu8fy4ku6e
    @user-lu8fy4ku6e 2 months ago

    You are incredible man. - You go at a good pace. - Each project feels well planned. - Nice formatting style. - Good explanation. I've just started really digging into this machine learning space, any recommendations on learning all the different layer types and problem types?

    • @mildlyoverfitted
      @mildlyoverfitted 2 months ago

      Thanks a ton! ML has changed quite a lot over the past few years. I guess one architecture you should be familiar with nowadays is the transformer:) But I guess you have heard about it by now:D Good luck with your learning!

  • @mmacasual-
    @mmacasual- 2 months ago

    Great example. Thanks for the information

  • @lucianobatista6295
    @lucianobatista6295 2 months ago

    hi man, do you offer some training or mentorship?

  • @paolobarba1782
    @paolobarba1782 2 months ago

    What should you do if you want the encoding to be made by OpenSearch directly?

  • @akk2766
    @akk2766 2 months ago

    I concur with what everyone is saying - best video on function calling for sure. I really like the laid-back nature of the tutorial - seriously simplifying function calling - even to the uninitiated! Only one suggestion: please move the inset video to the top right so the output can be seen in its entirety. Obviously not for this video, but for future awesome videos you produce.

    • @mildlyoverfitted
      @mildlyoverfitted 2 months ago

      Glad it was helpful! And thank you for the constructive feedback:)

  • @Munk-tt6tz
    @Munk-tt6tz 3 months ago

    This is the best video on this topic. Thank you!

  • @swk9015
    @swk9015 3 months ago

    what's the font you use?

    • @mildlyoverfitted
      @mildlyoverfitted 3 months ago

      Not sure. I am using this vim theme: github.com/morhetz/gruvbox so maybe you can find it somewhere in their repo.

  • @mmazher5826
    @mmazher5826 3 months ago

    Is there any way of re-running SSL on a pretrained DINO?

  • @danielasefa8087
    @danielasefa8087 3 months ago

    Thank you so much for helping me to understand ViT!! Great work

  • @PrafulKava
    @PrafulKava 3 months ago

    Great video! Good explanation. Thanks for all your efforts in making a detailed video along with the code!

  • @leeuw6481
    @leeuw6481 3 months ago

    wow, this is dangerous xd

  • @prajyotmane9067
    @prajyotmane9067 3 months ago

    Where did you include positional encoding? Or is it not needed when using convolutions for patching and embedding?

  • @neiro314
    @neiro314 3 months ago

    Great video as a student, thank you so much! I will say a few lines didn't feel very well explained; however, I'm sure to someone with a bit more knowledge than me it would be clearer. Overall 10/10, tysm.

  • @user-td8vz8cn1h
    @user-td8vz8cn1h 4 months ago

    I'm a huge fan of implementing algorithms from scratch by myself and watched this video with great pleasure. Thanks for your work, it deserves more attention.

  • @danieltello8016
    @danieltello8016 4 months ago

    Great video. Can I run the code on a Mac with an M1 chip as is?

  • @iamragulsurya
    @iamragulsurya 4 months ago

    Name of the font?

    • @mildlyoverfitted
      @mildlyoverfitted 4 months ago

      So the theme I am using is here: github.com/morhetz/gruvbox . The README talks about the fonts I believe.

  • @navins2246
    @navins2246 4 months ago

    Doing ML in vim is absolutely gigachad

  • @harrisnisar5345
    @harrisnisar5345 4 months ago

    Amazing video. Just curious, what keyboard are you using?

  • @jeffg4686
    @jeffg4686 5 months ago

    "mildly overfitted" is how I like to keep my underwear so I don't get the hyena.

  • @davidpratr
    @davidpratr 5 months ago

    Really nice video. Would you see any benefit of using the deployment in a single node with an M1 chip? I'd say somehow yes, because an inference might not be taking all the CPU of the M1 chip, but how about scaling the model in terms of RAM? One of those models might take 4-7GB of RAM, which makes up to 21GB of RAM only for 3 pods. What's your opinion on that?

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      Glad you liked the video! Honestly, I filmed the video on my M1 using minikube mostly because of convenience. But on real projects I have always worked with K8s clusters that had multiple nodes. So I cannot really advocate for the single node setup other than for learning purposes.

    • @davidpratr
      @davidpratr 5 months ago

      @mildlyoverfitted got it. So, very likely more requests could be resolved at the same time, but with very limited scalability and probably some performance loss. By the way, what are those fancy combos with the terminal? Is it tmux?

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      @davidpratr interesting:) yes, it is tmux:)

  • @woutderijck5389
    @woutderijck5389 5 months ago

    When starting out, would you recommend just using embeddings and vector search, or should you also consider the hybrid case of OpenSearch & vector search? In the video it looks like you should go all in on vector search.

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      I would recommend just doing Opensearch + reranking. No embeddings (=vector search). Assuming you wanna have something minimal really quickly as demonstrated in the video:)

  • @Ldmp807
    @Ldmp807 5 months ago

    Isn't this a concurrency limit, not a rate limit (i.e. a limit per second)?

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      I think you are right:) The video title is definitely misleading. Sorry about that!

  • @vidinvijay
    @vidinvijay 5 months ago

    novelty explained in just over 6 minutes. 🙇

  • @kascesar
    @kascesar 5 months ago

    Hi, I'm getting this error: "'sagemaker_service:svc' is not found in BentoML store <osfs '/home/bentoml/bentos'>, you may need to run `bentoml models pull` first." Any idea? Thanks a lot

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      Hmmm, if the problem still persists you can create an issue here: github.com/jankrepl/mildlyoverfitted/issues describing exactly what you did and I can try to help!

    • @kascesar
      @kascesar 5 months ago

      @mildlyoverfitted Solved, I did it. The problem came from the bentoml version; installing bentoml==1.1.11 solved the problem for me.

  • @yuricastro522
    @yuricastro522 5 months ago

    Thank you so much, your example helped me to solve some problems :)

  • @macx7760
    @macx7760 6 months ago

    Why is the shape of the MLP input at the 2nd dim n_patches + 1? Isn't the MLP just applied to the class token?

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      So the `MLP` module is used inside of the Transformer block and it takes a 3D tensor as input. See this link for the only place where the CLS token is explicitly extracted: github.com/jankrepl/mildlyoverfitted/blob/22f0ecc67cef14267ee91ff2e4df6bf9f6d65bc2/github_adventures/vision_transformer/custom.py#L423-L424 Hope that helps:)

    • @macx7760
      @macx7760 5 months ago

      @mildlyoverfitted thanks, yeah, I confused the MLP inside the block with the MLP at the end for classification.

  • @macx7760
    @macx7760 6 months ago

    Fantastic video, just a quick note: at 16:01 you say that "none of the operations are changing the shape of the tensor", but isn't this wrong? When applying fc2, the last dim should be out_features, not hidden_features, so the shapes are also wrongly commented.

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      Nice find and sorry for the mistake:)! Somebody already pointed it out a while ago:) Look at the pinned errata comment:)

    • @macx7760
      @macx7760 5 months ago

      @mildlyoverfitted ah I see, my bad :D

  • @TwenTV
    @TwenTV 6 months ago

    Which frameworks would you recommend if you had to scale to 1000+ models? I am looking at custom FastAPI and MLflow with AWS Lambda, where each inference request will load the model from object storage and call .predict. The models are generally lightweight and predictions only have to be made on an hourly basis, so I don't think it's necessary to serve them in memory.

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      If you are not experiencing a cold start (or you don't care) then Lambda is definitely a great solution:)

  • @noedie4973
    @noedie4973 6 months ago

    Thanks for the nice video explanation! Could you please tell me what modifications I can make to get the output in a certain format? Say I want it to output only the label value with no other text?

    • @mildlyoverfitted
      @mildlyoverfitted 6 months ago

      Thank you! The current template should lead to you only getting the label. However, feel free to prompt engineer it if you are not getting the expected result. You can also request it to give you a valid JSON which you can then easily parse:) Just an idea. Hope that helps:)

    • @noedie4973
      @noedie4973 6 months ago

      @mildlyoverfitted thanks, it really helped me a lot. I achieved perfect results by restricting my response token limit, so it focuses on outputting the digit label (in flexible forms), from which I can extract it using a simple regex. The JSON method seems very clean too.

  • @idoronen9497
    @idoronen9497 6 months ago

    Thank you for the video! I have a question: if I need to make updates to an existing service, do I have to go through the entire process again, or is there a more efficient way? bentoctl build seems quite time-consuming. Appreciate your help!

    • @mildlyoverfitted
      @mildlyoverfitted 6 months ago

      Appreciate your comment! If the change is inside of your ML model or the serving logic (service.py) you will have to rebuild the image. However, the second time around some layers should be cached (docs.docker.com/build/guide/layers/ ) so in theory it should be faster (it depends though). Another thing you can do is to build the image in some virtual machine rather than locally. A common setup is that you build it + upload to ECR in your CI (e.g. GitHub Actions). Just some ideas:)

  • @Lithdren
    @Lithdren 6 months ago

    Is there a method you can use to rate limit by time? I'm interacting with an API that limits me to no more than 20 requests a minute, and I've been struggling with a way to handle that. Right now I keep track of the time of the last call, and if I made a request within the last 3 seconds I wait until 3 seconds have passed, then send out the next request. I have multiple API keys I can utilize, and each key has a set limit, so I cycle through them, but it feels like there must be a faster way.

    • @mildlyoverfitted
      @mildlyoverfitted 6 months ago

      One alternative solution is to use some open source package (e.g. github.com/florimondmanca/aiometer ). I don't really know much about it but maybe it can help:)
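(One way to rate limit by time rather than by concurrency is a sliding-window limiter; a rough sketch, not code from the video:)

```python
import asyncio
import time

class SlidingWindowLimiter:
    """Allow at most `rate` acquisitions per `period` seconds (coarse sketch)."""

    def __init__(self, rate: int, period: float = 60.0) -> None:
        self.rate, self.period = rate, period
        self.timestamps: list[float] = []
        self.lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self.lock:
            now = time.monotonic()
            # Keep only the timestamps still inside the window.
            self.timestamps = [t for t in self.timestamps if now - t < self.period]
            if len(self.timestamps) >= self.rate:
                # Sleep until the oldest call falls out of the window.
                await asyncio.sleep(self.period - (now - self.timestamps[0]))
            self.timestamps.append(time.monotonic())
```

Calling `await limiter.acquire()` before each request, with one `SlidingWindowLimiter(20, 60.0)` instance per API key, keeps each key at or under 20 requests per minute.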

  • @gunabalang9543
    @gunabalang9543 6 months ago

    what keyboard are you using?

  • @aditya_01
    @aditya_01 6 months ago

    Great video, thanks a lot, really liked the explanation!!!

  • @nandakishorejoshi3487
    @nandakishorejoshi3487 7 months ago

    Great video. How do I run a text generation model? I tried running a GPT2 model with the code below.
    Creating the API: transformers-cli serve --task=text-generation --model=gpt2
    Calling the API: curl -X POST localhost:8888/forward -H "accept: application/json" -H "Content-Type: application/json" -d '{"inputs":"What is Deep Learning","parameters":{"max_new_tokens":20}}'
    But I am getting an error in the response: {"detail":[{"type":"json_invalid","loc":["body",0],"msg":"JSON decode error","input":{},"ctx":{"error":"Expecting value"}}]}

  • @theAhmd
    @theAhmd 7 months ago

    Terminal and theme name, please?

  • @kyrylogorbachov3779
    @kyrylogorbachov3779 7 months ago

    Thanks a lot for the content!

  • @thinkman2137
    @thinkman2137 8 months ago

    Thank you for the detailed tutorial!

    • @thinkman2137
      @thinkman2137 8 months ago

      But TorchServe now has Kubernetes integration.

    • @mildlyoverfitted
      @mildlyoverfitted 7 months ago

      I will definitely look into it:) Thank you for pointing it out!!

  • @mkamp
    @mkamp 8 months ago

    Using VIM, Tmux and an audible keyboard never gets old!

  • @diegosabajo2182
    @diegosabajo2182 8 months ago

    Thanks for the video, man. There aren't many resources on BentoML so I appreciate your contribution. Can you please add more in the future?

    • @mildlyoverfitted
      @mildlyoverfitted 8 months ago

      Appreciate your message:) Thank you! I will very likely do more BentoML related stuff in the future:)

  • @faizasetif1103
    @faizasetif1103 8 months ago

    Is this code for classifying images?

    • @mildlyoverfitted
      @mildlyoverfitted 8 months ago

      Not sure what you mean, but DINO is a self-supervised algorithm:) Not a supervised one (e.g. classification)

    • @faizasetif1103
      @faizasetif1103 8 months ago

      @mildlyoverfitted I want to use DINO for a classification task. How?

  • @AM-yk5yd
    @AM-yk5yd 8 months ago

    Hi, if you accept suggestions, can you look into implementing something from H3, S4, S5, etc.? Structured State Spaces occupy at least half of the top 10 architectures on LRA and there are about zero intuitive explanations of them.

    • @mildlyoverfitted
      @mildlyoverfitted 8 months ago

      Hey there! Actually, I never heard of those! I am adding it to my reading list:) Cannot promise I will make a video about them though:) Thank you!

  • @user-cp1pe2tx7h
    @user-cp1pe2tx7h 8 months ago

    Great!

  • @rokieplayer7729
    @rokieplayer7729 8 months ago

    let me know the paper name, please~