mildlyoverfitted
BentoML SageMaker deployment
In this video, we are going to discuss the basics of BentoML and then go through a hands-on example of taking a scikit-learn model and deploying it on SageMaker with the help of BentoML (a minimal service sketch follows the chapter list below).
The code + sketches from the video can be found here: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/bentoml
00:00 Intro
00:52 [diagram] Ideas behind BentoML
03:07 [diagram] Step by step procedure
03:21 [code] Creating a model
06:50 [code] Creating a bento - service.py
14:31 [code] Creating a bento - bentofile.yaml
16:53 [code] bentoctl init
19:34 [code] Inspecting terraform files
21:10 [code] Containerization + pushing to ECR
23:15 [code] Deployment via terraform
25:13 [code] Sending request and running inference
27:41 [code] Destroying resources
29:05 Outro
views: 1,235
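For orientation, here is a minimal sketch of what a BentoML 1.x service.py for a scikit-learn model roughly looks like; the model and service names are placeholders, not necessarily the ones used in the video.

```python
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray

# Assumes the model was saved beforehand with something like:
#   bentoml.sklearn.save_model("my_classifier", clf)
# "my_classifier" and the service name are placeholders.
runner = bentoml.sklearn.get("my_classifier:latest").to_runner()

svc = bentoml.Service("my_classifier_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
    # Inference is delegated to the runner, which BentoML schedules
    # in separate worker processes.
    return runner.predict.run(input_array)
```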

Retrieval augmented generation with OpenSearch and reranking
3.9K views · 9 months ago
In this video, we are going to use OpenSearch and Cohere's Rerank endpoint to implement a minimal retrieval-augmented generation (RAG) system that can perform question answering. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/rag-rerank/mini_tutorials/rag_with_reranking Cohere blogpost: txt.cohere.com/rerank/ 00:00 Intro 00:52 RAG with embeddings (semantic search) 03:16 ...
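A rough sketch of the retrieve-then-rerank flow, assuming a local OpenSearch cluster, a hypothetical index called "documents" with a "text" field, and the Cohere v4 Python SDK:

```python
import cohere
from opensearchpy import OpenSearch

# Placeholders: a local cluster, an index called "documents" with a
# "text" field, and an API key from your environment.
client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
co = cohere.Client("YOUR_API_KEY")

query = "What is product quantization?"

# 1) Lexical (BM25) retrieval of candidate passages.
hits = client.search(
    index="documents",
    body={"query": {"match": {"text": query}}, "size": 25},
)["hits"]["hits"]
docs = [h["_source"]["text"] for h in hits]

# 2) Rerank the candidates and keep only the most relevant few,
#    which can then be stuffed into the LLM prompt.
response = co.rerank(query=query, documents=docs, top_n=3, model="rerank-english-v2.0")
for result in response.results:
    print(result.relevance_score, docs[result.index][:80])
```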
Named entity recognition (NER) model evaluation
2.4K views · 1 year ago
In this video we are going to talk about different ways to evaluate an NER (named entity recognition) model. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/github_adventures/ner_evaluation github.com/chakki-works/seqeval 00:00 Intro 00:31 Mispredictions 02:31 IOB2 notation 04:03 Evaluation approaches 07:38 [code] HF evaluate seqeval 14:36 [code] Entity-level fro...
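For reference, a minimal example of entity-level evaluation with seqeval (toy tags, not the video's data):

```python
from seqeval.metrics import classification_report, f1_score

# IOB2-tagged sequences: one inner list per sentence.
y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O"]]

# seqeval scores at the entity level: "B-PER I-PER" counts as a single
# entity that is either fully correct or wrong.
print(classification_report(y_true, y_pred))
print("micro F1:", f1_score(y_true, y_pred))
```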
Asynchronous requests and rate limiting (HTTPX and asyncio.Semaphore)
2.5K views · 1 year ago
Today we are going to talk about how to use HTTPX to send requests asynchronously and how to perform rate limiting. Code from the video: github.com/jankrepl/mildlyoverfitted/blob/master/mini_tutorials/httpx_rate_limiting/ 00:00 Intro 01:15 [Code] Implement async requests WITHOUT rate limiting 07:20 [Code] Trying it out 08:48 [Code] Implement async requests WITH rate lim...
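A minimal sketch of the pattern, using asyncio.Semaphore to cap concurrency (the URLs are placeholders):

```python
import asyncio

import httpx

async def fetch(client: httpx.AsyncClient, url: str, sem: asyncio.Semaphore) -> int:
    # The semaphore caps how many requests are in flight at once.
    async with sem:
        response = await client.get(url)
        return response.status_code

async def main() -> None:
    urls = ["https://httpbin.org/get"] * 10  # placeholder URLs
    sem = asyncio.Semaphore(3)  # at most 3 concurrent requests
    async with httpx.AsyncClient() as client:
        results = await asyncio.gather(*(fetch(client, u, sem) for u in urls))
    print(results)

asyncio.run(main())
```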
Few-shot text classification with prompts
3.5K views · 1 year ago
In this video, I will talk about a possible way to perform few-shot text classification using prompt engineering and the OpenAI API. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/fewshot_text_classification Inspiration for the video: github.com/explosion/prodigy-openai-recipes/tree/main Chat Completion API from OpenAI: platform.openai.com/docs/guides/g...
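A sketch of the general idea, using the pre-1.0 openai SDK that was current at the time; the labels and few-shot examples are made up and not the video's prompt:

```python
import openai  # pre-1.0 SDK, matching the era of the video

openai.api_key = "YOUR_API_KEY"  # placeholder

# Toy labels and examples; the video's actual prompt differs.
PROMPT = """Classify the text into one of: positive, negative, neutral.

Text: I love this phone. Label: positive
Text: The battery died after a day. Label: negative

Text: {text} Label:"""

def classify(text: str) -> str:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0,  # keep the classification (mostly) deterministic
    )
    return response["choices"][0]["message"]["content"].strip()

print(classify("The screen is okay, nothing special."))
```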
OpenAI function calling
2.9K views · 1 year ago
In this video we will go through the new "Function calling" feature of the OpenAI API (see more info here: openai.com/blog/function-calling-and-other-api-updates). First, I talk about the concepts and then I code up a small example where we implement a "financial analyst" bot. Code from the video: github.com/jankrepl/mildlyoverfitted/blob/master/mini_tutorials/openai_function_calling/example.py...
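A minimal sketch of the mechanism with the pre-1.0 openai SDK; the get_stock_price schema is a toy stand-in for the video's financial-analyst functions:

```python
import json

import openai  # pre-1.0 SDK, matching the era of the video

openai.api_key = "YOUR_API_KEY"  # placeholder

# Toy schema standing in for the video's financial-analyst functions.
functions = [{
    "name": "get_stock_price",
    "description": "Get the latest price for a ticker symbol",
    "parameters": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "What is AAPL trading at?"}],
    functions=functions,
    function_call="auto",  # the model decides whether to call a function
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model hands back the arguments as a JSON string.
    args = json.loads(message["function_call"]["arguments"])
    print(message["function_call"]["name"], args)
```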
Deploying machine learning models on Kubernetes
16K views · 1 year ago
In this video, we will go through a simple end-to-end example of how to deploy an ML model on Kubernetes. We will use a pretrained Transformer model on the task of masked language modelling (fill-mask) and turn it into a REST API. Then we will containerize our service and finally deploy it on a Kubernetes cluster. Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials...
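The serving part might look roughly like this FastAPI sketch (the video's exact stack and model may differ); run it with e.g. `uvicorn app:app` (assuming the file is app.py) before containerizing:

```python
from fastapi import FastAPI
from transformers import pipeline

app = FastAPI()
# Model name is a placeholder; any fill-mask checkpoint works.
fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")

@app.get("/predict")
def predict(text: str) -> list:
    # e.g. text = "Paris is the [MASK] of France."
    # Returns the top candidate tokens with their scores.
    return fill_mask(text)
```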
Haiku basics (neural network library from DeepMind)
3.3K views · 1 year ago
In this video, we will go through the basic concepts of Haiku, a deep learning library created by DeepMind. Official repo: github.com/deepmind/dm-haiku Official docs: dm-haiku.readthedocs.io/en/latest/ Code from the video: github.com/jankrepl/mildlyoverfitted/tree/master/mini_tutorials/haiku_basics Chapters: 00:00 Intro 00:35 Cloning the repo and setting things up 01:52 Parameters: hk.transform...
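The core hk.transform idea in a nutshell (a minimal sketch, not code from the video):

```python
import haiku as hk
import jax
import jax.numpy as jnp

def forward(x):
    # hk.Module instances create parameters implicitly; hk.transform
    # turns this impure function into a pure (init, apply) pair.
    return hk.nets.MLP([32, 1])(x)

model = hk.transform(forward)
rng = jax.random.PRNGKey(0)
x = jnp.ones((4, 8))

params = model.init(rng, x)      # creates the parameters
y = model.apply(params, rng, x)  # pure forward pass
print(y.shape)  # (4, 1)
```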
Product quantization in Faiss and from scratch
6K views · 2 years ago
In this video, we talk about a vector compression technique called product quantization. We first explain conceptually what the main ideas are and then show how one can use an existing implementation of it from Faiss (IndexPQ). Finally, we also implement the algorithm from scratch. Last but not least, we run some experiments and compare the different methods. Paper: lear.inrialpes.fr/pubs/2011/JDS...
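A minimal IndexPQ usage sketch on random data, just to show the moving parts:

```python
import faiss
import numpy as np

d, m, nbits = 64, 8, 8  # dim, number of subvectors, bits per code
xb = np.random.rand(10_000, d).astype("float32")
xq = np.random.rand(5, d).astype("float32")

# Each vector is split into m subvectors and each subvector is mapped
# to one of 2**nbits centroids, so a vector is stored in just m bytes.
index = faiss.IndexPQ(d, m, nbits)
index.train(xb)  # learn the per-subspace codebooks via k-means
index.add(xb)
distances, ids = index.search(xq, 5)
print(ids)
```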
GPT in PyTorch
11K views · 2 years ago
In this video, we are going to implement the GPT-2 model from scratch. We are only going to focus on inference and not on the training logic. We will cover concepts like self-attention, decoder blocks and generating new tokens. Paper: openai.com/blog/better-language-models/ Code minGPT: github.com/karpathy/minGPT Code transformers: github.com/huggingface/transformers/blob/0f69b924fbda6a442d7...
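The heart of the decoder is causal self-attention; a single-head sketch without the learned projections and multi-head plumbing the video covers:

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x: torch.Tensor) -> torch.Tensor:
    """Single-head causal self-attention over x of shape (batch, seq, dim)."""
    b, t, d = x.shape
    q, k, v = x, x, x  # GPT uses learned q/k/v projections; omitted here
    scores = q @ k.transpose(-2, -1) / d**0.5
    mask = torch.tril(torch.ones(t, t, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float("-inf"))  # no peeking ahead
    return F.softmax(scores, dim=-1) @ v

x = torch.randn(2, 5, 16)
print(causal_self_attention(x).shape)  # torch.Size([2, 5, 16])
```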
The Lottery Ticket Hypothesis and pruning in PyTorch
8K views · 2 years ago
In this video, we are going to explain how one can do pruning in PyTorch. We will then use this knowledge to implement a paper called "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks". The paper states that feedforward neural networks have subnetworks (winning tickets) inside of them that perform as well as (or even better than) the original network. It also proposes a ...
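The PyTorch pruning utilities the video builds on, in a minimal form:

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(10, 10)

# Zero out the 30% of weights with the smallest magnitude. Under the
# hood this adds a weight_orig parameter and a weight_mask buffer.
prune.l1_unstructured(layer, name="weight", amount=0.3)
print(float(layer.weight.eq(0).float().mean()))  # ~0.3

# Fold the mask into the weight tensor permanently.
prune.remove(layer, "weight")
```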
The Sensory Neuron as a Transformer in PyTorch
3K views · 2 years ago
In this video, we implement a paper called "The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning" in PyTorch. It proposes a permutation invariant module called the Attention Neuron. Its goal is to independently process local information from the features and then combine the local knowledge into a global picture. Paper: arxiv.org/abs/2109.02869 O...
Integer embeddings in PyTorch
2.3K views · 2 years ago
In this video, we implement a paper called "Learning Mathematical Properties of Integers". Most notably, we use an LSTM network and an Encyclopedia of integer sequences to train custom integer embeddings. At the same time, we also extract integer sequences from already pretrained models - BERT and GloVe. We then compare how good these embeddings are at encoding mathematical properties of intege...
PonderNet in PyTorch
2.2K views · 2 years ago
In this video, we implement PonderNet, which was proposed in the paper "PonderNet: Learning to Ponder". It is a network that dynamically decides on the size of its forward pass. We are going to implement it and experiment with it a little bit on the so-called ParityDataset. Note that the implementation is based on the labml.ai implementation (see link below). I made some modifications though s...
Mixup in PyTorch
3.3K views · 2 years ago
In this video, we implement (input) mixup and manifold mixup. They are regularization techniques proposed in the papers "mixup: Beyond Empirical Risk Minimization" and "Manifold Mixup: Better Representations by Interpolating Hidden States". We investigate how these two schemes compare against more mainstream regularization methods like dropout and weight decay. Paper (Input mixup): arxiv.or...
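Input mixup fits in a few lines; a sketch assuming one-hot labels (the video's implementation details may differ):

```python
import torch

def mixup(x: torch.Tensor, y: torch.Tensor, alpha: float = 0.2):
    """Input mixup: convex combination of examples and of their labels.

    x: (batch, ...) inputs, y: (batch, n_classes) one-hot labels.
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y + (1 - lam) * y[perm]
    return x_mixed, y_mixed
```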
DINO in PyTorch
13K views · 3 years ago
MLP-Mixer in Flax and PyTorch
4.3K views · 3 years ago
Differentiable augmentation for GANs (using Kornia)
2.6K views · 3 years ago
Growing neural cellular automata in PyTorch
4.6K views · 3 years ago
SIREN in PyTorch
5K views · 3 years ago
Vision Transformer in PyTorch
80K views · 3 years ago
torch.nn.Embedding explained (+ Character-level language model)
34K views · 3 years ago
Gradient with respect to input in PyTorch (FGSM attack + Integrated Gradients)
9K views · 3 years ago
NumPy equality testing: multiple ways to compare arrays
1.8K views · 3 years ago
Custom optimizer in PyTorch
6K views · 3 years ago
Mocking neural networks: unit testing in deep learning
2.3K views · 3 years ago
Visualizing activations with forward hooks (PyTorch)
14K views · 3 years ago

Comments

  • @SunilSamson-w2l
    @SunilSamson-w2l 8 days ago

    The reason you got ". , ?" as the output for [MASK] is that you didn't end your input with a full stop. BERT masked language models should be prompted that way: "my name is [MASK]." should have been your request.
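(The claim is easy to check with the transformers fill-mask pipeline; a quick sketch, with the model name as an assumption:)

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Compare the top predictions with and without the trailing full stop.
for text in ("my name is [MASK]", "my name is [MASK]."):
    print(text, "->", [p["token_str"] for p in fill_mask(text)])
```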

  • @JorgeGarcia-eg5ps
    @JorgeGarcia-eg5ps 22 days ago

    Thank you for sharing this, I was actually looking for results of DINO on smaller compute/data so this is so helpful

  • @krishsharma4507
    @krishsharma4507 1 month ago

    It's printing "Original prediction: 293". How can I check the values or names of this predicted class?

  • @Saevires
    @Saevires 1 month ago

    I am using custom tags, such as InvoiceNumber and GrossTotal. To work on entity level, does seqeval need tags in the format B- and I-?
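(For what it's worth, seqeval does expect scheme prefixes such as B-/I- for entity-level scores; a minimal check using the commenter's tag names:)

```python
from seqeval.metrics import classification_report

# Custom entity types work as long as the tags carry IOB2-style
# B-/I- prefixes; bare "InvoiceNumber" tags would not be grouped.
y_true = [["B-InvoiceNumber", "I-InvoiceNumber", "O", "B-GrossTotal"]]
y_pred = [["B-InvoiceNumber", "I-InvoiceNumber", "O", "O"]]
print(classification_report(y_true, y_pred))
```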

  • @Huawei_Jiang
    @Huawei_Jiang 1 month ago

    Hello authors, thank you for your video. It helped me a lot. However, I have one question about your code. In the original mixup, which is from the link you provided, the author mixed the loss function instead of mixing the label. But I noticed you mixed the label. Could you please explain the reason for this difference in operation? Looking forward to your reply
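(For cross-entropy the two formulations coincide, because the loss is linear in the target distribution; a quick numerical check:)

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
y_a = torch.randint(0, 10, (4,))
y_b = torch.randint(0, 10, (4,))
lam = 0.3

# Mixing the losses...
mixed_loss = lam * F.cross_entropy(logits, y_a) + (1 - lam) * F.cross_entropy(logits, y_b)

# ...equals cross-entropy against mixed (soft) labels.
soft = lam * F.one_hot(y_a, 10).float() + (1 - lam) * F.one_hot(y_b, 10).float()
mixed_label_loss = -(soft * F.log_softmax(logits, dim=-1)).sum(-1).mean()

print(torch.allclose(mixed_loss, mixed_label_loss))  # True
```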

  • @shivendrasingh9759
    @shivendrasingh9759 2 months ago

    Really helpful for a foundation in MLOps.

  • @larrymckuydee5058
    @larrymckuydee5058 2 months ago

    Is this method good if we want to search for a list of products rather than a chat-like response?

    • @mildlyoverfitted
      @mildlyoverfitted 2 months ago

      Sure:) If you have text descriptions of the products then Elasticsearch/Opensearch + reranking is definitely a great option:)

  • @user-lu8fy4ku6e
    @user-lu8fy4ku6e 2 months ago

    You are incredible man. - You go at a good pace. - Each project feels well planned. - Nice formatting style. - Good explanation. I've just started really digging into this machine learning space, any recommendations on learning all the different layer types and problem types?

    • @mildlyoverfitted
      @mildlyoverfitted 2 months ago

      Thanks a ton! ML has changed quite a lot over the past few years. I guess one architecture you should be familiar with nowadays is the transformer:) But I guess you have heard about it by now:D Good luck with your learning!

  • @mmacasual-
    @mmacasual- 2 months ago

    Great example. Thanks for the information

  • @lucianobatista6295
    @lucianobatista6295 2 months ago

    hi man, do you offer some training or mentorship?

  • @paolobarba1782
    @paolobarba1782 2 months ago

    What should you do if you want the encoding to be made by OpenSearch directly?

  • @akk2766
    @akk2766 2 months ago

    I concur with what everyone is saying - best video on function calling for sure. I really like the laid-back nature of the tutorial - seriously simplifying function calling - even to the uninitiated! Only one suggestion: please move the inset video to the top right so the output can be seen in its entirety. Obviously not for this video, but for future awesome videos you produce.

    • @mildlyoverfitted
      @mildlyoverfitted 2 months ago

      Glad it was helpful! And thank you for the constructive feedback:)

  • @Munk-tt6tz
    @Munk-tt6tz 3 months ago

    This is the best video on this topic. Thank you!

  • @swk9015
    @swk9015 3 months ago

    what's the font you use?

    • @mildlyoverfitted
      @mildlyoverfitted 3 months ago

      Not sure. I am using this vim theme: github.com/morhetz/gruvbox so maybe you can find it somewhere in their repo.

  • @mmazher5826
    @mmazher5826 3 months ago

    Is there any way of re-running SSL on a pretrained DINO?

  • @danielasefa8087
    @danielasefa8087 3 months ago

    Thank you so much for helping me to understand ViT!! Great work

  • @PrafulKava
    @PrafulKava 3 months ago

    Great video! Good explanation. Thanks for all your efforts in making a detailed video along with the code!

  • @leeuw6481
    @leeuw6481 3 months ago

    wow, this is dangerous xd

  • @prajyotmane9067
    @prajyotmane9067 3 months ago

    Where did you include positional encoding? Or is it not needed when using convolutions for patching and embedding?

  • @neiro314
    @neiro314 3 months ago

    Great video as a student, thank you so much! I will say a few lines didn't feel very well explained; however, I'm sure to someone with a bit more knowledge than me it would be clearer. Overall 10/10, tysm.

  • @user-td8vz8cn1h
    @user-td8vz8cn1h 4 months ago

    I'm a huge fan of implementing algorithms from scratch by myself and watched this video with great pleasure. Thanks for your work, it deserves more attention.

  • @danieltello8016
    @danieltello8016 4 months ago

    Great video. Can I run the code on a Mac with an M1 chip as is?

  • @iamragulsurya
    @iamragulsurya 4 months ago

    Name of the font?

    • @mildlyoverfitted
      @mildlyoverfitted 4 months ago

      So the theme I am using is here: github.com/morhetz/gruvbox . The README talks about the fonts I believe.

  • @navins2246
    @navins2246 4 months ago

    Doing ML in vim is absolutely gigachad

  • @harrisnisar5345
    @harrisnisar5345 4 months ago

    Amazing video. Just curious, what keyboard are you using?

  • @jeffg4686
    @jeffg4686 5 months ago

    "mildly overfitted" is how I like to keep my underwear so I don't get the hyena.

  • @davidpratr
    @davidpratr 5 months ago

    Really nice video. Would you see any benefit of using the deployment in a single node with an M1 chip? I'd say somehow yes, because an inference might not be taking all the CPU of the M1 chip, but how about scaling the model in terms of RAM? One of those models might take 4-7GB of RAM, which makes up to 21GB of RAM only for 3 pods. What's your opinion on that?

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      Glad you liked the video! Honestly, I filmed the video on my M1 using minikube mostly because of convenience. But on real projects I have always worked with K8s clusters that had multiple nodes. So I cannot really advocate for the single node setup other than for learning purposes.

    • @davidpratr
      @davidpratr 5 months ago

      @mildlyoverfitted got it. So, very likely more requests could be resolved at the same time, but with very limited scalability and probably some performance loss. By the way, what are those fancy combos with the terminal? Is it tmux?

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      @davidpratr interesting:) yes, it is tmux:)

  • @woutderijck5389
    @woutderijck5389 5 months ago

    When starting out, would you recommend just using embeddings and vector search, or should you also consider the hybrid case of OpenSearch & vector search? In the video it looks like you should go all in on vector search.

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      I would recommend just doing Opensearch + reranking. No embeddings (=vector search). Assuming you wanna have something minimal really quickly as demonstrated in the video:)

  • @Ldmp807
    @Ldmp807 5 months ago

    Isn't this a concurrency limit, not a rate limit (i.e. a limit per second)?

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      I think you are right:) The video title is definitely misleading. Sorry about that!

  • @vidinvijay
    @vidinvijay 5 months ago

    novelty explained in just over 6 minutes. 🙇

  • @kascesar
    @kascesar 5 months ago

    Hi, I'm getting this error: "'sagemaker_service:svc' is not found in BentoML store <osfs '/home/bentoml/bentos'>, you may need to run `bentoml models pull` first." Any idea? Thanks a lot

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      Hmmm, if the problem still persists you can create an issue here: github.com/jankrepl/mildlyoverfitted/issues describing exactly what you did and I can try to help!

    • @kascesar
      @kascesar 5 months ago

      @mildlyoverfitted Solved, I did it. The problem came from the bentoml version; installing bentoml==1.1.11 solved the problem for me.

  • @yuricastro522
    @yuricastro522 5 months ago

    Thank you so much, your example helped me to solve some problems :)

  • @macx7760
    @macx7760 6 months ago

    Why is the shape of the MLP input at the 2nd dim n_patches + 1? Isn't the MLP just applied to the class token?

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      So the `MLP` module is used inside of the Transformer block and it takes a 3D tensor as input. See this link for the only place where the CLS token is explicitly extracted: github.com/jankrepl/mildlyoverfitted/blob/22f0ecc67cef14267ee91ff2e4df6bf9f6d65bc2/github_adventures/vision_transformer/custom.py#L423-L424 Hope that helps:)

    • @macx7760
      @macx7760 5 months ago

      @mildlyoverfitted thanks, yeah, I confused the MLP inside the block with the MLP at the end for classification.

  • @macx7760
    @macx7760 6 months ago

    Fantastic video, just a quick note: at 16:01 you say that "none of the operations are changing the shape of the tensor", but isn't this wrong? When applying fc2, the last dim should be out_features, not hidden_features, so the shapes are also wrongly commented.

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      Nice find and sorry for the mistake:)! Somebody already pointed it out a while ago:) Look at the pinned errata comment:)

    • @macx7760
      @macx7760 5 months ago

      @mildlyoverfitted ah I see, my bad :D

  • @TwenTV
    @TwenTV 6 months ago

    Which frameworks would you recommend if you had to scale to 1000+ models? I am looking at custom FastAPI and MLflow with AWS Lambda, where each inference request will load the model from object storage and call .predict. The models are generally lightweight and predictions only have to be made on an hourly basis, so I don't think it's necessary to serve them in memory.

    • @mildlyoverfitted
      @mildlyoverfitted 5 months ago

      If you are not experiencing a cold start (or you don't care) then Lambda is definitely a great solution:)

  • @noedie4973
    @noedie4973 6 months ago

    Thanks for the nice video explanation! Could you please tell me what modifications I can make to get the output in a certain format? Say I want it to output only the label value with no other text?

    • @mildlyoverfitted
      @mildlyoverfitted 6 months ago

      Thank you! The current template should lead to you only getting the label. However, feel free to prompt engineer it if you are not getting the expected result. You can also request it to give you a valid JSON which you can then easily parse:) Just an idea. Hope that helps:)

    • @noedie4973
      @noedie4973 6 months ago

      @mildlyoverfitted thanks, it really helped me a lot. I achieved perfect results by restricting my response token limit, so it focuses on outputting the digit label (in flexible forms), from which I can extract it using a simple regex. The JSON method seems very clean too.

  • @idoronen9497
    @idoronen9497 6 months ago

    Thank you for the video! I have a question: if I need to make updates to an existing service, do I have to go through the entire process again, or is there a more efficient way? bentoctl build seems quite time-consuming. Appreciate your help!

    • @mildlyoverfitted
      @mildlyoverfitted 6 months ago

      Appreciate your comment! If the change is inside of your ML model or the serving logic (service.py) you will have to rebuild the image. However, the second time around some layers should be cached (docs.docker.com/build/guide/layers/ ) so in theory it should be faster (it depends though). Another thing you can do is to build the image in some virtual machine rather than locally. A common setup is that you build it + upload to ECR in your CI (e.g. GitHub Actions). Just some ideas:)

  • @Lithdren
    @Lithdren 6 months ago

    Is there a method you can use to rate limit by time? I'm interacting with an API that limits me to no more than 20 requests a minute, and I've been struggling with a way to handle that. Right now I keep track of the time of the last call, and if I made a request within the last 3 seconds I wait until 3 seconds have passed, then send out the next request. I have multiple API keys I can utilize, and each key has a set limit, so I cycle through them, but it feels like there must be a faster way.

    • @mildlyoverfitted
      @mildlyoverfitted 6 months ago

      One alternative solution is to use some open source package (e.g. github.com/florimondmanca/aiometer ). I don't really know much about it but maybe it can help:)
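(One way to rate limit by time rather than by concurrency is a sliding-window limiter; a rough sketch, not code from the video:)

```python
import asyncio
import time

class SlidingWindowLimiter:
    """Allow at most `rate` acquisitions per `period` seconds (coarse sketch)."""

    def __init__(self, rate: int, period: float = 60.0) -> None:
        self.rate, self.period = rate, period
        self.timestamps: list[float] = []
        self.lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self.lock:
            now = time.monotonic()
            # Keep only the timestamps still inside the window.
            self.timestamps = [t for t in self.timestamps if now - t < self.period]
            if len(self.timestamps) >= self.rate:
                # Sleep until the oldest call falls out of the window.
                await asyncio.sleep(self.period - (now - self.timestamps[0]))
            self.timestamps.append(time.monotonic())
```

Calling `await limiter.acquire()` before each request, with one `SlidingWindowLimiter(20, 60.0)` instance per API key, keeps each key at or under 20 requests per minute.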

  • @gunabalang9543
    @gunabalang9543 6 months ago

    what keyboard are you using?

  • @aditya_01
    @aditya_01 6 months ago

    Great video, thanks a lot, really liked the explanation!!!

  • @nandakishorejoshi3487
    @nandakishorejoshi3487 7 months ago

    Great video. How do I run a text generation model? I tried running a GPT2 model with the code below.
    Creating the API: transformers-cli serve --task=text-generation --model=gpt2
    Calling the API: curl -X POST localhost:8888/forward -H "accept: application/json" -H "Content-Type: application/json" -d '{"inputs":"What is Deep Learning","parameters":{"max_new_tokens":20}}'
    But I am getting an error in the response: {"detail":[{"type":"json_invalid","loc":["body",0],"msg":"JSON decode error","input":{},"ctx":{"error":"Expecting value"}}]}

  • @theAhmd
    @theAhmd 7 months ago

    Terminal and theme name, please?

  • @kyrylogorbachov3779
    @kyrylogorbachov3779 7 months ago

    Thanks a lot for the content!

  • @thinkman2137
    @thinkman2137 8 months ago

    Thank you for the detailed tutorial!

    • @thinkman2137
      @thinkman2137 8 months ago

      But TorchServe now has Kubernetes integration.

    • @mildlyoverfitted
      @mildlyoverfitted 7 months ago

      I will definitely look into it:) Thank you for pointing it out!!

  • @mkamp
    @mkamp 8 months ago

    Using VIM, Tmux and an audible keyboard never gets old!

  • @diegosabajo2182
    @diegosabajo2182 8 months ago

    Thanks for the video, man. There aren't many resources on BentoML so I appreciate your contribution. Can you please add more in the future?

    • @mildlyoverfitted
      @mildlyoverfitted 8 months ago

      Appreciate your message:) Thank you! I will very likely do more BentoML related stuff in the future:)

  • @faizasetif1103
    @faizasetif1103 8 months ago

    Is this code for classifying images?

    • @mildlyoverfitted
      @mildlyoverfitted 8 months ago

      Not sure what you mean, but DINO is a self-supervised algorithm:) Not a supervised one (e.g. classification)

    • @faizasetif1103
      @faizasetif1103 8 months ago

      @mildlyoverfitted I want to use DINO for a classification task. How?

  • @AM-yk5yd
    @AM-yk5yd 8 months ago

    Hi, if you accept suggestions, can you look into implementing something from H3, S4, S5, etc.? Structured State Spaces occupy at least half of the top 10 architectures on LRA and there are about zero intuitive explanations of them.

    • @mildlyoverfitted
      @mildlyoverfitted 8 months ago

      Hey there! Actually, I never heard of those! I am adding it to my reading list:) Cannot promise I will make a video about them though:) Thank you!

  • @user-cp1pe2tx7h
    @user-cp1pe2tx7h 8 months ago

    Great!

  • @rokieplayer7729
    @rokieplayer7729 8 months ago

    let me know the paper name, please~