How To Auto-Scale Kubernetes Clusters With Karpenter

Comments • 98

  • @DevOpsToolkit
    @DevOpsToolkit  2 years ago +3

    Would you switch from Kubernetes Cluster-Autoscaler to Karpenter (if you can)?
    IMPORTANT: For reasons I do not comprehend (and Google support could not figure out), CZcams tends to delete comments that contain links. Please do not use them in your comments.

    • @marymekins3546
      @marymekins3546 2 years ago +2

      Seems like severe vendor lock-in to use Karpenter. Can it be used with other cloud providers, for example, Hetzner Cloud? How does it compare to keda.sh and Knative autoscaling? Thank you for sharing.

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      @@marymekins3546 It's not really about vendor lock-in. It is open source, and the major question is whether other providers will extend it or not. So, today it's only for EKS and tomorrow... we do not yet know.
      KEDA, Knative, and similar tools are about horizontal scaling of applications. Karpenter is about scaling clusters/nodes. Those are very different goals, even though app scaling often results in cluster scaling.

    • @marymekins3546
      @marymekins3546 2 years ago +1

      @@DevOpsToolkit Thanks for the clarification. Also, do you consider Crossplane's and Gardener's autoscaling components more relevant for node/cluster autoscaling? Thank you.

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      @@marymekins3546 Neither of those (Crossplane and Gardener) has its own cluster autoscaler, so they rely on those that are baked into managed Kubernetes offerings (e.g., GKE, AKS, etc.) or can apply cluster scalers (e.g., Kubernetes Cluster Autoscaler, Karpenter, etc.).
      What I'm trying to say is that Crossplane and, to an extent, Gardener are orchestrating infra services rather than providing specific implementations of those services, including cluster scalers.

  • @stormrage8872
    @stormrage8872 2 years ago +4

    I saw Karpenter, I think, 30 minutes after it went GA. The reason other providers will probably not contribute is the bin-packing technique it uses, which is tied to AWS (and a horrible limitation).
    Then I wanted to use it: I can't use my launch templates because of custom pods-per-node settings, and I can't use Crossplane because of node groups (or this might also come down to me not knowing how), so for now it's a no-go, but a project that's a step in the right direction nonetheless. Thanks for the video.

  • @j0Nt4Mbi
    @j0Nt4Mbi 2 years ago +3

    Awesome explanation, Viktor. Thanks again for such valuable information.

  • @umeshranasinghe
    @umeshranasinghe 2 months ago +1

    Great video. Thank you very much!

  • @envueltoenplastico
    @envueltoenplastico 2 years ago +2

    More good news! This looks great. Cluster Autoscaler was wrecking my head. Trying Karpenter out now. Thanks for the video :)
    Also, I'm using the latest build (0.80-dev) of eksctl, which allows you to define a `karpenter` configuration value in `ClusterConfig`, so hopefully that takes most of the legwork out of the process - I believe all that's necessary after that is to create `Provisioner` resources as required (a sketch follows this thread).

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      That's not a surprise. Weaveworks is heavily involved in all AWS k8s OSS projects, so it was to be expected that they'd extend eksctl (they made the most contributions to it).
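
      For reference, here is a minimal sketch of what such a `ClusterConfig` with a `karpenter` section might look like (cluster name, region, and versions are illustrative, and the exact fields may differ between eksctl builds):

      ```yaml
      apiVersion: eksctl.io/v1alpha5
      kind: ClusterConfig
      metadata:
        name: demo                         # illustrative cluster name
        region: us-east-1
        version: "1.21"
        tags:
          karpenter.sh/discovery: demo     # tag Karpenter uses to discover subnets/SGs
      iam:
        withOIDC: true                     # Karpenter needs an OIDC provider for IAM roles
      karpenter:
        version: "0.5.3"                   # illustrative Karpenter version to install
      managedNodeGroups:
        - name: initial                    # Karpenter itself needs at least one node to run on
          instanceType: m5.large
          desiredCapacity: 1
      ```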

  • @user-wu4pt3kw4t
    @user-wu4pt3kw4t 4 months ago +1

    You're fricking funny 👍. Thank you so much for such a great demo. To the point.

  • @agun21st
    @agun21st a year ago +1

    Very detailed explanation of Karpenter. Thank you so much, sir.

  • @felipeozoski
    @felipeozoski a year ago +1

    Thanks for another awesome video Viktor :)

  • @kavilofi
    @kavilofi 5 months ago +1

    Awesome explanation....🤩

    • @kavilofi
      @kavilofi 5 months ago +1

      Sir, please do an explanation of k8s KEDA.

    • @DevOpsToolkit
      @DevOpsToolkit  5 months ago

      @devopsguy- here it goes... KEDA: Kubernetes Event-Driven Autoscaling
      czcams.com/video/3lcaawKAv6s/video.html

  • @DaniVendettaXII
    @DaniVendettaXII 2 years ago +3

    I'm trying Karpenter, and I found a con; I'm still investigating, but... when the workload decreases, the nodes are not changed. For example, you scale to 10 replicas, and Karpenter decides to provision one c5n.2xlarge instance. Some time later you scale down your pods from 10 to 6, and your instance could change to a t3.medium (for instance), but I've observed that Karpenter is not adjusting the instance to the current workload. I have to do more tests and experiments with Karpenter, but so far that is what I've seen. Thanks for the video and the channel, Viktor/DevOps Toolkit. Kind regards.

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      Scaling down is a problem with all cluster autoscalers including Karpenter :(

    • @DaniVendettaXII
      @DaniVendettaXII 2 years ago +2

      @@DevOpsToolkit Hi Viktor, but with Cluster Autoscaler, at least in my configuration, if replicas go down and the remaining pods can fit on other workers, the autoscaler evicts those pods, taints the node selected for deletion, and the pods are re-scheduled on an existing worker. After that, the worker is deleted; it's slower and less accurate. With Cluster Autoscaler it's easy to end up with more resources than you need, but I can see that with Karpenter we have the same problem with scaling down. Maybe it will be resolved in the future, but I see some scenarios where Karpenter can be more useful than Cluster Autoscaler and vice versa.
      Another point to take into consideration is how to configure the provisioners. Since Karpenter tries to fit all new pods onto a worker, there is a probability of putting all the pods in the same AZ. Today I'll probably try combining provisioners with node affinity and pod ANTI-affinity to see if I can spread pods across all my AZs (see the sketch right after this comment).
      Again, thanks for the nice work, video, and channel. I really appreciate your answer.
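
      A sketch of what spreading across zones could look like on the workload side, using a topology spread constraint that both the scheduler and Karpenter honor (names and labels are illustrative):

      ```yaml
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: my-app                       # illustrative
      spec:
        replicas: 6
        selector:
          matchLabels:
            app: my-app
        template:
          metadata:
            labels:
              app: my-app
          spec:
            topologySpreadConstraints:
              - maxSkew: 1                                # zones may differ by at most 1 pod
                topologyKey: topology.kubernetes.io/zone  # spread across AZs
                whenUnsatisfiable: DoNotSchedule
                labelSelector:
                  matchLabels:
                    app: my-app
            containers:
              - name: app
                image: ghcr.io/example/my-app   # illustrative image
                resources:
                  requests:
                    cpu: 500m
                    memory: 512Mi
      ```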

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      You're right. Karpenter solves some of the problems well while others are far from being solved. It's a new project so we're yet to see whether it will mature. My main concern right now, before other issues are solved, is whether other providers will pick it up and even whether AWS will include it into EKS. If neither of those happen, it's a sign that vendors do not trust it.

  • @bled_2033
    @bled_2033 2 years ago +1

    Very well explained!

  • @ChrisShort
    @ChrisShort 2 years ago +2

    Thanks!

  • @javisartdesign
    @javisartdesign 2 years ago +1

    Awesome! I wanted to see a working example of its use. Thanks.

  • @igorluizdesousasantos4965
    @igorluizdesousasantos4965 10 months ago +1

    Amazing content 🎉🎉

  • @jdiegosf
    @jdiegosf 2 years ago +1

    Excellent!!!

  • @srivathsaharishvenk
    @srivathsaharishvenk a year ago +2

    legend!

  • @gdevelek
    @gdevelek a year ago +1

    You didn't explain that "limits:" thing at all. Why set a limit on total request CPU? And what if it's exceeded? No autoscaling???

    • @DevOpsToolkit
      @DevOpsToolkit  a year ago

      You're right. CPU limits are arguably useless except for QoS.
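
      For context, the `limits:` in question is a Karpenter `Provisioner` setting that caps the total resources of the nodes that Provisioner may create; once the cap is reached, Karpenter stops provisioning and any additional pods stay pending. A minimal sketch against the v1alpha5 API of the time (selector tags are illustrative):

      ```yaml
      apiVersion: karpenter.sh/v1alpha5
      kind: Provisioner
      metadata:
        name: default
      spec:
        limits:
          resources:
            cpu: "100"      # stop provisioning once this Provisioner's nodes total ~100 CPUs
            memory: 400Gi   # ...or ~400Gi of memory
        provider:
          subnetSelector:
            karpenter.sh/discovery: demo      # illustrative discovery tag
          securityGroupSelector:
            karpenter.sh/discovery: demo
      ```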

  • @miletacekovic
    @miletacekovic 2 years ago +2

    One question: Is Karpenter capable of vertical auto-scaling down?
    Typical example: Consider a new project started as a monolith, where one big pod is required for the initial deployment. Karpenter allocates one big node to fit it. Now, as the project continues and grows, it is decomposed into microservices, and 10 small pods are used for the full system. Is Karpenter capable of replacing the big node with, say, two much smaller nodes, as that might be cheaper than the one big node?

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago

      Yes. It's doing that fairly well. Its main strength is that it creates nodes that are just the right size for the pending workload.

    • @miletacekovic
      @miletacekovic 2 years ago +1

      @@DevOpsToolkit Wow, thanks for the answer!

  • @reddinghiphop1
    @reddinghiphop1 a year ago +1

    fantastic

  • @maxmustermann9858
    @maxmustermann9858 8 months ago +1

    As I understand it, it only scales new nodes. Is there also a way, when a pod gets utilized very heavily, for a new node to be created and the pod moved to that node? For example, when apps in pods can't scale horizontally by just adding more pods.

    • @DevOpsToolkit
      @DevOpsToolkit  8 months ago

      Why would you move a pod to a new node? If you specified memory, CPU, and other constraints, it should be irrelevant where that pod runs as long as those constraints are met.

    • @maxmustermann9858
      @maxmustermann9858 8 months ago +1

      @@DevOpsToolkit Ah, I get it, so the resources are statically assigned and cannot dynamically grow with the pod's load.
      My assumption was that when there are no resource limits defined, and let's say the pod normally runs with 2G of RAM but the load gets quite high and the pod now needs 4G of RAM, yet the current node it's running on can't provide more, then, so that the pod doesn't get "throttled" or the application get slow, maybe there would be a way for the pod to be restarted on another host which has enough resources.

    • @DevOpsToolkit
      @DevOpsToolkit  8 months ago +1

      @maxmustermann9858 When resource requests are not specified, pods can use any amount of memory and CPU available on the nodes they are running on. However, when all the pods on a node together consume more memory and CPU than the node has, pods without requests are evicted first to leave resources for pods that do have them specified. So, pods without resource requests are considered less important, and Kubernetes will sacrifice them before the others. Check out the Quality of Service concept in Kubernetes.
      Also, Kubernetes will soon release in-place pod resource resizing so that resource requests can change without restarting pods. That will be especially useful with vertical pod scalers.
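
      To make the Quality of Service point concrete, here's a sketch of a pod with resource requests set (names and values are illustrative); with limits equal to requests it gets the Guaranteed QoS class and is evicted last, while pods with no requests at all (BestEffort) go first:

      ```yaml
      apiVersion: v1
      kind: Pod
      metadata:
        name: important-app                  # illustrative
      spec:
        containers:
          - name: app
            image: ghcr.io/example/app:1.0   # illustrative image
            resources:
              requests:                      # what the scheduler (and Karpenter) size nodes by
                memory: 2Gi
                cpu: 500m
              limits:                        # limits == requests -> Guaranteed QoS
                memory: 2Gi
                cpu: 500m
      ```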

  • @RakeshKumar-eb9re
    @RakeshKumar-eb9re 2 years ago +1

    To the point 👌

  • @bartekr5372
    @bartekr5372 2 years ago +1

    Nice. Let us consider a cluster running HPA and Cluster Autoscaler outside of peak hours. If you have a good distribution of pods and HPA starts to decrease the number of replicas, you may end up having some nodes underutilized; released capacity will appear on some of the worker nodes. In such conditions I always find Cluster Autoscaler slow. Can we expect Karpenter to be more active or even to do some optimization? By optimization I mean compaction of unused capacity (something that deschedulers try to achieve) or optimizing worker node sizes?

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      So far, I think that Karpenter is only marginally better at scaling down nodes that are underutilized. The part that works fairly well is when it scales up for a single pending pod and when that pod is removed, it removes the node almost instantly. That part looks very similar to what GKE Autopilot is doing. The project is still young so we'll see. It's better than Cluster Autoscaler in EKS but we're yet to see whether it will go beyond that (as it should).
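
      The near-instant removal of empty nodes mentioned above is driven by the Provisioner's `ttlSecondsAfterEmpty` setting; a minimal sketch (selector tags are illustrative):

      ```yaml
      apiVersion: karpenter.sh/v1alpha5
      kind: Provisioner
      metadata:
        name: default
      spec:
        ttlSecondsAfterEmpty: 30    # remove a node ~30s after its last non-daemon pod is gone
        provider:
          subnetSelector:
            karpenter.sh/discovery: demo    # illustrative discovery tag
          securityGroupSelector:
            karpenter.sh/discovery: demo
      ```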

  • @luisrodriguezgarcia1282
    @luisrodriguezgarcia1282 2 years ago +1

    Have I understood correctly? This is just for EKS, and not even just EKS... just for EKS clusters created with eksctl? What about EKS clusters created with Terraform? Can they not be managed with Karpenter?
    Great video as usual, Viktor, by the way.

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      You're partly right. Currently, Karpenter works only with EKS. The initial examples used eksctl, and Terraform examples were added recently. That, however, does not mean it does not work with other tools. You should be able to use it with EKS clusters no matter which tool you use to manage them.
      A bigger problem is with other providers (e.g., GCP, Azure, etc.). The Karpenter project is hoping to attract contributions from others (currently it's mostly AWS folks), but that is yet to be seen.

    • @leonardo_oliveira241
      @leonardo_oliveira241 2 years ago +1

      @@DevOpsToolkit What about Fargate? The documentation mentions that it works with Fargate.

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago

      @@leonardo_oliveira241 Fargate is EKS with a layer on top, so it does work with it.

  • @bmutziu
    @bmutziu 2 years ago +2

    Thank you!

  • @amitmantha7662
    @amitmantha7662 9 months ago +1

    So, I installed Karpenter on an EKS cluster, and I just want to stop Karpenter from spinning up nodes on weekends automatically. How can I do that?

    • @DevOpsToolkit
      @DevOpsToolkit  9 months ago

      I never had such a requirement, so I never tried something like that.
      Why not on weekends? Does that mean you'd prefer having pods in the pending state instead?

  • @sophiak4286
    @sophiak4286 7 months ago +1

    Can we use Karpenter for patching nodes in an existing node group, that is, nodes not managed by Karpenter?

  • @barefeg
    @barefeg 2 years ago +1

    How would one run both Cluster Autoscaler and Karpenter in the same cluster? Is it just a matter of using a special nodeSelector so that Karpenter schedules those pods? I would like to try it out, but without committing to it the whole way.

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago

      I haven't tried using both so I'm not sure how it would work and what would need to be done to make that happen. I would rather experiment with it in a new temporary cluster.
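
      For anyone who wants to experiment anyway, one untested idea (purely a sketch, not something verified in this thread): give the Provisioner a custom node label and have only the experimental workloads select it. Cluster Autoscaler's node groups would not carry that label, so it should ignore those pods and leave them to Karpenter:

      ```yaml
      # A Provisioner that stamps its nodes with a custom label (all names illustrative)
      apiVersion: karpenter.sh/v1alpha5
      kind: Provisioner
      metadata:
        name: experiment
      spec:
        labels:
          scaler: karpenter    # applied to every node this Provisioner creates
        # (provider/subnet configuration omitted for brevity)
      ---
      # A pod that opts in to Karpenter-managed capacity
      apiVersion: v1
      kind: Pod
      metadata:
        name: test-pod
      spec:
        nodeSelector:
          scaler: karpenter    # no Cluster Autoscaler node group has this label
        containers:
          - name: app
            image: ghcr.io/example/app   # illustrative image
      ```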

  • @guangguang1984
    @guangguang1984 a year ago

    Very nice video, thanks!
    Got one question: as there is no auto-scaling group, how can I conveniently scale nodes in manually?

    • @DevOpsToolkit
      @DevOpsToolkit  a year ago +1

      Your cluster is still created with a node group, and you can always add additional node groups. It's just that the nodes managed automatically by Karpenter are not part of any node group.

  • @georgeanastasiou2680
    @georgeanastasiou2680 2 years ago +1

    Hello Viktor, thank you for your video. Does it also consider multi-zone workloads and instantiate nodes in multiple zones per region, something that, as far as I know, is currently accommodated by the upstream Cluster Autoscaler project? Thank you.

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      Yes. It does that :)
      The main advantage of Karpenter is that you have much more control over the relation between pending pods and the nodes that should be created to run them.
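
      A sketch of a Provisioner that is allowed to create nodes across several zones via `requirements` (zone names and tags are illustrative):

      ```yaml
      apiVersion: karpenter.sh/v1alpha5
      kind: Provisioner
      metadata:
        name: default
      spec:
        requirements:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a", "us-east-1b", "us-east-1c"]   # illustrative zones
          - key: karpenter.sh/capacity-type
            operator: In
            values: ["on-demand"]
        provider:
          subnetSelector:
            karpenter.sh/discovery: demo    # illustrative discovery tag
          securityGroupSelector:
            karpenter.sh/discovery: demo
      ```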

    • @georgeanastasiou2680
      @georgeanastasiou2680 2 years ago +1

      @@DevOpsToolkit thank you :)

  • @sarvanvik1835
    @sarvanvik1835 2 years ago +1

    Hi sir, if we use Karpenter and want to upgrade the worker nodes to a new version, both the nodes in a node group and the newly scaled group-less worker nodes, what would happen? Can you clear up my doubt?

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago

      That's a bit "clunky" right now. You'd need to set a TTL on the nodes so that they "expire" and are replaced by new nodes that follow your cluster's version.
      The good news is that improvements for that are coming. You might want to follow github.com/aws/karpenter/issues/1738. You'll see there that some additional options have already been added while others are in progress.
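
      The TTL mentioned here is the Provisioner's `ttlSecondsUntilExpired`; a sketch where nodes are recycled roughly weekly so that replacements pick up the cluster's current version (tags are illustrative):

      ```yaml
      apiVersion: karpenter.sh/v1alpha5
      kind: Provisioner
      metadata:
        name: default
      spec:
        ttlSecondsUntilExpired: 604800    # ~7 days; expired nodes are drained and replaced
        provider:
          subnetSelector:
            karpenter.sh/discovery: demo    # illustrative discovery tag
          securityGroupSelector:
            karpenter.sh/discovery: demo
      ```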

  • @sparshagarwal1877
    @sparshagarwal1877 a year ago +1

    How do you run Karpenter on the control plane?

    • @DevOpsToolkit
      @DevOpsToolkit  a year ago

      Not sure I understood the question. Are you asking how to run Karpenter pods on control plane nodes? If that's the case, you can't, at least when using a managed Kubernetes service like EKS. You do not have write access to control plane nodes.

  • @snygg-johan9958
    @snygg-johan9958 2 years ago +1

    Very nice!
    Does it also work with HPA during high loads?

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +2

      It does. HPA scales your apps and if some of the pods end up in the pending state Karpenter will scale up the cluster :)
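
      A sketch of the HPA side, assuming a Deployment named `my-app` (illustrative); when HPA raises replicas beyond what the existing nodes can hold, the pending pods are what trigger Karpenter:

      ```yaml
      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: my-app                 # illustrative
      spec:
        scaleTargetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: my-app
        minReplicas: 2
        maxReplicas: 20
        metrics:
          - type: Resource
            resource:
              name: cpu
              target:
                type: Utilization
                averageUtilization: 80   # scale up when average CPU passes 80% of requests
      ```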

    • @snygg-johan9958
      @snygg-johan9958 2 years ago

      @@DevOpsToolkit Thanks for the response! Then I'm going to check it out :-)

  • @barefeg
    @barefeg 2 years ago +1

    Do we need to have eksctl configurations for node groups at all?

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago

      You do need a node group for the cluster so that you get the initial node where you'll install Karpenter. You do not have to use eksctl to create that group, but you do have to have it, even if it's for Karpenter alone. That's why I complained in the video that it should run on control plane nodes.

  • @barefeg
    @barefeg 2 years ago +1

    How do you track all of these new solutions that come up?

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      In some cases, I search for specific solutions that complement those I'm already using. In others, I hear about a tool and put it to my TODO list. In any case, I tend to spend a lot of time (including weekends and nights) on learning.

  • @srikrishnachaitanyagadde926

    Are EKS IPv6 clusters supported with Karpenter?

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago

      There are issues with it (e.g., github.com/aws/karpenter/issues/1241).

  • @herbertpurpora9452
    @herbertpurpora9452 2 years ago

    Question: I'm new to Kubernetes and AWS, but based on my understanding, using Karpenter will make our EKS cluster cost change dynamically, right?

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago

      Karpenter and similar cluster autoscaler solutions add servers when you need them and shut them down when you don't. AWS, on the other hand, charges for what you use. The more optimized the usage is, the less you pay.

  • @shuc1935
    @shuc1935 2 years ago +1

    Quick question out of curiosity: since Karpenter's auto-scaling offering is group-less, can we spin up an EKS cluster without a node group definition, i.e., with zero worker nodes, and, based on the deployments' resource requests, have Karpenter provision a group-less node with the appropriate capacity to run the requested application?

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago

      That would be possible if Karpenter were running on control plane nodes (like most other cluster scalers do). As it is now, it needs to run on worker nodes, and that means the cluster needs to have at least one node where Karpenter will be running before it starts scaling up (and down).

    • @shuc1935
      @shuc1935 2 years ago +1

      Never mind, you indeed mentioned that Karpenter can't be deployed on control plane nodes, so in order to implement cluster auto-scaling we must have at least one node in a node group, which is kind of a waste from a node group standpoint, but it's better than the regular CA on EKS. I was curious to see whether Karpenter could be the solution for a truly fully managed, serverless k8s offering on AWS.

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      ​@@shuc1935 Managed Kubernetes services like EKS, GKE, AKS, etc. do not allow users to access control planes. That means that AWS would need to bake Karpenter into EKS itself. I hope they'll do that. Ideally, it should be a single checkbox asking people to enable Autoscaling which, currently, does not exist in EKS in any form without using Fargate.

    • @shuc1935
      @shuc1935 2 years ago +1

      @@DevOpsToolkit Yep, like GKE Autopilot's --enable-autoscaling.

    • @shuc1935
      @shuc1935 2 years ago +1

      Also, EKS with a Fargate profile is only partially managed, since it's based on having to speculate about namespaces ahead of time.

  • @gvoden
    @gvoden 2 years ago

    Can I use Karpenter with my clusters that are leveraging managed node groups, or do I have to get rid of the node groups first? How would the cluster upgrade process change if I use Karpenter? (I assume I can still do rolling updates regardless.) And finally, should I be deploying Karpenter as a DaemonSet?

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +2

      Karpenter does not use managed node groups, which are essentially based on AWS auto-scaling groups (ASGs). It intentionally avoids ASGs because they are slow and because they manage instances of the same instance types and sizes. Karpenter avoids them so that the process is (much) faster and so that it can create VMs with sizes that fit the pending load. In other words, it's a good thing that it does not use ASGs. That being said, there is nothing preventing you from having a cluster based on a managed node group. It's just that the nodes created by Karpenter will not use it (it will NOT use the ASGs associated with managed node groups).
      There should be no difference in the upgrade process. New nodes will be created based on the new version and the old nodes will be shut down (rolling updates).
      There's no need to run Karpenter as a DaemonSet. It's not the type of service that needs to run on each node of the cluster.

  • @TheApeMachine
    @TheApeMachine 2 years ago +1

    Make Karpenter and ArgoCD fight it out :p

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago

      Those are very different tools that serve different objectives, so the fight would not be fair. Karpenter could be compared to Cluster Autoscaler or, even better, EKS with Karpenter could be compared with GKE Autopilot.

    • @TheApeMachine
      @TheApeMachine 2 years ago +1

      @@DevOpsToolkit Not compare, fight. Karpenter tries to change the cluster, ArgoCD fights for consistency with state in git :)

    • @MichaelBushey
      @MichaelBushey 2 years ago

      @@TheApeMachine They won't fight at all. If the cluster does not have the resources, the pods applied via ArgoCD will stay pending. ArgoCD will do its job; the cluster just won't be able to run it all if it's not big enough.

  • @aswinkumar3396
    @aswinkumar3396 2 years ago +1

    Question: when using Karpenter with EKS, the image is not pulled from a private repository like Sonatype Nexus.

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      Which image are you referring to? The image of Karpenter itself, or...?

    • @aswinkumar3396
      @aswinkumar3396 2 years ago +1

      The Docker image of our Python project, which we have stored in Sonatype Nexus.

    • @aswinkumar3396
      @aswinkumar3396 2 years ago +1

      network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
      Error: ErrImagePull

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      @@aswinkumar3396 That's not related to scaling of the cluster. Karpenter will increase (or decrease) the nodes of the cluster allowing Kubernetes to schedule pending pods in the same way those would be scheduled without Karpenter.

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago +1

      @@aswinkumar3396 I think you might be facing the same issue as github.com/aws/karpenter/issues/1391

  • @sebastiansMcuProjekte
    @sebastiansMcuProjekte 2 years ago +1

    Didn't even include a link and my prior comment got removed. Check out spot, maybe it's worth a video?

    • @DevOpsToolkit
      @DevOpsToolkit  2 years ago

      CZcams has a nasty tendency to remove comments without any obvious reason. Can you please send me the idea over Twitter (@vfarcic) or LinkedIn (www.linkedin.com/in/viktorfarcic/)?

  • @kiotetheone
    @kiotetheone 2 years ago +1

    Thanks!