
Feature request: dynamic cluster addon resizer #13048

Closed
a-robinson opened this issue Aug 21, 2015 · 24 comments
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@a-robinson
Contributor

People create clusters at a wide range of sizes. People then resize clusters to a wide range of sizes. Our cluster addons are configured statically, and do not respond to such changes in size. This causes problems in some cases:

  1. The default addons don't all fit on sufficiently small clusters (e.g. create a cluster with a single f1-micro on GCE and prepare to feel the pain).
  2. The default addons waste a large proportion of resources on small clusters -- does my two-node cluster really need 300MiB for heapster? Will it ever need more than one DNS pod?
  3. The default addons don't scale properly to large clusters -- heapster with 300MiB of memory isn't going to cut it on even some medium-sized clusters, let alone clusters with a hundred nodes (example). More than one DNS pod will be useful for availability.

Some really simple control logic should be able to make the situation better by listing the nodes in the cluster and updating the addon RCs using a few basic rules. It could be part of the node controller, or be a very small container that runs on the master or even in the user's cluster (but it better be very small to justify its benefits).
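
For concreteness, here's a minimal sketch of what such a control loop could look like, written against today's client-go (which postdates this issue). The addon name, namespace, poll interval, and replicas-per-node policy are all illustrative assumptions, not anything we've agreed on:

```go
// Minimal sketch of an addon-resizing control loop, assuming client-go and
// in-cluster config. The RC name ("kube-dns"), namespace, and the sizing
// policy below are illustrative examples only.
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// desiredDNSReplicas is a toy policy: one replica per 16 nodes, rounded up,
// with a minimum of one.
func desiredDNSReplicas(nodeCount int) int32 {
	replicas := int32((nodeCount + 15) / 16)
	if replicas < 1 {
		replicas = 1
	}
	return replicas
}

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	for {
		ctx := context.Background()

		nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
		if err != nil {
			log.Printf("listing nodes: %v", err)
			time.Sleep(time.Minute)
			continue
		}

		// "kube-dns" in "kube-system" is an assumed example addon RC.
		rc, err := client.CoreV1().ReplicationControllers("kube-system").Get(ctx, "kube-dns", metav1.GetOptions{})
		if err != nil {
			log.Printf("getting addon RC: %v", err)
			time.Sleep(time.Minute)
			continue
		}

		want := desiredDNSReplicas(len(nodes.Items))
		if rc.Spec.Replicas == nil || *rc.Spec.Replicas != want {
			rc.Spec.Replicas = &want
			if _, err := client.CoreV1().ReplicationControllers("kube-system").Update(ctx, rc, metav1.UpdateOptions{}); err != nil {
				log.Printf("updating addon RC: %v", err)
			} else {
				log.Printf("scaled kube-dns to %d replicas for %d nodes", want, len(nodes.Items))
			}
		}

		time.Sleep(time.Minute)
	}
}
```

The same loop could just as easily adjust resource requests (e.g. heapster's memory) instead of, or in addition to, replica counts.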

@roberthbailey @zmerlynn

@a-robinson a-robinson added priority/backlog Higher priority than priority/awaiting-more-evidence. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Aug 21, 2015
@davidopp
Member

It sounds like there are two problems:
(1) We do not take cluster size into account when setting the vertical and horizontal dimensions of the cluster addons at cluster creation time.
(2) Even if we did (1), people might resize their cluster later, requiring us to resize the addons.

I think (1) is more important than (2) right now, but (2) may become more important if people start using cluster autoscaling. (1) seems like something we should work on soon, while (2) could be deferred unless people are already doing a lot of manual cluster resizing.

@dchen1107 @yujuhong

@a-robinson
Contributor Author

Yes, that's true.

On the other hand, solving (2) also solves (1), and it isn't much more difficult if we're already coming up with logic for determining at which size cutoffs we do different things.

@dchen1107
Member

Yes, I brought this up to @roberthbailey and @zmerlynn before the 1.0 release and pushed #7046 to the 1.0 milestone, but that didn't cover dynamic resizing. We need to address this issue soon to meet our scalability goal.

@davidopp
Member

@dchen1107 #7046 looks more like it's about upgrading the binary version of an addon, not changing the number of replicas or the resource request. But I agree they are somewhat related.

@a-robinson It seems like solving (1) is simpler, i.e. it could be done statically in the setup scripts, whereas (2) requires a continuously-running control loop. (Unless I am misunderstanding.)

@yujuhong
Contributor

Even if we do scale the addons based on the cluster size, there'd be cases where they still don't fit in the cluster. We need a minimum requirement spec for a cluster.

@a-robinson
Contributor Author

@davidopp your understanding matches mine, I think we just have different expectations around how much work is involved in putting the logic into a control loop.

@yujuhong I checked into this for GKE, so in case it helps with a more general spec, this is the current state of our addon / system component resource usage:
| Component | CPU | Memory | Scope |
| --- | --- | --- | --- |
| DNS | 100m + 100m + 100m + 10m = 310m | 50Mi + 50Mi + 50Mi + 20Mi = 170Mi | per cluster |
| UI | 100m | 50Mi | per cluster |
| Heapster | 100m | 300Mi | per cluster |
| Fluentd | 100m | 200Mi | per node |
| Kubelet | ??? | 70Mi (but not actually limited by a cgroup AFAIK) | per node |
| Docker | ??? | 30Mi (but not actually limited by a cgroup AFAIK) | per node |

On GCE, a single g1-small can handle all of this with a reasonable amount of room to spare, but f1-micros don't really work unless you have at least three of them. All larger instance types are fine.
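
As a rough back-of-the-envelope check, here's the arithmetic behind that claim. This is aggregate-capacity math only (it ignores node allocatable reservations and the fact that each pod still has to fit on a single node), and the machine capacities are approximate figures for the legacy GCE shared-core types, so treat it as an optimistic lower bound:

```go
// Back-of-the-envelope check of whether the addon footprint above fits a
// cluster of identical machines. Aggregate-capacity math only, so optimistic.
package main

import "fmt"

type machine struct {
	name     string
	cpuMilli int // approximate: shared-core types show up to the kubelet as 1 CPU (1000m)
	memMiB   int // approximate published RAM for the machine type
}

func main() {
	// Per-cluster addon totals from the table above:
	// DNS 310m/170Mi, UI 100m/50Mi, Heapster 100m/300Mi.
	const clusterCPU, clusterMem = 310 + 100 + 100, 170 + 50 + 300
	// Per-node totals: Fluentd 100m/200Mi, Kubelet ~70Mi, Docker ~30Mi
	// (kubelet/docker CPU were "???" above, so they're omitted here).
	const nodeCPU, nodeMem = 100, 200 + 70 + 30

	machines := []machine{
		{"f1-micro", 1000, 600},
		{"g1-small", 1000, 1700},
	}

	for _, m := range machines {
		for _, nodes := range []int{1, 3} {
			needCPU := clusterCPU + nodeCPU*nodes
			needMem := clusterMem + nodeMem*nodes
			fits := needCPU <= m.cpuMilli*nodes && needMem <= m.memMiB*nodes
			fmt.Printf("%d x %-8s need %4dm / %4dMi, have %4dm / %4dMi, fits=%v\n",
				nodes, m.name, needCPU, needMem, m.cpuMilli*nodes, m.memMiB*nodes, fits)
		}
	}
}
```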

@piosz
Member

piosz commented Aug 27, 2015

@derekwaynecarr
Member

cc @kubernetes/rh-cluster-infra @derekwaynecarr

@a-robinson
Contributor Author

We don't necessarily need dynamic sizing, but I'm bumping this to P1 so that we at least get smarter sizing on startup, since I've now personally had to help multiple customers who were having issues with this.

@a-robinson a-robinson added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Oct 1, 2015
@a-robinson
Contributor Author

Specifically, I'm referring to heapster OOMing.

@a-robinson a-robinson added this to the v1.1-candidate milestone Oct 2, 2015
@bgrant0607-nocc bgrant0607-nocc modified the milestones: v1.1-candidate, v1.1 Oct 5, 2015
@bgrant0607
Member

If this is for 1.1, it needs to be P0 at this point. Should it be?

@a-robinson
Contributor Author

It isn't a true blocker for 1.1, so I'll remove it from the milestone. It would be a very nice-to-have to help out the many customers who have run into problems like kubernetes-retired/heapster#632, though.

I'm OOO for most of the next couple of weeks, but if anyone else has cycles to do something here, it'd be a nice addition to 1.1. Taking care of #15716 in the context of large clusters is probably enough.

@a-robinson a-robinson removed this from the v1.1 milestone Oct 15, 2015
@davidopp davidopp self-assigned this Oct 15, 2015
@davidopp
Member

I'll assign it to myself to make sure we don't lose track of it. I'll also add it to the v1.2-candidate milestone.

@davidopp davidopp added this to the v1.2-candidate milestone Oct 15, 2015
@piosz
Member

piosz commented Oct 16, 2015

cc @marekbiskup

@davidopp
Member

@alex-mohr

@davidopp davidopp modified the milestones: v1.2-candidate, v1.2 Oct 21, 2015
@davidopp
Member

@brendandburns is going to implement (1) from #13048 (comment) for 1.1. But since this issue is titled "dynamic cluster addon resizer", let's keep it focused on that and leave it for 1.2. The static cluster addon sizer can be covered by #15716.

@davidopp
Member

@mikedanese mentions this will be easier once the addons are managed by Deployments.

@davidopp davidopp added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Dec 15, 2015
@roberthbailey
Contributor

@brendandburns and I discussed an alternative, which is to make (at least some of) the common system pods auto-size themselves. Heapster, for instance, is collecting metrics about the cluster, so it should have a pretty good idea about how many nodes / pods / etc. exist and need to be monitored. It could decide that it needs to scale itself up/down, change its pod definition, and then reschedule itself. This may not work as well if we move to a sharded model, but for the current singleton model it would allow us to create a solution for the addon that needs the most tuning without needing to solve the generic problem.
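
A hedged sketch of what that self-sizing step could look like, assuming client-go, POD_NAME/POD_NAMESPACE injected via the downward API, and an illustrative "heapster" RC name; the sizing formula is made up for the example:

```go
// Sketch of the "addon resizes itself" idea: the container observes the node
// count, computes the memory request it thinks it needs, rewrites its own
// ReplicationController template, and deletes its own pod so the RC recreates
// it with the new request. Names, env vars, and the formula are assumptions.
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()

	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}

	// Toy formula: a 200Mi base plus 4Mi per node.
	wantMem := resource.MustParse(fmt.Sprintf("%dMi", 200+4*len(nodes.Items)))

	ns := os.Getenv("POD_NAMESPACE") // assumed to be set via the downward API
	rc, err := client.CoreV1().ReplicationControllers(ns).Get(ctx, "heapster", metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}

	container := &rc.Spec.Template.Spec.Containers[0]
	cur := container.Resources.Requests[corev1.ResourceMemory]
	if cur.Cmp(wantMem) == 0 {
		return // already sized correctly
	}
	if container.Resources.Requests == nil {
		container.Resources.Requests = corev1.ResourceList{}
	}
	container.Resources.Requests[corev1.ResourceMemory] = wantMem

	if _, err := client.CoreV1().ReplicationControllers(ns).Update(ctx, rc, metav1.UpdateOptions{}); err != nil {
		log.Fatal(err)
	}

	// An RC template change doesn't touch running pods, so delete our own pod
	// and let the RC recreate it with the new request.
	podName := os.Getenv("POD_NAME") // also assumed to come from the downward API
	if err := client.CoreV1().Pods(ns).Delete(ctx, podName, metav1.DeleteOptions{}); err != nil {
		log.Fatal(err)
	}
}
```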

@vishh
Contributor

vishh commented Dec 17, 2015

@roberthbailey: What if these system pods go pending after triggering a re-schedule event? There is currently no means to define priority.

@roberthbailey
Contributor

True, that would be an issue.

@a-robinson
Contributor Author

Is this still targeted for 1.2? Heapster being too small after a cluster has had more nodes added has been causing issues for customers.

@davidopp
Member

No. This never appeared on any of the lists of features needed for 1.2; my bad for not removing the label. If it gave the (quite reasonable) impression that we were implementing this for 1.2, my apologies. We probably need to make a pass over all the issues tagged 1.2 and make sure the labels sync up with reality, as I suspect this isn't the only one that's out of sync.

@roberthbailey
Contributor

@piosz - have we addressed this concern for heapster?

@piosz
Member

piosz commented Apr 27, 2017

Yes, there is an addon-resizer, implemented by @Q-Lee.

@piosz piosz closed this as completed Apr 27, 2017