EndpointSlice Controllers should use multiple slices for Services with many ports #99382
/sig network
Interesting — is this limit present on the normal Endpoints controller too?
No, this is unique to the EndpointSlice API. We were trying to ensure there were no unbounded lists in the EndpointSlice API, but Service does have an unbounded list here, and I didn't properly handle that in the controller. The simplest solution would be to loosen the EndpointSlice API validation, but I don't think that's something we want to do.

I'm guessing that having >100 ports on a Service is relatively uncommon, but it is still technically something Services support. I wonder if we could say that EndpointSlices would only track up to 100 ports for a Service and then drop anything beyond that? That number may be too low, but it does seem like we need some kind of upper limit here, unlike the unbounded list that is currently possible on Service.

/cc @thockin
I would rather just drop the validation. The port number is actually bounded to 0-65535.
@thockin How do you feel about increasing the limit of ports to 65536 in EndpointSlice validation?
The port number is bounded to 64k times the number of possible protocols (3 today). In theory, you could craft a YAML that just fails: 192Ki port entries, at an assumed 32 bytes each, is already more than 4x the max size of a resource, not even counting the endpoints themselves.

Now, I am SURE that nobody is really trying to expose 192 thousand ports. But how many is enough? If we limit it to 1k, there will be someone doing VOIP who needs 4k. If we set a limit of 16K ports per slice, is that enough? 32K puts us at the 1MiB mark. 64K overflows a resource. If we get rid of the limit, what we end up with is non-deterministic failures.

Can we pick a (seemingly) ridiculous number like 16K ports and then fix this issue (the EndpointSlice controller REALLY needs to add a dimension to re-slice on the number of ports)? Do we think 16K is going to be enough? Do we have data to support or refute that? Will it be enough to get us through to a controller which slices on ports?

If we put no limit, we will not be able to easily go back on that, but it is compatible with Endpoints (sadly).
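The sizing argument above can be checked with quick arithmetic. Note the ~32 bytes per serialized port entry and the ~1.5 MiB etcd request size limit used below are illustrative assumptions, not exact figures:

```python
# Back-of-the-envelope sizing for a maximally-ported EndpointSlice.
# BYTES_PER_PORT and ETCD_LIMIT are assumptions for illustration.
BYTES_PER_PORT = 32                       # rough serialized size of one port entry
ETCD_LIMIT = int(1.5 * 1024 * 1024)       # ~1.5 MiB default etcd request limit

def ports_size(num_ports: int) -> int:
    """Approximate serialized size of a port list, in bytes."""
    return num_ports * BYTES_PER_PORT

# 64Ki ports x 3 protocols = 192Ki possible (port, protocol) pairs.
print(ports_size(192 * 1024) / ETCD_LIMIT)  # -> 4.0 (4x the resource limit)
print(ports_size(32 * 1024))                # -> 1048576, i.e. the 1 MiB mark
print(ports_size(64 * 1024) > ETCD_LIMIT)   # -> True, 64K overflows a resource
```

This matches the comment's figures: 192Ki entries is 4x the limit, 32K entries sits right at 1 MiB, and 64K overflows.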
16k covers all the current IANA registered ports.
Opened #99795 to increase limit to 16k. |
Although #99795 should make this issue mostly disappear in 1.21+, there is a chance that someone will want to deploy a Service with >20k ports, and that will still break. I think this is unlikely, but we have a few options to mitigate it:
@robscott should we close it now?
I think it makes sense to close this out. #99795 should fix this issue for >99% of users. If anyone does run into this, feel free to reopen.

/close
@robscott: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened:
The EndpointSlice controllers are unaware of the 100-port limit enforced by EndpointSlice validation. When a Service has more than 100 ports, the controllers try to create or update EndpointSlices as usual but fail.
What you expected to happen:
The EndpointSlice controller should probably support subsetting EndpointSlices into groups of 100 ports in these instances. Alternatively, we could remove the EndpointSlice validation for this, but that would result in yet another unbounded list in k8s APIs.
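The subsetting idea above could, in principle, look something like the following chunking sketch. The function name and the 100-port limit here are illustrative; this is not the real controller code:

```python
# Hypothetical sketch: split a Service's port list into groups small
# enough to fit in separate EndpointSlices (limit assumed to be 100).
MAX_PORTS_PER_SLICE = 100

def chunk_ports(ports: list) -> list:
    """Return the port list split into EndpointSlice-sized groups."""
    return [
        ports[i : i + MAX_PORTS_PER_SLICE]
        for i in range(0, len(ports), MAX_PORTS_PER_SLICE)
    ]

# A Service with 250 ports would need 3 slices: 100, 100, and 50 ports.
slices = chunk_ports(list(range(250)))
print([len(s) for s in slices])  # -> [100, 100, 50]
```

A real implementation would also need to decide how endpoints map onto these per-port-group slices, which is part of why this was considered non-trivial in the discussion above.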
How to reproduce it (as minimally and precisely as possible):
Create a Service with >100 ports.
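One way to generate such a reproduction manifest is sketched below; the Service name, labels, and port numbers are arbitrary placeholders:

```python
# Hypothetical helper that emits a Service manifest with many ports,
# enough to exceed the original 100-port EndpointSlice limit.
def service_manifest(num_ports: int) -> str:
    lines = [
        "apiVersion: v1",
        "kind: Service",
        "metadata:",
        "  name: many-ports",        # placeholder name
        "spec:",
        "  selector:",
        "    app: many-ports",       # placeholder selector
        "  ports:",
    ]
    for i in range(num_ports):
        lines.append(f"  - name: port-{i}")
        lines.append(f"    port: {10000 + i}")  # arbitrary port range
        lines.append("    protocol: TCP")
    return "\n".join(lines)

# e.g. pipe the output of service_manifest(150) into `kubectl apply -f -`
print(service_manifest(150))
```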
/cc @swetharepakula @thockin