Support for managing revoked certs #18982

Open
stensonb opened this issue Dec 21, 2015 · 35 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/auth Categorizes an issue or PR as relevant to SIG Auth.

Comments

@stensonb
Contributor

Authenticating users via x509 certs is important, but the project seems to be missing a mechanism to revoke certs (without throwing the entire chain away and regenerating ALL certs for all users).

It would be great to be able to declare which certs are invalid, and have the kube-apiserver, kubelets, and all other cert-dependent services deny service for requests with the now-invalid cert.

https://en.wikipedia.org/wiki/OCSP_stapling appears to be one way to solve this.

FYI - this is the same idea as the issue with CoreOS: etcd-io/etcd#4034

@roberthbailey
Contributor

@stensonb That is a good observation.

/cc @ncdc @stephenR

@ncdc
Member

ncdc commented Jan 4, 2016

cc @liggitt

@bgrant0607-nocc bgrant0607-nocc added team/control-plane sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed team/control-plane labels Jan 11, 2016
@davidopp davidopp added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Jan 25, 2016
@erictune erictune added sig/auth Categorizes an issue or PR as relevant to SIG Auth. and removed area/security labels Apr 12, 2016
@jimmycuadra
Contributor

Somewhat related: a better story around managing the lifecycle of TLS certs/keys in general. See https://github.com/kubernetes/kubernetes/issues/25379.

@tmegow

tmegow commented Sep 20, 2017

+1

@liggitt liggitt added kind/feature Categorizes issue or PR as related to a new feature. and removed team/control-plane (deprecated - do not use) labels Sep 24, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 6, 2018
@livelyRyan

+1

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 19, 2018
@kron4eg

kron4eg commented Feb 19, 2018

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 19, 2018
@mikedanese mikedanese self-assigned this Mar 15, 2018
@mikedanese mikedanese added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Apr 4, 2018
@kaazoo

kaazoo commented May 8, 2018

+1

@sbinev

sbinev commented Jan 8, 2019

Ability to revoke (at least apiserver) certificates is paramount!

@rikatz
Contributor

rikatz commented Jan 8, 2019

Hi,

I've been taking a look at the API server code.

It turns out that it uses tls.Config from Go's built-in crypto/tls package to serve the API endpoint securely. This package does a lot of the TLS dirty work, but one thing it doesn't do is revocation list verification.

Doing some research, I came across this post on Adam Langley's blog. Mr. Langley is the crypto pkg maintainer.

The piece of code that does the basic client certificate validation populates the certificate pool here with servingInfo.ClientCA.AddCert(cert), then adds it to the listener here with secureServer.TLSConfig.ClientCAs = s.ClientCA.

Following the code, you'll see that nothing in TLSConfig deals with CRLs, so the only option so far is to handle this not in the authentication step but in the authorization step.

Personally I don't think that's a good idea, as a CRL is by design part of authentication (as a comparison, you cannot use your driver's license to identify yourself if it has expired or been revoked by the state).

An OCSP response also seems like a good idea, but the CA must include the additional OCSP endpoint field, and this might be a 'blocking' request if you accept a bunch of CAs and one of them has its OCSP responder down. Am I wrong?

Any ideas on how to achieve this inside the Go/Kubernetes API server would be great.

EDIT: one idea that came to me is that using a frontend HTTP proxy (NGINX, HAProxy, whatever) that supports CRLs and passes only the CN to Kubernetes would probably be an option. I know K8s supports this authentication scheme, so the front proxy would handle authentication and leave authorization to K8s.
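
To make the gap described above concrete: crypto/tls does expose a `VerifyPeerCertificate` callback on `tls.Config` that runs after normal chain verification, and a revocation check could in principle hang off it. What follows is only a minimal sketch under assumptions, not the API server's actual wiring. The names `revokedSerials`, `withRevocationCheck` and `demoChain` are hypothetical, the revoked set is hard-coded rather than read from a CRL, a single client CA is assumed (serial numbers are only unique per issuer), and `ClientAuth` is assumed to be a mode in which the standard library builds verified chains. With other modes the chains may be empty and nothing would be checked.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"math/big"
)

// revokedSerials is a hypothetical in-memory set of revoked serial numbers,
// keyed by decimal string. A real implementation would fill it from a CRL.
type revokedSerials map[string]struct{}

// newRevokedSerials builds the set from plain integers for illustration.
func newRevokedSerials(serials ...int64) revokedSerials {
	set := make(revokedSerials, len(serials))
	for _, s := range serials {
		set[big.NewInt(s).String()] = struct{}{}
	}
	return set
}

// checkChains fails if any certificate in any verified chain is revoked.
// It assumes one client CA, so serial numbers are not ambiguous.
func (r revokedSerials) checkChains(chains [][]*x509.Certificate) error {
	for _, chain := range chains {
		for _, cert := range chain {
			if _, revoked := r[cert.SerialNumber.String()]; revoked {
				return fmt.Errorf("certificate serial %s is revoked", cert.SerialNumber)
			}
		}
	}
	return nil
}

// withRevocationCheck returns a copy of cfg that runs checkChains after the
// standard chain verification has succeeded.
func withRevocationCheck(cfg *tls.Config, r revokedSerials) *tls.Config {
	out := cfg.Clone()
	out.VerifyPeerCertificate = func(_ [][]byte, chains [][]*x509.Certificate) error {
		return r.checkChains(chains)
	}
	return out
}

// demoChain fabricates a one-certificate chain for the demonstration below.
func demoChain(serial int64) [][]*x509.Certificate {
	return [][]*x509.Certificate{{{SerialNumber: big.NewInt(serial)}}}
}

func main() {
	cfg := withRevocationCheck(
		&tls.Config{ClientAuth: tls.RequireAndVerifyClientCert},
		newRevokedSerials(42),
	)
	fmt.Println(cfg.VerifyPeerCertificate(nil, demoChain(42))) // an error: serial 42 is revoked
	fmt.Println(cfg.VerifyPeerCertificate(nil, demoChain(7)))  // <nil>
}
```

A rejected handshake here fails at the TLS layer, before any authorization step runs, which matches the point that revocation belongs to authentication.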

@kron4eg

kron4eg commented Jan 9, 2019

@sftim
Contributor

sftim commented Jun 17, 2019

A few thoughts.

  1. OCSP stapling could work via periodic updates to CertificateSigningRequest objects. That way, parts of the cluster can use long-lived certificates and also let clients benefit from stapled OCSP responses.
    For example, stapling an OCSP response that says that the certificate is OK for the next 300 seconds.
    OCSP stapling is useful for clients outside the cluster (e.g. kubectl), if it's supported.
  2. A new CertificateRevocationRequest object could parallel CertificateSigningRequests. For example, a revocation-approval controller might accept any CertificateRevocationRequest that carries a signature made by the certificate being revoked.
  3. If CertificateRevocationRequest objects (or similar) are available via the API server, then the API server can reject client connections that are authenticated with any revoked certificate.
  4. A controller could track revocations and publish a revocation list (CRL) that components such as Kubelet can consume.
  5. Another controller can obtain signed, timestamped OCSP responses for certificates such as the API server's main certificate, and publish those via the Kubernetes API so that other components can staple those to their TLS sessions.

Overall it sounds possible and even worthwhile, but also quite a lot of work.
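
As a rough illustration of item 4 above (a controller publishing a CRL that other components consume), here is a minimal sketch using only Go's standard library. It assumes Go 1.21 or newer for `RevokedCertificateEntries`. The helpers `newDemoCA`, `publishCRL` and `readCRL` are hypothetical names, the CA is a throwaway generated in-process rather than the cluster CA, and how the revoked serials would be tracked (for example via CertificateRevocationRequest objects) is left out entirely.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newDemoCA creates a throwaway self-signed CA allowed to sign CRLs.
// A real controller would use the cluster's existing CA and key.
func newDemoCA() (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo-ca"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
		SubjectKeyId:          []byte{1, 2, 3, 4},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// publishCRL signs a DER-encoded CRL listing the given revoked serials.
// The five-minute NextUpdate is an arbitrary choice for this sketch.
func publishCRL(ca *x509.Certificate, key *ecdsa.PrivateKey, number int64, revoked []int64) ([]byte, error) {
	now := time.Now()
	entries := make([]x509.RevocationListEntry, 0, len(revoked))
	for _, s := range revoked {
		entries = append(entries, x509.RevocationListEntry{
			SerialNumber:   big.NewInt(s),
			RevocationTime: now,
		})
	}
	tmpl := &x509.RevocationList{
		Number:                    big.NewInt(number),
		ThisUpdate:                now,
		NextUpdate:                now.Add(5 * time.Minute),
		RevokedCertificateEntries: entries,
	}
	return x509.CreateRevocationList(rand.Reader, tmpl, ca, key)
}

// readCRL is what a consumer might do: parse the CRL, verify that the CA
// signed it, and list the revoked serial numbers.
func readCRL(ca *x509.Certificate, der []byte) ([]string, error) {
	crl, err := x509.ParseRevocationList(der)
	if err != nil {
		return nil, err
	}
	if err := crl.CheckSignatureFrom(ca); err != nil {
		return nil, err
	}
	serials := make([]string, 0, len(crl.RevokedCertificateEntries))
	for _, e := range crl.RevokedCertificateEntries {
		serials = append(serials, e.SerialNumber.String())
	}
	return serials, nil
}

func main() {
	ca, key, err := newDemoCA()
	if err != nil {
		panic(err)
	}
	der, err := publishCRL(ca, key, 1, []int64{42, 43})
	if err != nil {
		panic(err)
	}
	fmt.Println(readCRL(ca, der))
}
```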

@oponder

oponder commented Sep 16, 2019

Would it be super hacky to have a flag on the api server that allows us to reject certs created BEFORE a certain date?

That would be a way to quickly "revoke" all certs and start over when you have a feeling something went wrong.
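
A minimal sketch of what such a cutoff check might look like, assuming a flag that carries an RFC 3339 timestamp. No such flag exists today, and `cutoffPolicy`, `newCutoffPolicy` and `demoCert` are hypothetical names. X.509 certificates carry no separate "created at" field, so the sketch compares against `NotBefore`; issuers sometimes backdate that value slightly, which a real implementation would need to account for.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"time"
)

// cutoffPolicy rejects certificates whose validity starts before notBefore.
type cutoffPolicy struct{ notBefore time.Time }

// newCutoffPolicy parses the value of a hypothetical apiserver flag.
func newCutoffPolicy(flagValue string) (cutoffPolicy, error) {
	t, err := time.Parse(time.RFC3339, flagValue)
	if err != nil {
		return cutoffPolicy{}, fmt.Errorf("invalid cutoff %q: %w", flagValue, err)
	}
	return cutoffPolicy{notBefore: t}, nil
}

// check returns an error for certificates that predate the cutoff.
func (p cutoffPolicy) check(cert *x509.Certificate) error {
	if cert.NotBefore.Before(p.notBefore) {
		return fmt.Errorf("certificate valid from %s predates cutoff %s",
			cert.NotBefore.Format(time.RFC3339), p.notBefore.Format(time.RFC3339))
	}
	return nil
}

// demoCert fabricates a certificate with the given NotBefore for the demo.
func demoCert(notBefore string) *x509.Certificate {
	t, err := time.Parse(time.RFC3339, notBefore)
	if err != nil {
		panic(err)
	}
	return &x509.Certificate{NotBefore: t}
}

func main() {
	policy, err := newCutoffPolicy("2019-09-16T00:00:00Z")
	if err != nil {
		panic(err)
	}
	fmt.Println(policy.check(demoCert("2019-01-01T00:00:00Z"))) // rejected
	fmt.Println(policy.check(demoCert("2019-12-01T00:00:00Z"))) // <nil>
}
```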

@0xHazard

Would it be super hacky to have a flag on the api server that allows us to reject certs created BEFORE a certain date?

That would be a way to quickly "revoke" all certs and start over when you have a feeling something went wrong.

So that means any time I want to revoke a cert I have to revoke all of them and sign another bunch? Doesn't sound sane to me.

@oponder

oponder commented Mar 19, 2020

So that means any time I want to revoke a cert I have to revoke all of them and sign another bunch? Doesn't sound sane to me.

My use case is indeed to revoke all certs and start over, without having to change the CA or revoke an intermediate certificate, since that is a lot more work, or not directly in people's control.

@chris13524

As an alternative to revoking certificates, one could implement webhook access control at a higher priority than Node or RBAC, which would enable immediate denial of access as well as denial of later certificate renewal.

So in effect one can disable authorization but not authentication.

Are there any security concerns with having a valid certificate but no privileges?
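
The webhook idea above could be sketched roughly as follows. This is a minimal illustration, not a complete authorizer. The `subjectAccessReview` struct mirrors only the handful of authorization.k8s.io/v1 SubjectAccessReview fields the sketch needs, and `review`, `handler` and the blocked-user map are hypothetical. The webhook only sees the authenticated user name, never the certificate, so it blocks an identity rather than one specific cert. Setting `denied` (rather than only `allowed: false`) is what signals an outright denial instead of "no opinion".

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// subjectAccessReview mirrors only the fields this sketch needs from the
// authorization.k8s.io/v1 SubjectAccessReview object.
type subjectAccessReview struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
	Spec       struct {
		User   string   `json:"user"`
		Groups []string `json:"groups,omitempty"`
	} `json:"spec"`
	Status struct {
		Allowed bool   `json:"allowed"`
		Denied  bool   `json:"denied,omitempty"`
		Reason  string `json:"reason,omitempty"`
	} `json:"status"`
}

// review denies blocked users outright and expresses no opinion on everyone
// else, leaving the decision to later authorizers such as Node or RBAC.
func review(sar subjectAccessReview, blocked map[string]bool) subjectAccessReview {
	sar.Status.Allowed = false
	sar.Status.Denied = blocked[sar.Spec.User]
	sar.Status.Reason = ""
	if sar.Status.Denied {
		sar.Status.Reason = "credentials for this user have been disabled"
	}
	return sar
}

// handler exposes review over HTTP in the shape an authorization webhook
// expects: a SubjectAccessReview in, the same object with status out.
func handler(blocked map[string]bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var sar subjectAccessReview
		if err := json.NewDecoder(r.Body).Decode(&sar); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(review(sar, blocked))
	})
}

func main() {
	body := `{"apiVersion":"authorization.k8s.io/v1","kind":"SubjectAccessReview","spec":{"user":"alice"}}`
	req := httptest.NewRequest(http.MethodPost, "/authorize", strings.NewReader(body))
	rec := httptest.NewRecorder()
	handler(map[string]bool{"alice": true}).ServeHTTP(rec, req)
	fmt.Print(rec.Body.String())
}
```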

@raesene

raesene commented Feb 13, 2021

There is a potential concern with a valid certificate and no explicit grants in RBAC, which is that the user will still get any rights assigned to the system:authenticated group. Whether that presents a problem will likely depend on how the cluster operator manages RBAC in their environment, but it's not ideal.

Also there is one case (membership of system:masters) where access is provided regardless of any RBAC grants.

@enj enj added this to Backlog in SIG Auth Old Apr 9, 2021
@enj enj moved this from Needs Triage to Needs KEP in SIG Auth Old Jun 21, 2021
@enj
Member

enj commented Jun 21, 2021

@deads2k are you planning on a KEP related to this in the v1.23 release?

@sferich888

@deads2k are you planning on a KEP related to this in the v1.23 release?

v1.23 has come and gone and is well on its way to retirement (targeted for end of life: 2023-02-28). Does anyone have any plans to address this or propose a change that could be reviewed?

@remram44

As far as I can see, this can be done the same way etcd did it, by overriding the TLS Listener with one that checks against a revocation list.

etcd's check: https://github.com/heyitsanthony/etcd/blob/41e26f741b26cc6f3faa39151ef74cfee3b6eace/pkg/transport/listener_tls.go#L62-L74

apiserver listener creation: https://github.com/kubernetes/apiserver/blob/10c70bebf96a41cfc1fb721c80a668d648cfeb0c/pkg/server/secure_serving.go#L250

Being able to pass a revocation list via an apiserver command-line option would already go a long way.
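
For illustration, the per-connection check that such a listener might run after the TLS handshake could look roughly like this. It is a sketch under assumptions and does not reproduce the linked etcd code or any actual apiserver option. `parseCRL`, `checkCRL`, `demoCAAndCRL` and `peer` are hypothetical names, the listener wrapper itself is omitted, Go 1.21 or newer is assumed for `RevokedCertificateEntries`, and serials are compared without looking at the issuer, which only makes sense with a single client CA.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// parseCRL accepts a CRL as PEM ("X509 CRL") or raw DER, for example the
// contents of a file named by a command-line option.
func parseCRL(data []byte) (*x509.RevocationList, error) {
	if block, _ := pem.Decode(data); block != nil && block.Type == "X509 CRL" {
		data = block.Bytes
	}
	return x509.ParseRevocationList(data)
}

// checkCRL rejects the connection if the CRL was not signed by ca or if any
// presented certificate's serial number is listed in it.
func checkCRL(crl *x509.RevocationList, ca *x509.Certificate, peerCerts []*x509.Certificate) error {
	if err := crl.CheckSignatureFrom(ca); err != nil {
		return fmt.Errorf("CRL not signed by the client CA: %w", err)
	}
	for _, cert := range peerCerts {
		for _, entry := range crl.RevokedCertificateEntries {
			if entry.SerialNumber.Cmp(cert.SerialNumber) == 0 {
				return fmt.Errorf("client certificate %s is revoked", cert.SerialNumber)
			}
		}
	}
	return nil
}

// demoCAAndCRL builds a throwaway CA and a PEM CRL revoking the given serials.
func demoCAAndCRL(revoked ...int64) (*x509.Certificate, []byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	now := time.Now()
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "demo-client-ca"},
		NotBefore:             now.Add(-time.Minute),
		NotAfter:              now.Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign | x509.KeyUsageCRLSign,
		SubjectKeyId:          []byte{1, 2, 3, 4},
	}
	caDER, err := x509.CreateCertificate(rand.Reader, caTmpl, caTmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	ca, err := x509.ParseCertificate(caDER)
	if err != nil {
		return nil, nil, err
	}
	crlTmpl := &x509.RevocationList{Number: big.NewInt(1), ThisUpdate: now, NextUpdate: now.Add(time.Hour)}
	for _, s := range revoked {
		crlTmpl.RevokedCertificateEntries = append(crlTmpl.RevokedCertificateEntries,
			x509.RevocationListEntry{SerialNumber: big.NewInt(s), RevocationTime: now})
	}
	crlDER, err := x509.CreateRevocationList(rand.Reader, crlTmpl, ca, key)
	if err != nil {
		return nil, nil, err
	}
	return ca, pem.EncodeToMemory(&pem.Block{Type: "X509 CRL", Bytes: crlDER}), nil
}

// peer fabricates the certificate list a connection would present.
func peer(serial int64) []*x509.Certificate {
	return []*x509.Certificate{{SerialNumber: big.NewInt(serial)}}
}

func main() {
	ca, crlPEM, err := demoCAAndCRL(42)
	if err != nil {
		panic(err)
	}
	crl, err := parseCRL(crlPEM)
	if err != nil {
		panic(err)
	}
	fmt.Println(checkCRL(crl, ca, peer(42))) // rejected
	fmt.Println(checkCRL(crl, ca, peer(7)))  // <nil>
}
```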

@XANi

XANi commented Jan 8, 2024

Is there any chance of getting even a basic "Here is my CRL, use my CRL" included? There appear to be half a dozen PRs about the exact same issue since 2016, all closed through indecisiveness.

"Here is a CRL file" and "here is a signal for the API server to re-read the CRL" would at least provide a solution to the existing problem.

"Here is a CRL file URL, just download it every X minutes or at every crl.NextUpdate" would probably solve 99% of the problem, as we could then just point it at whatever cert management solution the cluster is using.

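The "re-read the CRL" and "refresh at crl.NextUpdate" ideas above could be sketched along these lines. This is only an illustration under assumptions: `crlStore`, `nextRefresh` and `demoCRL` are hypothetical names, fetching, parsing and signature verification are out of scope, a single reloading goroutine is assumed, and Go 1.21 or newer is assumed for `RevokedCertificateEntries`.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"math/big"
	"sync/atomic"
	"time"
)

// crlStore holds the most recently loaded CRL. Readers never block a reload.
type crlStore struct {
	current atomic.Pointer[x509.RevocationList]
}

// replace installs a newly fetched CRL unless its number is lower than the
// current one. It assumes only one reloading goroutine calls it.
func (s *crlStore) replace(crl *x509.RevocationList) error {
	old := s.current.Load()
	if old != nil && old.Number != nil && crl.Number != nil && crl.Number.Cmp(old.Number) < 0 {
		return fmt.Errorf("CRL number %s is older than current %s", crl.Number, old.Number)
	}
	s.current.Store(crl)
	return nil
}

// isRevoked reports whether serial appears in the currently loaded CRL.
func (s *crlStore) isRevoked(serial *big.Int) bool {
	crl := s.current.Load()
	if crl == nil {
		return false
	}
	for _, e := range crl.RevokedCertificateEntries {
		if e.SerialNumber.Cmp(serial) == 0 {
			return true
		}
	}
	return false
}

// nextRefresh returns how long to wait before fetching again: until
// crl.NextUpdate, clamped to the range [minWait, maxWait].
func nextRefresh(crl *x509.RevocationList, now time.Time, minWait, maxWait time.Duration) time.Duration {
	if crl.NextUpdate.IsZero() {
		return maxWait
	}
	wait := crl.NextUpdate.Sub(now)
	if wait > maxWait {
		wait = maxWait
	}
	if wait < minWait {
		wait = minWait
	}
	return wait
}

// demoCRL fabricates an unsigned CRL whose NextUpdate is five minutes after
// its ThisUpdate, listing the given serials as revoked.
func demoCRL(number int64, serials ...int64) *x509.RevocationList {
	thisUpdate := time.Date(2024, 1, 8, 0, 0, 0, 0, time.UTC)
	crl := &x509.RevocationList{
		Number:     big.NewInt(number),
		ThisUpdate: thisUpdate,
		NextUpdate: thisUpdate.Add(5 * time.Minute),
	}
	for _, s := range serials {
		crl.RevokedCertificateEntries = append(crl.RevokedCertificateEntries,
			x509.RevocationListEntry{SerialNumber: big.NewInt(s), RevocationTime: thisUpdate})
	}
	return crl
}

func main() {
	store := &crlStore{}
	crl := demoCRL(1, 42)
	if err := store.replace(crl); err != nil {
		panic(err)
	}
	fmt.Println(store.isRevoked(big.NewInt(42)), store.isRevoked(big.NewInt(7))) // true false
	fmt.Println(nextRefresh(crl, crl.ThisUpdate, 30*time.Second, time.Hour))     // 5m0s
}
```

A reload loop would call `replace` with each freshly parsed CRL and then sleep for `nextRefresh`; a signal handler could trigger the same path on demand.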
@sftim
Contributor

sftim commented Jan 8, 2024

This issue is open; pull requests are welcome.

@remram44

remram44 commented Jan 8, 2024

I sent a pull request, would love feedback: #122203

@enj
Member

enj commented Jan 8, 2024

This issue is open; pull requests are welcome.

This isn't an issue that one can just make a PR to fix. It requires agreement on the path forward. The code itself was never the limiting factor.

@raesene

raesene commented Jan 8, 2024

@enj do you know if there is any documentation of previous discussions about options that have been considered and decided against? I could see a challenge here for people coming at this fresh: they won't have the perspective of the past 8 years to look back on.

I can recall some discussions in SIG-Auth slack, but I'm not sure if there's other docs.

Projects
Status: Needs KEP
SIG Auth Old
Needs KEP