gce: KCM detaches all in-tree volumes during update from K8s 1.20 to 1.21 #109354
/sig storage |
It seems that we build the ASW based on the Node status (kubernetes/pkg/controller/volume/attachdetach/attach_detach_controller.go, lines 376 to 393 in a7a3274).

In K8s 1.20.13 the volumeName for an in-tree PV in the Node status looks like:

volumesAttached:
- devicePath: ""
  name: kubernetes.io/csi/pd.csi.storage.gke.io^projects/UNSPECIFIED/zones/UNSPECIFIED/disks/pv--21a16b25-edc8-4287-81dd-d4a71355a359

Note that the zone is UNSPECIFIED. The corresponding PV topology labels are:

labels:
  topology.kubernetes.io/region: europe-west1
  topology.kubernetes.io/zone: europe-west1-b

I see that this behaviour comes from the way csi-translation-lib builds the volume handle. In K8s 1.21, #97823 introduces a change in the GCE PD translation (kubernetes/staging/src/k8s.io/csi-translation-lib/plugins/gce_pd.go, lines 220 to 222 in a7a3274). This results in a change in the volumeName (as the zone is part of the volumeName): the volumeName translated with 1.21 now carries the actual zone and is no longer UNSPECIFIED. Having one volumeName in ASW (coming from the Node status) and a different volumeName in the garbage collection logic for dangling VolumeAttachments for the same volume leads KCM to mark the volume as uncertain and detach it.

@Jiawei0227, as the author of #97823, I would appreciate it if you could confirm my observations.

/priority critical-urgent |
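For illustration only, here is a minimal, self-contained sketch of why the zone change breaks the comparison described above. It is not the actual Kubernetes code; makeUniqueVolumeName is a hypothetical helper that mimics the "<driver>^<volume handle>" shape seen in node.status.volumesAttached.

```go
package main

import "fmt"

// makeUniqueVolumeName mimics the "<driver>^<handle>" shape seen in
// node.status.volumesAttached; the helper name is hypothetical.
func makeUniqueVolumeName(driver, project, zone, disk string) string {
	return fmt.Sprintf("%s^projects/%s/zones/%s/disks/%s", driver, project, zone, disk)
}

func main() {
	const driver = "pd.csi.storage.gke.io"
	const disk = "pv--21a16b25-edc8-4287-81dd-d4a71355a359"

	// 1.20-style translation: the zone is unknown at translation time, so it
	// ends up as UNSPECIFIED (this is the name recorded in the Node status).
	fromNodeStatus := makeUniqueVolumeName(driver, "UNSPECIFIED", "UNSPECIFIED", disk)

	// 1.21-style translation: the zone is resolved (e.g. from the PV topology
	// labels), so the freshly translated name differs in the zone segment.
	fromFreshTranslation := makeUniqueVolumeName(driver, "UNSPECIFIED", "europe-west1-b", disk)

	// Same disk, but a plain string comparison says otherwise, which is why
	// the attach/detach controller treats the attachment as uncertain.
	fmt.Println(fromNodeStatus == fromFreshTranslation) // false
}
```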
/assign @Jiawei0227 |
/triage accepted |
I think @ialidzhikov you might be right, this seems to be a problem... |
Just to confirm, is csi migration enabled on the cluster? |
One thing to investigate is in what circumstances would the volume id zone be "UNSPECIFIED" on the node before the upgrade? |
Yes, the CSI migration is enabled. KCM@1.20.13 flags:
KCM@1.21.10 flags:
I guess for in-tree volumes the corresponding handling is in kubernetes/staging/src/k8s.io/csi-translation-lib/plugins/gce_pd.go (lines 218 to 233 in 2444b33). If in K8s 1.20.13 the in-tree PV is missing the label that this translation reads for the zone, the translated volumeName ends up with the UNSPECIFIED zone.
|
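To make the point above concrete, here is a toy sketch of that kind of zone lookup, under the assumption that the translation reads the zone from the PV labels and falls back to UNSPECIFIED when the key it knows about is missing. The function name and the choice of keys are illustrative; this is not the actual gce_pd.go logic, and which keys each release consults is exactly what changed between 1.20 and 1.21.

```go
package main

import "fmt"

const unspecifiedValue = "UNSPECIFIED"

// zoneFromPVLabels picks the zone for the translated volume handle from the
// PV's labels and falls back to UNSPECIFIED when none of the given keys is
// set. Hypothetical helper, not the real csi-translation-lib code.
func zoneFromPVLabels(labels map[string]string, keys ...string) string {
	for _, key := range keys {
		if zone, ok := labels[key]; ok && zone != "" {
			return zone
		}
	}
	return unspecifiedValue
}

func main() {
	pvLabels := map[string]string{
		"topology.kubernetes.io/region": "europe-west1",
		"topology.kubernetes.io/zone":   "europe-west1-b",
	}

	// A translation that only understands a legacy key resolves nothing here.
	fmt.Println(zoneFromPVLabels(pvLabels, "failure-domain.beta.kubernetes.io/zone")) // UNSPECIFIED

	// A translation that also understands the GA topology label resolves the zone.
	fmt.Println(zoneFromPVLabels(pvLabels,
		"failure-domain.beta.kubernetes.io/zone",
		"topology.kubernetes.io/zone")) // europe-west1-b
}
```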
we need to be sure that any fix/change here doesn't regress in the opposite direction (detaching volumes that currently work in 1.21+) |
@ialidzhikov can you help me understand this? So basically the issue only happens if you enable CSI migration in 1.20 and then upgrade to 1.21, because we did the topology fix between 1.20 and 1.21. But, for example, if CSI migration is enabled only in 1.21 and later, then an upgrade should not be impacted, right? Because at that time the volumesAttached information is correct on that node. |
Let's first figure out when the issue occurs and what the impact range is, and then we can work on a fix. It might possibly be a fix only on the release-1.21 branch. |
Yes, the issue happens when you upgrade a 1.20 cluster with enabled CSI migration to 1.21.
I haven't checked what happens if you upgrade a 1.20 cluster without CSI migration to 1.21 with CSI migration enabled. However, I cannot say that we, from Gardener, are much interested in this case: in Gardener we enable CSI migration for GCP clusters starting from K8s 1.18 - ref https://github.com/gardener/gardener-extension-provider-gcp/blob/v1.22.0/docs/usage-as-end-user.md#csi-volume-provisioners.
When will the issue occur: when you upgrade a 1.20 GCP cluster with CSI migration enabled to 1.21.

PS: It feels bad - I really put effort into the investigation of this issue. I believe I provided good steps to reproduce and tried to explain the issue. On the other side, after 1 month I have almost no meaningful replies/actions from you, the maintainers. On top of that, a release team member (@palnabarun) states that 1.21.12 will most probably be the last release for 1.21 - ref https://kubernetes.slack.com/archives/CJH2GBF7Y/p1650480910742459?thread_ts=1649751203.028119&cid=CJH2GBF7Y. Will you really leave 1.21 affected by such an issue?

PS2: A similar bad example from history - ref https://kubernetes.slack.com/archives/CJH2GBF7Y/p1623453145304800. #102812 was not accepted and left a range of patch versions for 1.18 affected by #102856. Back then I even created a security incident for this issue, and the "reply" was that the issue was already disclosed because I reported it to k/k first and that they could not do much about it. |
@ialidzhikov Hey Ismail, I apologize for the late response here. I have no excuse and we will try to fix this issue as soon as possible. |
Hey @ialidzhikov, sorry for the late reply. We had some brainstorming meetings to figure out whether there is an obvious solution or a potential workaround/fix that we can go with to solve this; it is all documented in the doc linked below: https://docs.google.com/document/d/17iIoVj3g02U8Pgt4nC7g7sJG-Jd4TvPCpEXZ5e3oCk4/edit?usp=sharing

Our recommendation is to follow Option 5 for now, as the other solutions either might not work or would cause other problems. We will keep brainstorming solutions for this one. Feel free to comment if you have other ideas.

Sorry again for causing this problem; we also have some action items internally to help improve the stability of the PD driver. We will keep posting on this thread if there are any other updates. |
Thank you all! I am impressed by the effort you put in! That's tremendously helpful for us, as we are stuck in the 1.20 to 1.21 upgrade process because of this issue. In the last weeks I was also looking for mitigations/workarounds that prevent hitting this issue. Here is what I managed to find out as options:
|
@ialidzhikov thanks for sharing the detailed information. |
I think basically we need to make sure no customers are actually relying on these labels for their logic (see the sketch after this comment).
I think we should not re-add the label unless the volume is being detached from a node.
Labels and nodeAffinity are two different things. One should not impact the other.
Yes. And to make sure that no newly created PVs have this label, you need to use external-provisioner <= 2.1.0. After the cluster is upgraded to 1.21 (including the nodes), it should not have this requirement, which means you can upgrade the external-provisioner to a higher version.
Yes, I think this is the safest approach so far to fix the issue. Sorry for all the trouble we have brought with this feature ;) |
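Regarding the point above about making sure nothing relies on these labels: below is a minimal, illustrative client-go sketch (assuming "these labels" refers to the zone topology labels on PVs) that lists which PVs still carry them. It is only an audit aid, not part of any fix discussed in this thread.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the local kubeconfig (~/.kube/config) and build a clientset.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	pvs, err := client.CoreV1().PersistentVolumes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// Report every PV that still carries a zone topology label.
	for _, pv := range pvs.Items {
		for _, key := range []string{
			"topology.kubernetes.io/zone",
			"failure-domain.beta.kubernetes.io/zone",
		} {
			if value, ok := pv.Labels[key]; ok {
				fmt.Printf("%s: %s=%s\n", pv.Name, key, value)
			}
		}
	}
}
```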
/remove-priority critical-urgent Keeping this issue open for the 2 action items listed in https://docs.google.com/document/d/17iIoVj3g02U8Pgt4nC7g7sJG-Jd4TvPCpEXZ5e3oCk4/edit#heading=h.5n431nhh830a. |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
/remove-lifecycle rotten |
This issue has not been updated in over 1 year, and should be re-triaged. You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted |
I am closing this issue. The remaining action items (AIs) are about making the issue visible in the release notes and in compatibility notes/matrix improvements for the external-provisioner and the GCE PD CSI driver. /close |
@ialidzhikov: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What happened?
KCM wrongly detaches all in-tree volumes during update from K8s 1.20 to 1.21.
What did you expect to happen?
KCM to not detach all in-tree volumes during the update from K8s 1.20 to 1.21.
How can we reproduce it (as minimally and precisely as possible)?
Create a K8s 1.20.13 cluster.
Create in-tree and out-of-tree StorageClasses.
Create 3 StatefulSets with 4 replicas each (1 StatefulSet uses the out-of-tree StorageClass, the other 2 use the in-tree one).
Update the cluster to K8s 1.21.10.
Make sure that kube-controller-manager detaches all in-tree volumes during the update:
5.1 kube-controller-manager marks all in-tree volumes as uncertain.
5.2 6 minutes after marking a lot of volume attachments as uncertain, KCM detaches the in-tree volumes (see the sketch after these steps).
5.3 Nodes report FilesystemIsReadOnly. The corresponding Pods fail with IO errors as their volumes are detached.
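The six-minute delay in step 5.2 is consistent with a wait-then-force-detach pattern: the controller gives a volume it considers uncertain a grace period before detaching it anyway. Here is a toy sketch of that pattern; the type, field, and constant names are illustrative, not the actual attach/detach controller code.

```go
package main

import (
	"fmt"
	"time"
)

// maxWaitBeforeForceDetach mirrors the roughly six-minute window observed in step 5.2.
const maxWaitBeforeForceDetach = 6 * time.Minute

// attachedVolume is an illustrative stand-in for the controller's book-keeping
// about a volume it can no longer confidently match to a node attachment.
type attachedVolume struct {
	name           string
	uncertainSince time.Time
}

// shouldForceDetach returns true once the volume has stayed uncertain longer
// than the allowed window, at which point it gets detached even though the
// node may still be using it (which is exactly what hurts here).
func shouldForceDetach(v attachedVolume, now time.Time) bool {
	return now.Sub(v.uncertainSince) > maxWaitBeforeForceDetach
}

func main() {
	v := attachedVolume{
		name:           "pv--21a16b25-edc8-4287-81dd-d4a71355a359",
		uncertainSince: time.Now().Add(-7 * time.Minute),
	}
	fmt.Println(v.name, shouldForceDetach(v, time.Now())) // ... true
}
```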
Anything else we need to know?
I see that in ASW the volume is reported as attached.
Then the VA is marked as uncertain.
Note the diff in the volume name between the two.
Kubernetes version
Update from K8s 1.20.13 to 1.21.10.
Cloud provider
GCP (GCE)
OS version
Install tools
https://github.com/gardener/gardener
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)
external-provisioner - k8s.gcr.io/sig-storage/csi-provisioner@v2.1.2
external-attacher - k8s.gcr.io/sig-storage/csi-attacher@v3.3.0
gcp-compute-persistent-disk-csi-driver - gcr.io/gke-release/gcp-compute-persistent-disk-csi-driver@v1.3.4-gke.0