Job controller: Updates might override stale data #105199
Labels: kind/bug, needs-triage, sig/apps
When investigating #105179, @liggitt and I discovered that the job controller does a GET request of the job before issuing any Job status update.
kubernetes/pkg/controller/job/job_controller.go
Lines 1357 to 1373 in 752c4b7
This is problematic because it can mask any incompatibilities between the state the job sync worked from and the latest state of the Job. In particular, it can cause stale UIDs or counters to be written when tracking job status with finalizers.
It might not have been a problem in the past because the job controller would always recompute the status from scratch. However, when tracking with finalizers, the existing status is part of the input to the sync.
The solution is to skip the Job GET and let the sync fail in case of a conflict. A conflict implies that the Job was updated in the meantime, so it is already back in the workqueue because of that update.
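To illustrate the difference, here is a minimal sketch in plain Go. It is not the controller's code: `store`, `job`, `syncWithGet`, and `syncDirect` are hypothetical names, and the in-memory `store` only imitates the API server's resourceVersion check. It assumes the sync computed a counter from a cached copy that has since gone stale.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's 409 Conflict response.
var errConflict = errors.New("conflict: object has been modified")

// job is a hypothetical, stripped-down Job with only the fields needed here.
type job struct {
	resourceVersion int
	succeeded       int
}

// store is a hypothetical in-memory stand-in for the API server.
type store struct{ current job }

// get returns the latest stored copy.
func (s *store) get() job { return s.current }

// updateStatus accepts the write only if the caller's copy is current.
func (s *store) updateStatus(j job) error {
	if j.resourceVersion != s.current.resourceVersion {
		return errConflict
	}
	j.resourceVersion++
	s.current = j
	return nil
}

// syncWithGet mimics the pattern described in this issue: the status is
// computed from the cached copy, but written onto a freshly fetched object,
// so the resourceVersion check can never catch the staleness.
func syncWithGet(s *store, cached job, delta int) error {
	computed := cached.succeeded + delta
	latest := s.get()
	latest.succeeded = computed
	return s.updateStatus(latest)
}

// syncDirect mimics the proposed fix: update using the copy the sync
// actually worked from, and let a conflict surface as an error.
func syncDirect(s *store, cached job, delta int) error {
	cached.succeeded += delta
	return s.updateStatus(cached)
}

// newStaleScenario returns a store holding succeeded=2 and a cached copy
// taken before that write (succeeded=0).
func newStaleScenario() (*store, job) {
	s := &store{current: job{resourceVersion: 1}}
	cached := s.get()
	fresh := s.get()
	fresh.succeeded = 2
	if err := s.updateStatus(fresh); err != nil {
		panic(err)
	}
	return s, cached
}

func main() {
	s, cached := newStaleScenario()
	err := syncWithGet(s, cached, 1)
	fmt.Printf("get-then-update: err=%v succeeded=%d\n", err, s.get().succeeded)

	s, cached = newStaleScenario()
	err = syncDirect(s, cached, 1)
	fmt.Printf("direct update:   err=%v succeeded=%d\n", err, s.get().succeeded)
}
```

In this sketch the get-then-update path succeeds and silently replaces the stored counter of 2 with 1, while the direct update returns a conflict and leaves the stored value untouched, so a later sync can start from the latest state.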
/sig apps