Robust rollingupdate and rollback #1353

Closed
bgrant0607 opened this issue Sep 18, 2014 · 37 comments
Labels
area/app-lifecycle, area/kubectl, priority/important-soon, sig/api-machinery

Comments

@bgrant0607 (Member)

In PR #1325, we agreed that rollingupdate should be factored out of kubecfg as part of the kubecfg overhaul.

#492 (comment) and PR #1007 discussed alternative approaches to rollingupdate.

What I'd recommend is that we implement a new rollingupdate library and a corresponding command-line wrapper. The library may be invoked by more general libraries/tools in the future.

The rolling update approach that should be used is to create a new replication controller with 1 replica, resize the new (+1) and old (-1) controllers one at a time, and then delete the old controller once it reaches 0 replicas. Unlike the current approach, this predictably updates the set of pods regardless of unexpected failures.

The two replication controllers would need at least one differentiating label, which could use the image tag of the pod's primary container, since a new image is typically what motivates a rolling update.
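For concreteness, a minimal sketch of the manual equivalent of that loop, using kubectl's create/scale/delete commands; the controller names (frontend-v1, frontend-v2), the version label, and the replica count of 3 are hypothetical placeholders:

# Assumes frontend-v1 exists with 3 replicas, and frontend-v2.json defines a new
# replication controller (replicas: 1) whose selector adds a differentiating label, e.g. version=v2.
kubectl create -f frontend-v2.json
for i in 1 2 3; do
  kubectl scale rc frontend-v2 --replicas=$i
  kubectl scale rc frontend-v1 --replicas=$((3 - i))
done
kubectl delete rc frontend-v1   # old controller has reached 0 replicas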

It should be possible to apply the operation to services containing multiple release tracks, such as daily and weekly or canary and stable, by applying it to each track individually.

@alex-mohr (Contributor)

The approach described above would seem to work for stateless services and/or blue-green deployments, but I'm not sure how it would work for something like a Redis master, where the pod itself has storage it cares about.

Are stateful pods a use case we want to support? If so, they would seem to require some form of in-place update?

@KyleAMathews (Contributor)

I'd say yes, with some sort of hook system that lets people write custom migration scripts to support the transition, say, to a new Redis master.

@bgrant0607 (Member, Author)

@alex-mohr See #598 and #1515. You could also put the data into a PD.

@stp-ip (Member) commented Nov 21, 2014

The thing is, we have to start somewhere, and I would say we should start with a rolling update mechanism for stateless services only.

Stateful services have a lot of different needs, and I would try to avoid turning k8s into a workflow-integration system. The question is: would it be possible to let containers handle it themselves? On shutdown, lock and flush the database; on restart, unlock the database, etc. It could be handled on a per-container basis, but that makes the containers a lot more complex.
Another idea, which in my opinion only works for migrations, would be to spin up an intermediate pod.

Database example:
-> let the container lock and flush the database
-> start an intermediate container to migrate the database
-> remove the intermediate container when finished (it exits on its own)
-> start the new pod, one container at a time
-> remove the old pod, one container at a time

This would be the easiest example to start with. I agree there are other ways to remove the complexity from the containers themselves, for example by executing a script on each old container via docker exec, which kills the container once the script has run. Either the migration script runs after all of them are killed (downtime...) or the new containers could already be spawned.
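A rough sketch of that last variant, assuming a user-supplied quiesce script baked into the old containers (the script path and the OLD_CONTAINERS variable are hypothetical placeholders):

# /quiesce.sh is a hypothetical user-supplied script that locks and flushes the
# database and then exits; OLD_CONTAINERS holds the docker IDs of the old pod's containers.
for c in $OLD_CONTAINERS; do
  docker exec "$c" /quiesce.sh
done
# ...then start the new pod / intermediate migration container as described above.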

@bgrant0607 (Member, Author)

@KyleAMathews (Contributor)

@bgrant0607 looks like we'd need a pre-prestop though as SIGTERM is still sent to the container :)

@bgrant0607 added the priority/backlog label Dec 3, 2014
@bgrant0607 (Member, Author)

Copying the detailed design of rolling update from #3061:

Requirements, principles, assumptions, etc.:

  • As a matter of principle, any functionality available in kubectl should be available by library call and declaratively, as well as by command.
  • Rolling update should not mix configuration generation and orchestration of the update.
  • It needs to handle arbitrary (e.g., multi-container) pods.
  • Durable data objects are assumed to have lifetimes independent of the pods themselves.
  • If the client fails or is killed in the middle of the rolling update, there needs to be a reasonable way to recover and complete or rollback the rollout.
  • Users should be able to specify the update rate on the command line, especially given that we don't yet check pod health or readiness. If it just blazes through the pods, it might as well not be a rolling update.

Proposed syntax:

kubectl rollingupdate <old-controller-name>  -f <new-replication-controller.json>

If the number of replicas in the new template is left unspecified, I think it would default to 0 after parsing. The behavior in this case would be to gradually increase it to the replica count of the original. We could also allow the user to specify a new size, which would do a rolling update for min(old, new) replicas and then delete or add the remaining replicas.
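A hypothetical worked example of the size-change case: the old controller (frontend-v1) has 5 replicas and the new template asks for 3.

# Pairwise rolling update for min(5, 3) = 3 steps:
#   new: 0 -> 1 -> 2 -> 3    old: 5 -> 4 -> 3 -> 2
# then the remaining 2 old replicas are simply removed:
kubectl scale rc frontend-v1 --replicas=0
kubectl delete rc frontend-v1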

Since kubectl can reason about the type of the json parameter, we could add support for specifying just the pod template later (like when we have config generation and v1beta3 sorted), or for other types of controllers (e.g., per-node controller).

Regarding recovery from a failure in the middle (i.e., resumption of the rolling update) and rollback of a partial rolling update: this is where annotations would be useful -- to store the final number of replicas desired, for instance. I think the identical syntax as above could work for recovery, if kubectl first checked whether the new replicationController already existed and, if so, merely continued. If we also supported specifying just a name for the new replication controller, that could be used either to finish the rollingupdate or to roll back, by swapping old and new on the command line.

We should use an approach friendly to resizing, whether via kubectl or via an auto-scaler. We should keep track of the number of replicas subtracted in an annotation on the original replication controller, so that the total desired is the current replica count plus the number subtracted -- unless the user specified the desired number explicitly in the new replication controller, in which case that can be stored in an annotation on the new replication controller.
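One possible shape for that bookkeeping, sketched with kubectl annotate; the annotation keys below are illustrative examples, not a settled API:

# Record the target size on the new controller and the number of replicas
# already subtracted from the old one (annotation keys are hypothetical).
kubectl annotate rc frontend-v2 example.com/desired-replicas=3
kubectl annotate rc frontend-v1 example.com/replicas-subtracted=1
# On resumption: desired total = old controller's current replicas + replicas-subtracted,
# unless desired-replicas was set explicitly on the new controller.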

I also eventually want to support "roll over" -- replace N replication controllers with one. I think that's straightforward if we just use the convention that the file or last name corresponds to the target replication controller, though it may be easier for users to shoot themselves in the foot. Perhaps a specific command-line argument to identify the new/target controller would therefore be useful.

kubectl rollingupdate old1 old2 ... oldN new
kubectl rollingupdate old1 old2 ... oldN -f new.json

This issue should not be closed until we support recovery and rollover.

@davidopp (Member) commented Jan 7, 2015

I don't understand why you need annotations. Isn't the rolling update essentially stateless, in the sense that you can figure out where you left off and what remains to be done just by looking at the old and new replication controllers and the pods?

@bgrant0607 (Member, Author)

Almost, but not quite. The two replication controllers are not changed atomically, so without keeping track the count could be off by one. It's also the case that we allow the size to be changed during the rolling update.

@bgrant0607 changed the title from "Robust rollingupdate" to "Robust rollingupdate and rollback" Jan 10, 2015
@bgrant0607 (Member, Author)

How to make rolling update friendly to auto-scaling is described here: #2863 (comment)

@bgrant0607 added the sig/api-machinery label Feb 5, 2015
@goltermann removed this from the v1.0 milestone Feb 6, 2015
@bgrant0607 modified the milestone: v1.0 Feb 6, 2015
@bgrant0607 (Member, Author)

See also rollingupdate-related issues in the cli roadmap.

@bgrant0607 (Member, Author)

cc @kelseyhightower

@ghost commented Apr 10, 2015

cc: quinton-hoole

@bgrant0607 (Member, Author)

I think the common scenarios are:

  • rolling update
  • rolling update, pause in middle, rollback, create a patched image, rolling update
  • create a canary, run it for a while, then perform a rolling update by growing the canary and shrinking the original RC
  • create a canary/experiment, run it for a while, delete it
  • rolling update, pause in middle, create a patched image, perform a rolling update to replace the original image and the botched image (aka rollover)
  • rolling update with a resize in middle (possibly without pausing, since the resize might be due to an auto-scaler)
  • run multiple "release tracks" continuously, which are updated independently

cc @rjnagal

@smarterclayton (Contributor)

Rolling update, then at the end perform a migration automatically (post-deployment step).


@bgrant0607 (Member, Author)

@smarterclayton Migration meaning traffic shifting?

@smarterclayton (Contributor)

No, migration as in a schema upgrade (code version 2 rolled out; once code version 1 is gone, trigger the automatic DB schema update from schema 3 to 4).
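A rough sketch of that flow, assuming the rollingupdate command proposed above blocks until the update completes, and that migrate-schema.json is a hypothetical one-shot pod spec that runs the schema upgrade and exits:

kubectl rollingupdate frontend-v1 -f frontend-v2.json \
  && kubectl create -f migrate-schema.json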

@bgrant0607 (Member, Author)

Ah, this is the post-deployment hook. Got it.

@smarterclayton (Contributor)

Certainly doesn't have to be part of this, but if you think of deployment as a process of going from whatever the cluster previously had to something new, then many people may want to define that process (canaries, etc., as you laid out). The process has to do a reasonable job of trying to converge, but it's acceptable for it to wedge and report being wedged due to irreconcilable differences (from simple ones, like a port change, to complex ones, like an image requiring new security procedures). Since the assumption is that you're transforming between two steady states, you either move forward, move back, or stay stuck.


@bgrant0607 (Member, Author)

I agree that hooks seem useful, certainly in the case where deployments are triggered automatically. If we were to defer triggers, hooks probably could also be deferred.

@bgrant0607 (Member, Author)

Using #4140 for rollback/abort.

@bgrant0607 modified the milestones: v1.0-post, v1.0 Apr 27, 2015
@bgrant0607 added the priority/backlog label and removed the priority/important-soon label Apr 27, 2015
@bgrant0607 (Member, Author)

Updating from cli-roadmap.md:

Deployment (#1743) will need rollover (replace multiple replication controllers with one) in order to kick off new rollouts as soon as the pod template is updated.

We should think about whether we still want annotations on the underlying replication controllers and, if so, whether they need to be improved: #2863 (comment)

@bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@bgrant0607 added the priority/important-soon and area/kubectl labels and removed the area/usability, kind/gsoc, and priority/backlog labels Jul 27, 2015
@bgrant0607 (Member, Author)

I believe we have more specific issues filed for remaining work, so I'm closing this "metaphysical" issue.

@huang195 (Contributor) commented Dec 7, 2015

@bgrant0607 are readiness and/or liveness probes taken into account during rolling update? What is the expected behavior when these probes are in good/bad states?
