Separate the pod template from replicationController #170

bgrant0607 · 2014-06-19T17:43:05Z

We should separate the pod template from replicationController, to make it possible to create pods from template without replicationController (e.g., for cron jobs, for deferred execution using hooks). This would also make updates cleaner.

bgrant0607 · 2014-07-09T22:13:05Z

If we were to remove replicationController from the core apiserver into a separate service, I'd leave the pod template in the core.

erictune · 2014-07-23T20:59:40Z

I'll take a shot at this.

@bgrant0607
@lavalamp
Please share any other thoughts on podTemplates.

Brian mentioned cron. This makes me think he wants to use podTemplates for delegation.
That is, is there some mechanism where principal A can define a /podTemplate, and then grant principal B permission to create /pods which derive from a certain /podTemplate, but which run as A? (I guess a replicationController is effectively "another principal" which A can delegate to?)

Will the delegation of power be part of the /podTemplate message, or will that be stored in some sideband ACL?
Does the PUT /pods method get extended to allow creating a /pod from a template instead of using the desiredState? Or is there a new non-REST method?

lavalamp · 2014-07-23T21:02:46Z

PUT /pods should only take pods, IMO. We should potentially offer something that "fills out" a pod template, but I think that should work like our current replication controller, which is an external component.

bgrant0607 · 2014-07-24T03:30:06Z

The point of this proposal is to further narrow the responsibility and API of the replicationController to its bare essentials. The replicationController should just spawn new replicas.

Right now, essentially a full copy of the pod API is embedded in the replicationController API. As an external, independently versioned API, it would be challenging to keep synchronized with the core API. Additionally, with the replicationController creating pods by value rather than by reference, it needs to be delegated the authority to create ~arbitrary pods as ~arbitrary users (once we support multi-tenancy) -- this would mean it could do anything as anybody. This is even more of an issue once we introduce an auto-scaler layered on the replicationController.

It's very difficult to develop a signature-based approach layered on a literal pod creation API that could be made both usable and secure. OTOH, if the replicationController could only spawn instances from templates owned by the core, then its power could be restricted. Think of this as the principle of least privilege and separation of concerns.

Including the template by reference in the replicationController API would also facilitate rollbacks to previous pod configurations. A standalone template could be used for cron and other forms of deferred execution.

Even if we were to add a more general templating/configuration-generation mechanism in the future, we can't have turtles all the way down. A pod template would be useful for spawning the config generator, among other things.

As with the current replicationController API, pods would have no relationship to the template from which they were generated other than their labels and any other provenance information we kept. Changes to the template would have no effect on pods created from it previously.

I'd be fine with a separate API endpoint for creating pods from a template, just as we have for replicationController today.

erictune · 2014-07-24T14:48:02Z

Okay, putting together above comments and my own thoughts...

For the initial PR, we just need to have a pod template type which unambiguously and completely defines a /pod. Later PRs can extend /podTemplate as needed to support authorization of delegated use.

Here are a few examples with delegation and the types of expansion of templates that might occur:
- cron service runs a /pod, but passes to pod's environment a string identifying the datecode for this run.
- third-party auto-scaler makes more of a pod, or makes pods that request more or less resources.
- map-reduce service makes several pods, setting environment variables that control input and output file paths.
- ABTester service makes pods with two different values for an Environment variable that controls a new feature, and two different values of another Environment variable that controls a tag added to the logs of these pods (e.g. experiment_27354_mode_a)

Considerations for podTemplate:

1. Ease and succinctness of definition of podTemplate.
2. Ease of reasoning about the security implications of giving a user permission to instantiate pods from a podTemplate.
3. Work with YAML as well as JSON.
4. Allow templates to generate many different kinds of pods.

Item 4 seems much less important than 1 and 2. Therefore, this rules out a podTemplate which holds a schema definition, or jpath expression, or anything else which allows fully general manipulation of json-type data. Item 3 above further reinforces this.

Therefore, a /podTemplate will look something like this:

{ "id": "awesomePodTemplate",
  "pod": "<object exactly following /pod schema>",
  "allowExtraEnvVars": {
    "MOTD": "Today's pod brought to you by a replication controller."
  },
  "allowModifiedResourcesRequestsAndLimits": 1,
  "delegatedPodMakers": ["alice@example.com", "replicationcontroller@kubernetes.io"]
}

Note the specific, capability-like descriptions of allowed modifications to the /pod object.

However, the first PR will just have:

{ "id": "myPodTemplate",
  "pod": "<object exactly following /pod schema>"
}

The /pod schema will get a new member "actAsUser". This affects which user the pod acts as.
Initially, this will have no effect. As we add authentication (#443), the following authorization code can be added to the apiserver:

if authenticatedUser == request.pod.actAsUser { return auth.Authorized }
return auth.NotAuthorized

In a later PR, the /pod schema will be extended to have a "fromPodTemplateId" member which references the id of the /podTemplate that this /pod is modeled on. This adds an interesting twist: we can't use the user-provided name alone to identify the /podTemplate. We need to specify which user's namespace the name lies in. Maybe "actAsUser" identifies this or maybe we need a globally unique id for a podTemplate.

With that member added, the authorization check for creating a /pod would look like this:

if authenticatedUser == request.pod.actAsUser { return auth.Authorized }
if auth.Can(authenticatedUser, auth.MakePodsFor, request.pod.actAsUser) {
    // A delegate may only create pods that the referenced template generates.
    tpl := findPodTemplate(request.pod.fromPodTemplateId)
    if tpl != nil && tpl.Generates(request.pod) {
        return auth.Authorized
    }
}
return auth.NotAuthorized
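
The check above can be made concrete with a small Go sketch. Everything in it is hypothetical: the pared-down Pod and PodTemplate types, the choice to model auth.Can as a per-template set of delegated makers (borrowed from the "delegatedPodMakers" idea), and the rule that Generates accepts a pod only if it matches the template apart from explicitly allowed extra environment variables. It is meant to illustrate the capability-like shape of the check, not a proposed implementation.

```go
package main

import "fmt"

// Pod is a pared-down, hypothetical stand-in for the /pod schema.
type Pod struct {
	ActAsUser         string
	FromPodTemplateID string
	Image             string
	Env               map[string]string
}

// PodTemplate is a hypothetical /podTemplate: a base pod plus the names of
// env vars a delegate may add and the principals allowed to instantiate it.
type PodTemplate struct {
	ID              string
	Pod             Pod
	AllowExtraEnv   map[string]bool
	DelegatedMakers map[string]bool
}

// Generates reports whether p could have been stamped out from t: same image
// and user, every template env var unchanged, and every extra env var allowed.
func (t *PodTemplate) Generates(p Pod) bool {
	if p.Image != t.Pod.Image || p.ActAsUser != t.Pod.ActAsUser {
		return false
	}
	for k, v := range t.Pod.Env {
		if got, ok := p.Env[k]; !ok || got != v {
			return false
		}
	}
	for k := range p.Env {
		if _, base := t.Pod.Env[k]; !base && !t.AllowExtraEnv[k] {
			return false
		}
	}
	return true
}

// Authorize mirrors the pseudocode: a user may always create pods that act as
// themselves; a delegate may only create pods that a template generates.
func Authorize(user string, p Pod, templates map[string]*PodTemplate) bool {
	if user == p.ActAsUser {
		return true
	}
	tpl := templates[p.FromPodTemplateID]
	if tpl == nil || !tpl.DelegatedMakers[user] {
		return false
	}
	return tpl.Generates(p)
}

func main() {
	tpl := &PodTemplate{
		ID:              "awesomePodTemplate",
		Pod:             Pod{ActAsUser: "alice@example.com", Image: "web:v1"},
		AllowExtraEnv:   map[string]bool{"MOTD": true},
		DelegatedMakers: map[string]bool{"replicationcontroller@kubernetes.io": true},
	}
	templates := map[string]*PodTemplate{tpl.ID: tpl}
	req := Pod{
		ActAsUser:         "alice@example.com",
		FromPodTemplateID: tpl.ID,
		Image:             "web:v1",
		Env:               map[string]string{"MOTD": "hello"},
	}
	fmt.Println(Authorize("replicationcontroller@kubernetes.io", req, templates))
}
```

Because the delegate can only request pods that the template generates, granting it permission does not amount to letting it run arbitrary pods as the template's owner.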

erictune · 2014-07-24T15:58:45Z

Other use case: pod's port can come from a range, to allow duplicate pods on the same host. Would this go in the template?

lavalamp · 2014-07-24T17:24:34Z

I wonder if we should maybe add an "owner" field to the JSONBase, so that all objects in the system could have an owning user. If so, no need to specifically add that field to the PodTemplate.

> In a later PR, the /pod schema will be extended to have a "fromPodTemplateId" member which references the id of the /podTemplate that this /pod is modeled on. This adds an interesting twist: we can't use the user-provided name alone to identify the /podTemplate.

This could be done with a label, which is what our current replicationController does.

I think a step that should come shortly after adding PodTemplate as a resource is changing the replication controller struct to take a podTemplateID instead of a hardcoded PodTemplate.

Port shouldn't be dynamic.

May want @brendanburns to take a look at this when he gets back.

erictune · 2014-07-24T17:37:14Z

On Thu, Jul 24, 2014 at 10:24 AM, Daniel Smith notifications@github.com wrote:

> I wonder if we should maybe add an "owner" field to the JSONBase, so that all objects in the system could have an owning user. If so, no need to specifically add that field to the PodTemplate.
>
> > In a later PR, the /pod schema will be extended to have a "fromPodTemplateId" member which references the id of the /podTemplate that this /pod is modeled on. This adds an interesting twist: we can't use the user-provided name alone to identify the /podTemplate.
>
> This could be done with a label, which is what our current replicationController does.

Can a label selector select a different user's objects?

> I think a step that should come shortly after adding PodTemplate as a resource is changing the replication controller struct to take a podTemplateID instead of a hardcoded PodTemplate.

Okay, but again the namespace/user issue is unresolved.

> Port shouldn't be dynamic.
>
> May want @brendanburns to take a look at this when he gets back.


bgrant0607 · 2014-07-25T06:38:59Z

Thanks, @erictune .

First of all, while security is part of the motivation for this, I'd drop all user / identity / auth / delegation stuff until we figure out auth[nz] more generally. That said, we'll want to namespace label keys by project implicitly by default to prevent conflicts and overlap across users.

Second, we should leave out most/all forms of substitution and computation more generally. A more general config mechanism is a separate issue. I was thinking of taking what replicationController supports today and moving it to a separate object, which we might want to garbage collect after some amount of time in the case that it hasn't been used.

However, I think it's not too early to think about the override model, and whether we want one eventually, even though we wouldn't implement it initially. Env. vars. and resources are good examples.

It would be useful to think about how splitting out the template (and overrides) would interact with updates driven by declarative configuration. Does the replicationController change to a new template, or does one update its template? How does one update pods controlled by the replicationController? Some ideas were discussed in #492 .

Duplicate pods on the same host: We implement IP per pod, so no port allocation range is necessary.

fromPodTemplateId: The template must behave as a cookie cutter -- once a pod is created from a template, it has no relationship to the template. The template may be changed or deleted without affecting pods created from it, and pods created from it may be modified independently. We probably do want to record provenance information for debugging and/or auditing, though. It would include information like the template id, time, replication controller id (if created by one), user, etc.
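
A minimal Go sketch of the cookie-cutter behavior described above: the template is deep-copied at instantiation time, so later edits to the template do not reach existing pods, and provenance is kept only as annotations. The types are simplified stand-ins, and the annotation keys ("provenance/template", "provenance/created-by") are made up for illustration.

```go
package main

import "fmt"

// ObjectMeta, PodSpec, PodTemplate and Pod are simplified, hypothetical types.
type ObjectMeta struct {
	Name        string
	Labels      map[string]string
	Annotations map[string]string
}

type PodSpec struct {
	Image string
	Env   map[string]string
}

type PodTemplate struct {
	ID   string
	Meta ObjectMeta // metadata for pods stamped from the template
	Spec PodSpec
}

type Pod struct {
	Meta ObjectMeta
	Spec PodSpec
}

// copyMap returns an independent, non-nil copy of m.
func copyMap(m map[string]string) map[string]string {
	out := make(map[string]string, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// Instantiate deep-copies the template into a new pod and records provenance
// in annotations. After this call the pod shares no state with the template.
func Instantiate(t *PodTemplate, name, createdBy string) Pod {
	p := Pod{
		Meta: ObjectMeta{
			Name:        name,
			Labels:      copyMap(t.Meta.Labels),
			Annotations: copyMap(t.Meta.Annotations),
		},
		Spec: PodSpec{Image: t.Spec.Image, Env: copyMap(t.Spec.Env)},
	}
	p.Meta.Annotations["provenance/template"] = t.ID
	p.Meta.Annotations["provenance/created-by"] = createdBy
	return p
}

func main() {
	tpl := &PodTemplate{
		ID:   "web",
		Meta: ObjectMeta{Labels: map[string]string{"app": "web"}},
		Spec: PodSpec{Image: "web:v1"},
	}
	pod := Instantiate(tpl, "web-0", "replicationController/web")

	// Changing the template afterwards leaves the existing pod untouched.
	tpl.Spec.Image = "web:v2"
	fmt.Println(pod.Spec.Image, pod.Meta.Annotations["provenance/template"])
}
```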

smarterclayton · 2014-07-29T17:57:29Z

@bgrant0607 Can you describe config generators a bit more, as mentioned in #146? Haven't heard you mention that yet, but I suspect it matches use cases we are looking to solve as well.

bgrant0607 · 2014-08-28T18:25:59Z

Created #1007 to start the broader config discussion.

#503 contains another example that could use the pod template: job controller.

I'd like a bulk-creation operation to go with the pod template, so that a replication controller could send one operation to create N pods. This will eventually be important for performance, gang scheduling, usage analytics, etc.

bgrant0607 · 2014-09-19T21:30:52Z

@smarterclayton @erictune @lavalamp

Trying to make this concrete.

Standalone Pod Template

From #1225:

type PodTemplate struct {
    ObjectType `json:",inline" yaml:",inline"`
    Metadata   ObjectMetadata `json:"metadata,omitempty" yaml:"metadata,omitempty"`

    // Spec describes what a pod should look like.
    Spec PodSpec `json:"spec,omitempty" yaml:"spec,omitempty"`
}

It should also have a Status PodTemplateStatus, for consistency with all other API objects. I could imagine recording status data like timestamp of last pod created (e.g., if we wanted to put a TTL on template objects).

There is the question of whether we want metadata like labels and annotations to come from the template or to be provided at pod instantiation time or both. Taking metadata from the template is easier to use and more secure. Providing metadata at instantiation time would allow more flexible template reuse. I'm going to declare that flexible template reuse is a problem for the config system, not the PodTemplate, so I recommend we take metadata from the PodTemplate.

The Metadata in the struct above is the PodTemplate's metadata. Typically, the pods created from the template will have the same labels and at least some of the same annotations, but may have additional annotations, such as to record the template from which they were created. However, for cleanliness (and flexibility, also), I recommend a separate field for pod metadata, PodMetadata ObjectMetadata.

I could also foresee us adding more fields in the future, such as authorization info, TTL, etc.

Therefore, I propose a PodTemplateSpec, which includes PodMetadata, PodSpec, and whatever other desired state fields we need.

type PodTemplateSpec struct {
    // Metadata of the pods created from this template.
    Metadata   ObjectMetadata `json:"metadata,omitempty" yaml:"metadata,omitempty"`

    // Spec describes what a pod should look like.
    Spec PodSpec `json:"spec,omitempty" yaml:"spec,omitempty"`
}

type PodTemplate struct {
    ObjectType `json:",inline" yaml:",inline"`
    Metadata   ObjectMetadata `json:"metadata,omitempty" yaml:"metadata,omitempty"`

    // Spec describes what a pod should look like.
    Spec PodTemplateSpec `json:"spec,omitempty" yaml:"spec,omitempty"`

    // Status represents the current information about a PodTemplate. 
    Status PodTemplateStatus `json:"status,omitempty" yaml:"status,omitempty"`
}

Bulk Pod Creation

By themselves, PodTemplates don't do anything. They are there to be used. I propose to extend POST /pods with 2 URL parameters:

- number=<int>: The number of pods to create. When number > 1, the Name, if provided, is treated as a prefix, to which some uniquifying characters are appended for each pod. If Name is not provided, it is auto-generated according to our general approach to this (e.g., autosetName=true might be necessary).
- template=<reference to PodTemplate>: Take the pod's metadata and spec from the specified PodTemplate rather than from the json payload. If the specified PodTemplate doesn't exist, that's an error. The client should be able to use resourceVersion preconditions to ensure they're using a sufficiently up-to-date PodTemplate.

More about the format of object references below.
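
As a rough illustration of the two URL parameters above, the following Go sketch parses them from a query string and derives pod names for a bulk request. The CreateOptions type and both function names are hypothetical, and the sequential "-0", "-1" suffix is only a stand-in for whatever uniquifying characters the apiserver would actually append.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// CreateOptions holds the two proposed POST /pods URL parameters.
type CreateOptions struct {
	Number   int    // number of pods to create; defaults to 1
	Template string // reference to a PodTemplate; empty means use the payload
}

// ParseCreateOptions reads number and template from a raw query string.
func ParseCreateOptions(rawQuery string) (CreateOptions, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return CreateOptions{}, err
	}
	opts := CreateOptions{Number: 1, Template: q.Get("template")}
	if s := q.Get("number"); s != "" {
		n, err := strconv.Atoi(s)
		if err != nil || n < 1 {
			return CreateOptions{}, fmt.Errorf("number must be a positive integer, got %q", s)
		}
		opts.Number = n
	}
	return opts, nil
}

// PodNames treats name as a prefix when more than one pod is requested.
// The numeric suffix is an assumption made for this sketch.
func PodNames(name string, number int) []string {
	if number == 1 {
		return []string{name}
	}
	names := make([]string, number)
	for i := range names {
		names[i] = fmt.Sprintf("%s-%d", name, i)
	}
	return names
}

func main() {
	opts, err := ParseCreateOptions("number=3&template=/podTemplates/web")
	if err != nil {
		fmt.Println("bad request:", err)
		return
	}
	fmt.Println(opts.Template, PodNames("web", opts.Number))
}
```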

Replication Controller

Currently (even in the v1beta3 proposal), ReplicationControllerSpec contains an inline Template PodTemplate. It's awkward to nest a full-fledged object in another object, so at minimum this should be PodTemplateSpec instead.

There are 3 alternative approaches to using a PodTemplate in ReplicationController:

1. Use a POST URL parameter template=<reference to PodTemplate>, similar to pod creation, which would be copied into the PodTemplateSpec in the ReplicationControllerSpec at creation time.
2. Replace the inline PodTemplateSpec with a reference to a PodTemplate in the ReplicationControllerSpec.
3. Support both (i.e., one of) the inline PodTemplateSpec and reference to a PodTemplate, the former for simplicity and the latter for all the other reasons we'd like to do this. We could also support (1) in this case.

In order to produce the decoupling and security properties I was looking for when I proposed this issue, the replication controller service needs to be able to utilize the template at pod creation time rather than at the time the replication controller is created. Therefore, the ReplicationControllerSpec needs a reference to the PodTemplate. This has the disadvantage of creating a hard dependency between two objects -- the replication controller could fail if its pod template were deleted -- but we could disallow deletion of PodTemplates that were in use. We already have another creation-order dependency -- services must be created before their clients -- so that wouldn't be a new issue.

I'm tempted to recommend (3), so we could support both simple and more sophisticated use cases, but (A) I'm concerned that inline PodTemplateSpecs in ReplicationControllerSpec will create problems down the road for auth and for API refactoring and (B) kubecfg could paper over the complexity of dealing with multiple objects for now and a Real Config solution or higher-level API should be able to deal with it later.

So, I recommend (2): PodTemplate by reference only.
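
To illustrate what "by reference only" implies, here is a small Go sketch in which the controller stores just a template reference and resolves it each time it needs to create pods, and in which deleting a referenced template is refused. All names (TemplateStore, Reconcile, ErrTemplateInUse) and the simplified types are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// PodTemplate and ReplicationController are simplified, hypothetical types.
type PodTemplate struct {
	Image string
}

type ReplicationController struct {
	Replicas    int
	TemplateRef string // reference to a PodTemplate, not an inline copy
}

// TemplateStore maps template references to templates.
type TemplateStore map[string]PodTemplate

var ErrTemplateInUse = errors.New("pod template is referenced by a replication controller")

// Delete refuses to remove a template that a controller still references,
// which avoids the hard-dependency failure mode discussed above.
func (s TemplateStore) Delete(ref string, controllers []ReplicationController) error {
	for _, rc := range controllers {
		if rc.TemplateRef == ref {
			return ErrTemplateInUse
		}
	}
	delete(s, ref)
	return nil
}

// Reconcile returns the images of the pods to create so that running reaches
// rc.Replicas. The template is looked up at pod creation time, not when the
// controller was created.
func Reconcile(rc ReplicationController, running int, s TemplateStore) ([]string, error) {
	tpl, ok := s[rc.TemplateRef]
	if !ok {
		return nil, fmt.Errorf("pod template %q not found", rc.TemplateRef)
	}
	var pods []string
	for i := running; i < rc.Replicas; i++ {
		pods = append(pods, tpl.Image)
	}
	return pods, nil
}

func main() {
	store := TemplateStore{"/podTemplates/web": {Image: "web:v1"}}
	rc := ReplicationController{Replicas: 3, TemplateRef: "/podTemplates/web"}

	pods, err := Reconcile(rc, 1, store)
	fmt.Println(pods, err)

	// Deleting a template that is still in use is rejected.
	fmt.Println(store.Delete("/podTemplates/web", []ReplicationController{rc}))
}
```

With an inline PodTemplateSpec the controller would carry its own copy of the pod definition; with a reference, the pod definition stays in an object owned by the core, which is what the decoupling and security arguments above rely on.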

Inter-object references

This could (and probably should) be forked into its own issue if there's a lot of debate.

We don't currently have any cross-references between objects in our API. We just have indirect references via label selectors.

Possible options:

1. Label selector.
2. UID.
3. JSON of identifying metadata: Kind, Namespace, Name.
4. Partial object URL (e.g., path only, or path only without version).
5. Full object URL.
6. All of the above.
7. Something else?

Using label selectors would require adding a unique label to facilitate unique references, which is sort of contrary to what labels are for, or a non-label tie-breaking field to select the correct one from the set. Additionally, the consistency model would be more complex -- after adding a new template, users would want to ensure that the replication controller would use it before performing an action that would cause new pods to be created, such as killing pods or increasing the replica count. This seems overly complex for only a small benefit.

UID has the problems that it isn't even indexed currently, would be hard for users to reason about, and couldn't be specified without additional communication with the apiserver and processing in the client. In particular, it would be hostile to configuration.

JSON would require another encoding for URL parameters.

Therefore, I suggest consistency with API object references from outside the system, so either (4) or (5). The reason to not use (5) is because the domain name and version are not necessarily stable (esp. if we replicate and/or self-host apiserver), so I recommend (4), path without API version. This form would be used both in URL parameters (pod creation from template) and in object fields (replication controller). This form also happens to be the most concise.
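
For illustration, option (4) could be derived from a full object URL as in the Go sketch below. The URL layout it assumes, /api/<version>/<resource>/<name>, and the function name ObjectRef are assumptions made for this example; the point is only that dropping the host and the API version yields a reference that stays stable when either changes.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ObjectRef converts a full object URL into a path-only reference without the
// API version. It assumes paths of the form /api/<version>/<resource>/<name>.
func ObjectRef(fullURL string) (string, error) {
	u, err := url.Parse(fullURL)
	if err != nil {
		return "", err
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) != 4 || parts[0] != "api" {
		return "", fmt.Errorf("unexpected object path %q", u.Path)
	}
	// Drop "api" and the version; keep resource and name.
	return "/" + parts[2] + "/" + parts[3], nil
}

func main() {
	a, _ := ObjectRef("https://apiserver.example.com/api/v1beta1/podTemplates/web")
	b, _ := ObjectRef("https://replica-2.example.com/api/v1beta3/podTemplates/web")
	// Both URLs name the same object once host and version are removed.
	fmt.Println(a, a == b)
}
```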

smarterclayton · 2014-09-24T01:23:15Z

Re: references, I agree with reasoning about 1, 2, 3, and 5. I would also agree with 4 as better than the alternatives. One problem with 4 is when we rename resources (minions -> nodes).

Re: templates, some potential problems that could crop up with referenced pods:

- How does the replication controller validate that the provided pod template is safe if it can't read the template? We've got a few cases of that - label selector of controller needs to select the pods it creates, and RestartPolicy=always is the only allowed type. Does the replication controller need a new API endpoint of GET /pods that it can use as a validating oracle (return 1 if this attribute matches this label selector)?
- If the problem is replication controllers being able to create arbitrary pods, couldn't we also address the problem by having replication controller controllers (the code that creates pods) pass a reference to the controller resource (/resourceControllers/1), and have the /pods endpoint handle reading the template out of the replication controller?

Admittedly that means that the pods endpoint has to be able to decompose a resourceControllers response, and access it, which opens the door to other forms of injection style attacks. But it could also mean that any object which has a field "Spec" with type "PodTemplateSpec" can clone pods (assuming that we have a general solution for doing a reference -> endpoint GET, which I'm not positive we're at yet).
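
The two validation rules mentioned in the first point (the controller's label selector must select the pods it creates, and only RestartPolicy=always is allowed) are simple to state in code if whoever runs the check can read the template. A hedged Go sketch, with a simplified PodTemplateSpec, equality-only selectors, and hypothetical function names:

```go
package main

import "fmt"

// PodTemplateSpec is a simplified stand-in holding only what the checks need.
type PodTemplateSpec struct {
	Labels        map[string]string
	RestartPolicy string
}

// SelectorMatches reports whether every key/value pair in an equality-based
// selector is present in labels.
func SelectorMatches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// ValidateForController applies the two rules from the comment above.
func ValidateForController(selector map[string]string, t PodTemplateSpec) error {
	if !SelectorMatches(selector, t.Labels) {
		return fmt.Errorf("controller selector does not select pods created from the template")
	}
	if t.RestartPolicy != "Always" {
		return fmt.Errorf("restart policy %q not allowed; only Always is", t.RestartPolicy)
	}
	return nil
}

func main() {
	tpl := PodTemplateSpec{
		Labels:        map[string]string{"app": "web", "tier": "frontend"},
		RestartPolicy: "Always",
	}
	fmt.Println(ValidateForController(map[string]string{"app": "web"}, tpl))
	fmt.Println(ValidateForController(map[string]string{"app": "db"}, tpl))
}
```

If the replication controller cannot read the template, the same check would have to run on the apiserver side, which is the "validating oracle" question raised above.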

bgrant0607 · 2017-06-22T18:25:17Z

An alternative to fully moving the pod template out and replacing it with a reference to an external resource:

Create a pod template subresource for all the controllers (and even for CRDs eventually).

This would address the following use cases:

- Create an API endpoint that could have different permissions
- Create a uniform polymorphic API for pod template mutations, which could be used by kubectl set commands, for example
- Facilitate rollouts of new ConfigMaps (#22368) by making it easier to update references from controller pod templates
- Facilitate rollouts upon PodPreset changes, by enabling it to target the pod template subresource rather than pods directly

@pmorie @pwittrock @erictune

bgrant0607 · 2017-06-30T15:47:15Z

If we had API support for apply (#17333), PodPreset could mutate the template spec without changing the user's intent.

fejta-bot · 2017-12-31T02:01:40Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

bgrant0607 · 2018-01-23T02:41:25Z

/remove-lifecycle stale
/lifecycle frozen

enisoc · 2018-04-05T16:03:53Z

I'm not sure if admission plugins were already discussed as part of the motivation for potentially making use of the top-level PodTemplate resource, but here is a concrete use case where such a design would have been helpful:

#61886 (comment)


MadhavJivrajani · 2021-06-29T16:05:43Z

/remove-kind design
/kind feature

kind/design will soon be removed from k/k in favor of kind/feature. Relevant discussion can be found here: kubernetes/community#5641

bgrant0607 · 2021-11-08T15:35:52Z

This was the original design (sunit prototype = pod template, replicate = replica set):
https://twitter.com/bgrant0607/status/1121058263959654400

But it can be addressed through a combination of type imports and duck typing.

I don't think this will happen, so closing.
/close

k8s-ci-robot · 2021-11-08T15:36:14Z

@bgrant0607: Closing this issue.


jbeda added the enhancement label Jun 19, 2014

bgrant0607 mentioned this issue Jul 9, 2014

Filter lists by label selector #387

Closed

bgrant0607 changed the title ~~Consider separating the pod template from replicationController~~ Separate the pod template from replicationController Jul 11, 2014

erictune mentioned this issue Jul 24, 2014

Kube-proxy services range of container ports #597

Closed

erictune self-assigned this Jul 24, 2014

bgrant0607 mentioned this issue Jul 25, 2014

Add content about replicationController to document use patterns #492

Closed

bgrant0607 mentioned this issue Jul 25, 2014

Higher level image and deployment concepts in Kubernetes #503

Closed

erictune mentioned this issue Jul 25, 2014

DNS #146

Closed

lavalamp mentioned this issue Jul 28, 2014

[WIP] Build resource prototype #662

Closed

erictune mentioned this issue Jul 29, 2014

Add initial resources.md for discussion #617

Merged

erictune mentioned this issue Jul 30, 2014

RFC: all controllers are separate web services. #703

Closed

bgrant0607 added this to the v1.0 milestone Aug 27, 2014

bgrant0607 mentioned this issue Sep 4, 2014

Proposal: Isolate kubelet from etcd #860

Merged

smarterclayton mentioned this issue Sep 9, 2014

Proposal: v1beta3 API overhaul #1225

Merged

bgrant0607 added the area/api and kind/design labels Sep 19, 2014

bgrant0607 mentioned this issue Feb 24, 2017

Need pattern for updating controllers that works with extensions (TPR/Aggregated) #41677

Open

bgrant0607 added the triaged label Mar 9, 2017

errordeveloper mentioned this issue Mar 9, 2017

should PodTemplatesGetter be part of public API #42789

Closed

lblackstone mentioned this issue Mar 15, 2017

checkpoint installer should source manifests from api-object kubernetes-retired/bootkube#206

Closed

KnVerey mentioned this issue Mar 16, 2017

Support Helm-style templating Shopify/krane#44

Closed

bgrant0607 mentioned this issue Mar 21, 2017

Workload API v1 requirements umbrella issue #42752

Closed

lukaszo mentioned this issue Jun 16, 2017

What's the intention to expose PodTemplate as an api object? #43552

Closed

bgrant0607 mentioned this issue Jun 30, 2017

Proposal: First-generation of kubectl set * commands #21648

Closed

k8s-ci-robot added the lifecycle/stale label Dec 31, 2017

k8s-ci-robot added the lifecycle/frozen label and removed the lifecycle/stale label Jan 23, 2018

kow3ns added this to Done in Workloads Mar 1, 2018

spiffxp removed the triaged label Mar 16, 2018

enisoc mentioned this issue Apr 5, 2018

DaemonSet scheduling is broken in combination with admission plugins #61886

Closed

k8s-ci-robot added the kind/feature label and removed the kind/enhancement label Jun 5, 2018

bgrant0607 mentioned this issue Aug 21, 2019

Add new test for PodTemplates #81689

Closed

hh mentioned this issue Sep 6, 2019

Deprecate the PodTemplate API #82435

Closed

b3atlesfan pushed a commit to b3atlesfan/kubernetes that referenced this issue Feb 5, 2021

Merge pull request kubernetes#170 from grepory/awsvpc

caaf457

backend/awsvpc: allow RBAC for instances

k8s-ci-robot removed the kind/design label Jun 29, 2021

k8s-ci-robot closed this as completed Nov 8, 2021
