
PetSet (was nominal services) #260

Closed
bgrant0607 opened this issue Jun 26, 2014 · 160 comments
Labels
area/api Indicates an issue on api area. area/downward-api area/stateful-apps kind/design Categorizes issue or PR as related to design. kind/documentation Categorizes issue or PR as related to documentation. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/network Categorizes an issue or PR as relevant to SIG Network.

@bgrant0607 (Member)

@smarterclayton raised this issue in #199: how should Kubernetes support non-load-balanced and/or stateful services? Specifically, Zookeeper was the example.

Zookeeper (or etcd) exhibits 3 common problems:

  1. Identification of the instance(s) clients should contact
  2. Identification of peers
  3. Stateful instances

And it enables master election for other replicated services, which typically share the same problems, and probably need to advertise the elected master to clients.

@bgrant0607 (Member Author)

Note that we should probably also rename service to lbservice or somesuch to distinguish them from other types of services.

@bgrant0607 (Member Author)

As part of this, I'd remove service objects from the core apiserver and facilitate the use of other load balancers, such as HAProxy and nginx.

@smarterclayton (Contributor)

It would be nice if the logical definition of a service (the query and/or global name) could be used/specialized in multiple ways: as a simple load balancer installed via the infrastructure, as a more feature-complete load balancer like nginx or haproxy also offered by the infrastructure, as a queryable endpoint an integrator could poll/wait on (GET /services/foo -> { endpoints: [{host, port}, ...] }), or as information available to hosts to expose local load balancers. Obviously these could be multiple different use cases and as such split into their own resources, but having some flexibility to specify intent (unify under an lb) distinct from mechanism makes it easier to satisfy a wide range of requirements.
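For illustration only, a minimal Go sketch of the integrator side of that queryable endpoint, assuming a hypothetical GET /services/foo that returns { endpoints: [{host, port}, ...] }; the URL shape, field names, and apiserver address are assumptions, not an actual API:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Endpoint mirrors the hypothetical response shape discussed above.
type Endpoint struct {
	Host string `json:"host"`
	Port int    `json:"port"`
}

type serviceEndpoints struct {
	Endpoints []Endpoint `json:"endpoints"`
}

// fetchEndpoints polls the hypothetical service-endpoints resource once.
func fetchEndpoints(apiBase, service string) ([]Endpoint, error) {
	resp, err := http.Get(apiBase + "/services/" + service)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var out serviceEndpoints
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Endpoints, nil
}

func main() {
	eps, err := fetchEndpoints("http://apiserver:8080", "foo")
	if err != nil {
		panic(err)
	}
	for _, ep := range eps {
		fmt.Printf("%s:%d\n", ep.Host, ep.Port)
	}
}
```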

@bgrant0607 (Member Author)

@smarterclayton I agree with separating policy and mechanism.

Primitives we need:

  1. The ability to poll/watch a set identified by a label selector. Not sure if there is an issue filed yet.
  2. The ability to query pod IP addresses (Make it possible to get the pod IP address via the API #385).

This would be enough to compose with other naming/discovery mechanisms and/or load balancers. We could then build a higher-level layer on top of the core that bundles common patterns with a simple API.
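As a rough sketch of how those two primitives compose (not how they were exposed at the time), the following polls pods matching a label selector and collects their IPs; the /api/v1 path, labelSelector parameter, and status.podIP field follow the present-day API, and the kubectl-proxy address is an assumption:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// podList captures just the fields we need from a pod list response.
type podList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Status struct {
			PodIP string `json:"podIP"`
		} `json:"status"`
	} `json:"items"`
}

// podIPsForSelector lists pods matching a label selector and returns their IPs.
func podIPsForSelector(apiBase, namespace, selector string) (map[string]string, error) {
	u := fmt.Sprintf("%s/api/v1/namespaces/%s/pods?labelSelector=%s",
		apiBase, namespace, url.QueryEscape(selector))
	resp, err := http.Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	var pods podList
	if err := json.NewDecoder(resp.Body).Decode(&pods); err != nil {
		return nil, err
	}
	ips := make(map[string]string)
	for _, p := range pods.Items {
		ips[p.Metadata.Name] = p.Status.PodIP
	}
	return ips, nil
}

func main() {
	// Assumes `kubectl proxy` (or an equivalent) is serving the API locally.
	ips, err := podIPsForSelector("http://localhost:8001", "default", "app=etcd")
	if err != nil {
		panic(err)
	}
	for name, ip := range ips {
		fmt.Println(name, ip)
	}
}
```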

@brendandburns (Contributor)

Given the two primitives described by @bgrant0607, is it worth keeping this issue open? Or are there more specific issues we can file?

@smarterclayton (Contributor)

I don't think zookeeper is solved, since you need a unique identifier in each container. I think you could do this with 3 separate replication controllers (one per instance) or with a mode on the replication controller.

@smarterclayton (Contributor)

Service design I think deserves some discussion, as Brian notes. Currently it couples an infrastructure abstraction (local proxy) with a mechanism for exposure (environment variables in all containers) with a label query. There is an equally valid use case for an edge proxy that takes L7 hosts/paths and balances them to a label query, as well as supporting protocols like HTTP(S) and WebSockets. In addition, services have a hard scale limit today of 60k backends, shared across the entire cluster (the number of IPs allocated). It should be possible to run a local proxy on a minion that proxies only the services the containers on that host need, and also to avoid containers having to know about the external port. We can move this discussion to #494 if necessary.
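To make the "local proxy on a minion" idea concrete, here is a minimal per-host proxy sketch using only the Go standard library; the backend addresses and listen port are made-up placeholders, and a real implementation would populate them from the label-query/endpoints data discussed above rather than hard-coding them:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

// localProxy round-robins requests for one logical service across its endpoints.
type localProxy struct {
	backends []*url.URL
	next     uint64
}

func (p *localProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	i := atomic.AddUint64(&p.next, 1)
	target := p.backends[i%uint64(len(p.backends))]
	// Rewrite the request to the chosen backend and stream the response back.
	httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
}

func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		log.Fatal(err)
	}
	return u
}

func main() {
	// Placeholder backends; a real proxy would watch the endpoints of only the
	// services that containers on this host actually need.
	proxy := &localProxy{backends: []*url.URL{
		mustParse("http://10.0.1.5:8080"),
		mustParse("http://10.0.2.7:8080"),
	}}
	// Containers on this host talk to localhost:9090 and never need to know
	// about any externally exposed port.
	log.Fatal(http.ListenAndServe("127.0.0.1:9090", proxy))
}
```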

@bgrant0607 bgrant0607 added area/downward-api, sig/network, kind/documentation labels Sep 30, 2014
@bgrant0607 bgrant0607 changed the title from "Support/document how to run other types of services" to "Proposal: cardinal services" Oct 2, 2014
@bgrant0607 bgrant0607 added the area/api label Oct 2, 2014
@bgrant0607 (Member Author)

Tackling the problem of singleton services and non-auto-scaled services with fixed replication, such as master-slave replicated databases, key-value stores with fixed-size peer groups (e.g., etcd, zookeeper), etc.

The fixed-replication cases require predictable array-like behavior. Peers need to be able to discover and individually address each other. These services generally have their own client libraries and/or protocols, so we don't need to solve the problem of determining which instance a client should connect to, other than to make the instances individually addressable.

Proposal: We should create a new flavor of service, called Cardinal services, which map N IP addresses instead of just one. Cardinal services would perform a stable assignment of these IP addresses to N instances targeted by their label selector (i.e., a specified N, not just however many targets happen to exist). Once we have DNS ( #1261, #146 ), it would assign predictable DNS names based on a provided prefix, with suffixes 0 to N-1. The assignments could be recorded in annotations or labels of the targeted pods.

This would preserve the decoupling of role assignment from the identities of pods and replication controllers, while providing stable names and IP addresses, which could be used in standard application configuration mechanisms.
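As an illustration of how peers or clients could consume the proposed stable names, the sketch below enumerates an assumed 3-member set published as etcd-0 .. etcd-2; the prefix and member count are hypothetical, not part of any API:

```go
package main

import (
	"fmt"
	"net"
)

// lookupCardinalPeers resolves the N stable DNS names a cardinal service
// would publish (prefix-0 .. prefix-(N-1)) and returns their addresses.
func lookupCardinalPeers(prefix string, n int) (map[string][]string, error) {
	peers := make(map[string][]string)
	for i := 0; i < n; i++ {
		name := fmt.Sprintf("%s-%d", prefix, i)
		addrs, err := net.LookupHost(name)
		if err != nil {
			return nil, fmt.Errorf("resolving %s: %w", name, err)
		}
		peers[name] = addrs
	}
	return peers, nil
}

func main() {
	// Hypothetical: a 3-member etcd peer group published as etcd-0, etcd-1, etcd-2.
	peers, err := lookupCardinalPeers("etcd", 3)
	if err != nil {
		panic(err)
	}
	for name, addrs := range peers {
		fmt.Println(name, addrs)
	}
}
```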

Some of the discussion around different types of load balancing happened in the services v2 design: #1107.

I'll file a separate issue for master election.

/cc @smarterclayton @thockin

@bgrant0607 bgrant0607 removed the kind/documentation label Oct 2, 2014
@smarterclayton (Contributor)

The assignments would have to carry through into the pods via some environment parameterization mechanism (almost certainly).

For the etcd example, I would create:

  • replication controller cardinality 1: 1 pod, pointing to stable storage volume A
  • replication controller cardinality 2: 1 pod, pointing to stable storage volume B
  • replication controller cardinality 3: 1 pod, pointing to stable storage volume C
  • cardinal service 'etcd' pointing to the pods

If pod 2 dies, replication controller 2 creates a new copy of it and reattaches it to volume B. Cardinal service 'etcd' knows that that pod is new, but how does it know that it should be cardinality 2 (which comes from data stored on volume B)?
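One possible answer, sketched below purely for illustration: since each of the three replication controllers has its own pod template, it can stamp the ordinal into the pod (for example, as an environment variable alongside the volume reference), and the container derives its identity from that rather than from the service. The CARDINALITY variable name is an assumption:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// ordinalFromEnv reads the identity stamped into the pod by its replication
// controller's template (hypothetical CARDINALITY variable), so the pod's
// role follows the stable volume it was created for, not the service.
func ordinalFromEnv() (int, error) {
	v := os.Getenv("CARDINALITY")
	if v == "" {
		return 0, fmt.Errorf("CARDINALITY not set in pod template")
	}
	return strconv.Atoi(v)
}

func main() {
	ord, err := ordinalFromEnv()
	if err != nil {
		panic(err)
	}
	// e.g., member 2 always pairs with stable storage volume B.
	fmt.Printf("starting etcd member etcd-%d\n", ord)
}
```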

@thockin (Member) commented Oct 2, 2014

Rather than 3 replication controllers, why not a sharding controller, which looks at a label like "kubernetes.io/ShardIndex" when making decisions? If you want 3-way sharding, it makes 3 pods with indices 0, 1, 2. I feel like this was shot down before, but I can't reconstruct the trouble it caused in my head.

It just seems wrong to place that burden on users if this is a relatively common scenario.

Do you think it matters if the nominal IP for a given pod changes due to unrelated changes in the set? For example:

at time 0, pods (A, B, C) make up a cardinal service, with IPs 10.0.0.{1-3} respectively

at time 1, the node which hosts pod B dies

at time 2, the replication controller driving B creates a new pod D

at time 3, the cardinal service changes to (A, C, D) with IPs 10.0.0.{1-3} respectively

NB: pod C's "stable IP" changed from 10.0.0.3 to 10.0.0.2 when the set membership changed. I expect this will do bad things to running connections.

To circumvent this, we would need to have the ordinal values specified outside of the service, or something else clever. Maybe that is OK, but it seems fragile and easy to get wrong if people have to deal with it.


@smarterclayton (Contributor)

I think a sharding controller makes sense and is probably more useful in the context of a cardinal service.

I do think that IP changes based on membership are scary and I can think of a bunch of degenerate edge cases. However, if the cardinality is stored with the pods, the decision is less difficult.

@bgrant0607 (Member Author)

First of all, I didn't intend this to be about sharding -- that's #1064. Let's move sharding discussions to there. We've seen many cases of trying to use an analogous mechanism for sharding, and we concluded that it's not the best way to implement sharding.

@bgrant0607 (Member Author)

Second, my intention is that it shouldn't be necessary to run N replication controllers. It should be possible to use only one, though the number required depends on deployment details (canaries, multiple release tracks, rolling updates, etc.).

@bgrant0607 (Member Author)

Third, I agree we need to consider how this would interact with the durable data proposal (#1515) -- @erictune .

@bgrant0607 (Member Author)

Fourth, I agree we probably need to reflect the identity into the pod. As per #386, ideally a standard mechanism would be used to make the IP and DNS name assignments visible to the pod. How would IP and host aliases normally be surfaced in Linux?

@bgrant0607 (Member Author)

Fifth, I suggested that we ensure assignment stability by recording assignments in the pods via labels or annotations.
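A rough sketch of what that stability rule could look like, assuming assignments are recorded in a pod annotation (the annotation key and types below are hypothetical): existing pods keep whatever ordinal they already carry, and only genuinely new pods receive the lowest free ordinal, so members are never renumbered when the set changes:

```go
package main

import "fmt"

// Pod is a pared-down stand-in; Annotations is where the assignment is recorded.
type Pod struct {
	Name        string
	Annotations map[string]string
}

const ordinalKey = "cardinal.kubernetes.io/ordinal" // hypothetical annotation key

// assignOrdinals preserves ordinals already recorded on pods and gives new
// pods the lowest ordinals not currently in use, up to n members.
func assignOrdinals(pods []*Pod, n int) {
	used := make(map[string]bool)
	var unassigned []*Pod
	for _, p := range pods {
		if o, ok := p.Annotations[ordinalKey]; ok {
			used[o] = true
		} else {
			unassigned = append(unassigned, p)
		}
	}
	next := 0
	for _, p := range unassigned {
		for next < n && used[fmt.Sprint(next)] {
			next++
		}
		if next >= n {
			return // more pods than ordinals; leave the extras unassigned
		}
		p.Annotations[ordinalKey] = fmt.Sprint(next)
		used[fmt.Sprint(next)] = true
	}
}

func main() {
	pods := []*Pod{
		{Name: "etcd-a", Annotations: map[string]string{ordinalKey: "0"}},
		{Name: "etcd-c", Annotations: map[string]string{ordinalKey: "2"}},
		{Name: "etcd-d", Annotations: map[string]string{}}, // replacement for the lost member
	}
	assignOrdinals(pods, 3)
	for _, p := range pods {
		fmt.Println(p.Name, "->", p.Annotations[ordinalKey])
	}
	// etcd-d picks up ordinal 1; etcd-a and etcd-c keep 0 and 2.
}
```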

@bprashanth (Contributor)

That's JSON. It's an alpha feature added to a GA object (init containers in pods).
@chrislovecnm is working on Cassandra; you might just want to wait him out.

@chrislovecnm (Contributor)

@paralin here is what I am working on. No time to document it and get it into the k8s repo now, but that is the long-term plan: https://github.com/k8s-for-greeks/gpmr/tree/master/pet-race-devops/k8s/cassandra. It is working for me locally, on HEAD.

The latest C* image in the demo works well.

We do have an issue open for more documentation. Wink wink, nudge @bprashanth.

@matchstick matchstick added the kind/documentation label Jun 15, 2016
@ingvagabund (Contributor)

PetSets example with etcd cluster [1].

[1] kubernetes-retired/contrib#1295

@smarterclayton (Contributor)

Be sure to capture design asks on the proposal doc after you finish the review.


@bprashanth (Contributor) commented Jul 6, 2016

The petset docs are https://github.com/kubernetes/kubernetes.github.io/blob/release-1.3/docs/user-guide/petset.md and https://github.com/kubernetes/kubernetes.github.io/tree/release-1.3/docs/user-guide/petset/bootstrapping. I plan to close this issue and open a new one that addresses moving petset to beta, unless anyone objects.

@bprashanth (Contributor)

#28718

k8s-github-robot pushed a commit that referenced this issue Oct 27, 2016
Automatic merge from submit-queue

Proposal for implementing nominal services AKA StatefulSets AKA The-Proposal-Formerly-Known-As-PetSets

This is the draft proposal for #260.