
[Enhancement] Support privileged container #391

Closed
yugui opened this issue Jul 10, 2014 · 44 comments

@yugui
Contributor

yugui commented Jul 10, 2014

Sometimes containers need to run in privileged mode.
https://docs.docker.com/reference/run/#runtime-privilege-and-lxc-configuration

The container manifest schema should support a privileged flag so that users can deploy privileged containers.
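
For illustration, a minimal sketch of what such a manifest flag might look like (the field name and placement are assumptions; no such field existed when this was filed):

  containers:
    - name: privileged-task   # hypothetical container name
      image: example/task     # hypothetical image
      privileged: true        # the proposed flag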

@thockin
Member

thockin commented Jul 10, 2014

This is a dangerous opening. Can you detail why people should be able to use this? I'd rather pursue teaching Docker how to enable various privileges in a more granular fashion.

If anything, we should be convincing people to run containers with LESS privs (i.e. not root to start).


@bgrant0607
Member

Docker is adding capability whitelisting/blacklisting as we speak.


@yugui
Contributor Author

yugui commented Jul 11, 2014

I'm closing this issue because I discussed it with thockin and agreed that my original motivation for adding privileged containers can be covered by another approach. Containers should instead run with fewer privileges, so I agree that allowing privileged containers is the wrong direction.

@yugui yugui closed this as completed Jul 11, 2014
@thockin
Member

thockin commented Jul 11, 2014

If we can't find a better way to do this, we can revisit this idea.


@thockin
Member

thockin commented Aug 5, 2014

@danmace asked to reopen this

@thockin thockin reopened this Aug 5, 2014
@ironcladlou
Contributor

The fact remains that Docker exposes privileged containers as part of its core API. I don't believe it's the responsibility of Kubernetes to shield API consumers from lower-level capabilities in this regard. As a piece of infrastructure, how can Kube dictate what's secure or insecure in a given usage context? For instance, should Kube restrict host volume mounts in a container? How could it know what's secure given the overall deployment context (e.g. an on-premise deployment, a multi-tenant hosted solution)?

Because what's secure or not is defined by the administrator of the installation or product built on top of Kube, the capabilities of Docker should be available for use. Kube could delegate these decisions to pluggable strategies, or otherwise leave the input validation of a container manifest to a higher tier.

For a concrete use case of a privileged container, consider docker-in-docker support. For #662 we had to expose the Docker privileged container API through the Kube API in this manner.

@thockin
Member

thockin commented Aug 5, 2014

Yes, it should restrict host volumes! If you look, you will find a TODO to this effect with my name on it.

What we typically do internally is grant a specific prod role a capability (which we do not have yet in k8s) indicating "can mount host dirs". The user job asserts that capability. The master checks the asserted caps against the allowed caps before scheduling the job with the enabled caps. The equivalent of kubelet knows then to allow host dirs.

The problem here is that privileged containers are really root equivalent. Allowing them is a giant hole (for something that shouldn't need one - building code!). Worse, it introduces the idea of exclusions. Some jobs MUST NOT run on machines where there are root-equiv jobs (think privacy, compliance, etc).
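
To make the assert/allow flow concrete, here is a hypothetical sketch of the model described above (none of these fields existed in Kubernetes at the time; the names are invented for illustration):

  # The job asserts a capability it wants...
  job:
    assertedCapabilities:
      - mountHostDirs
  # ...and the master checks the asserted caps against the caps
  # the role is allowed before scheduling; the kubelet then knows
  # to permit host dirs for this job.
  role:
    allowedCapabilities:
      - mountHostDirs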

@ncdc
Member

ncdc commented Aug 5, 2014

@thockin I understand where you're coming from, and agree with most of what you're saying, but I also think it's reasonable to let an administrator configure the cluster to allow or disallow these things. Until docker supports non-root builds (if that's even possible), there's not really any other way to do builds using the cluster itself.

@pmorie
Member

pmorie commented Aug 5, 2014

+1 @ncdc @ironcladlou

@rhatdan

rhatdan commented Aug 6, 2014

Well, from a security point of view, --privileged versus --cap-add=all --label-opt=disabled is the same thing; --privileged is just well known. --privileged means disabling any attempt to separate a privileged process within the container from the host.

I actually argue right now that a privileged process within a container is the same as a privileged process outside of a container, whether you run with --privileged or not. There are too many holes which we are slowly trying to close.

http://opensource.com/business/14/7/docker-security-selinux

Or if you have an hour to spare you could watch:

https://www.youtube.com/watch?v=zWGFqMuEHdw

We actually want to design a host management container (Super Privileged) which runs with no security lockdown and only uses the mnt namespace.

docker run --privileged -ti --net=host -v /:/host fedora /bin/sh

gets you most of the way there. These types of containers would be for monitoring or debugging a system. With platforms like Project Atomic and CoreOS, we will need these for some management tasks. Imagine Nagios monitoring a host, or a log aggregator container that just talks to journald on the host and sends logging data to a central server.

I think what you will need is an RBAC mechanism that controls who can modify the security parameters of a container. You might want to allow people to start/stop containers but not upload new containers or run them with any kind of "priv" ops, like --selinux-opt, --cap-add, --privileged, --volume ...

@thockin
Member

thockin commented Aug 8, 2014

We run cAdvisor in a plain container (I think) with a couple of HostDir volumes.

I agree that privs inside a container is just a hair away from root. I want to take the position with k8s that jobs almost never need root or --privileged. But they can if needed, I guess. But I want to convince people it is not needed, first.


@ncdc
Member

ncdc commented Aug 8, 2014

@thockin I'm wondering what your take is on using a pod to do a build, where the result is a new Docker image? That's the direction we'd like to follow, which means we need to be able to do either docker build or docker run && docker commit. Both methods need access to a Docker socket, and our thinking right now is to use a local socket that is either bind-mounted from the host or created via Docker-in-Docker (which requires privileged mode).

@erictune
Member

erictune commented Aug 8, 2014

Sorry if this is obvious, but why does it have to be build-in-docker-in-docker? Why can't it be build-in-docker?

@ncdc
Member

ncdc commented Aug 8, 2014

@erictune we're hoping to apply resource constraints (cpu, memory, fs quota) and it will likely be easier to do this using docker-in-docker, where the initial container is managed by kubernetes, and whatever commands are executed to perform the build are scoped within that managed container's process tree and resource limits. Being under the docker-in-docker container's process tree is important to point out, as that means that processes spawned by the builder should live only as long as the docker-in-docker container is alive. If we're doing build-in-docker (i.e. using the host's docker socket in the build container), then I don't believe there's an easy way to apply resource constraints to the build instructions.
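
As a rough sketch of that idea in later pod-spec syntax (image, names, and limits are illustrative; the point is that the build commands run inside the dind container's process tree and therefore inherit its cgroup limits):

  containers:
    - name: dind-builder      # hypothetical docker-in-docker builder
      image: example/dind     # hypothetical image
      securityContext:
        privileged: true      # required to run a nested docker daemon
      resources:
        limits:
          cpu: "2"            # build steps are confined to these limits
          memory: 4Gi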

@thockin
Member

thockin commented Aug 8, 2014

I think that building docker images is the obvious case for privileges. It's INCREDIBLY unfortunate that docker requires privs here. It would be, IMO, worth a lot of eng time to fix that somehow, or to contain it better so that only VERY TRUSTED code runs inside the privileged containers, ssh into those containers is not allowed, and so on.

Otherwise I see docker builds as poisoning machines against running security-sensitive apps. We have a large number of apps internally that would not be willing to share a machine with an arbitrary privileged container.


@ncdc
Member

ncdc commented Aug 8, 2014

@thockin that makes sense. I could conceive of a cluster in which a set of nodes was partitioned off from the rest for use as "build nodes," and hopefully they could be locked down more than the other nodes to reduce the risk of a privilege escalation affecting the remainder of the cluster.

In the short term, are you open to adding the ability for a pod to specify that a container runs with privileges?

@erictune
Member

erictune commented Aug 8, 2014

@ncdc
I think you are saying that you want to do this:

  1. start Build controller: POST /pods { ... "containers" : [{ "name": "build_controller" }] ... }
  2. build controller does:
    • docker run blah/gcc 'gcc foo.c -o /volumes/build1234/foo.o'
    • docker run blah/gcc 'gcc bar.c -o /volumes/build1234/bar.o'
    • and so on

It seems more Kubernetic to me to do this:

  1. start Build controller: POST /pods { ... "containers" : [{ "name": "build_controller" }] ... }
  2. build controller starts build step pods
    • POST /pods { ... "containers" : [{ "name": "build_step_1" , "image":"blah/gcc" , ... "COMMAND": "gcc foo.c -o /volumes/build1234/foo.o"}] ... }
    • POST /pods { ... "containers" : [{ "name": "build_step_2" , "image":"blah/gcc",... "COMMAND": "gcc bar.c -o /volumes/build1234/bar.o" }] ... }
    • and so on

The advantages I see of the latter form are:

  • no docker-in-docker required.

  • build controller pod does not have to be sized for the resources of the max sized step. It can potentially allocate resources which are sized appropriately for each step. This means less time-integrated resource usage.

  • a single build controller process can build with more parallelism than would be possible by restricting build to one minion. You could run multiple build controllers, though you pay controller overhead for each instance and have to do more coordination.

    Drawbacks to the latter form could be that it takes longer when considering the time to set up each pod and to communicate across machines. But it seems like we could hide that with caching and smart scheduling.

@ncdc
Member

ncdc commented Aug 8, 2014

@erictune that's an interesting approach and I need to think about it some more, but it's not what we've been prototyping. Our current POC is #662. This is our current approach:

  1. Build controller is a component similar to the replication controller manager
  2. POST /builds { "sourceUri": "some git repo", "imageTag": "desired_tag", … }
  3. Build controller sees the new build and starts builder pod
    • POST /pods { … }
  4. Builder pod runs a specialized "builder image" that knows how to build the sourceUri and create a new Docker image (which is pushed to a registry)
    • this is where the container needs access to Docker
  5. Build controller checks the status of the pod and marks the build as succeeded/failed depending on exit code of pod's containers

With what you're suggesting, how would you commit, tag, and push a new image?

@erictune
Member

erictune commented Aug 8, 2014

Oh. So is it the "docker build" step that needs privileges? Is it because it needs to make a new namespace and the outer namespace has PR_SET_NO_NEW_PRIVS? If so, I don't know how to handle that.

@ncdc
Member

ncdc commented Aug 8, 2014

The container in the pod needs access to Docker (assuming we're running docker build), and the two options we've considered are bind-mounting the minion's docker socket into the container, or running a separate instance of the docker daemon inside the container. It's the latter case (docker daemon in container) that requires privileges.
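
In later pod-spec terms, the two options would look roughly like this (volume and container names and the images are illustrative):

  # Option 1: bind-mount the minion's docker socket (no privileged mode,
  # but the container can drive the host's docker daemon)
  containers:
    - name: builder
      image: example/builder    # hypothetical image
      volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock

  # Option 2: run a separate docker daemon inside the container
  # (docker-in-docker, which requires privileged mode)
  containers:
    - name: dind
      image: example/dind       # hypothetical image
      securityContext:
        privileged: true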

@thockin
Member

thockin commented Aug 8, 2014

I think I'm more comfortable with docker-in-docker than accessing the host docker. @erictune docker build does all the mounts/unmounts to set up and tear down the filesystem stack.


@smarterclayton
Contributor

The advantages of docker-in-docker are that today you get true process inheritance from the client (if you kill the parent container, the running build goes away too) and cgroups are nested. The advantages of host docker are performance (you don't have to download and recreate multi-gigabyte images on disk) and a simpler security model (you don't need to grant privileged; you just have to screen the Docker build commands from the API, although builds are still dangerous).

Another way privileged could be exposed is via a volume mount type plugin, although for docker-in-docker it's a bit of a stretch. For using host-docker it's a good model anyway since you need to bind a socket.

@smarterclayton
Contributor

We had discussed modeling the "commit", "tag", and "push" operations as primitives on the host available via some apis. But even in that model you have the same security risks, and you decouple the container execution (the build) from the actual commit. So if a cleanup task is implemented on the Kubelet it suddenly needs to be aware that some build containers might be in a pending state and can't be deleted until an arbitrary future point.

ironcladlou added a commit to ironcladlou/kubernetes that referenced this issue Aug 12, 2014
Add a Privileged field to containers in a pod, in order to facilitate pods
performing administrative tasks such as builds via Docker-in-Docker.

Discussion: kubernetes#391
@rhatdan

rhatdan commented Sep 3, 2014

On 09/02/2014 08:36 PM, Dan Mace wrote:

Something else to consider: Docker 1.2.0 features a granular capabilities API (https://blog.docker.com/2014/08/announcing-docker-1-2-0/). I haven't yet done any experimentation to see whether there exists a set of capabilities which would allow Docker-in-Docker in a non-privileged container.

My previous rationale (#391 (comment)) for supporting --privileged applies here as well, but the proliferation of flags to the kubelet and other components to toggle these two features seems undesirable.

This would need to be a scheduling constraint, which is a class of things we don't have yet.

This makes sense to me, but of course that's a longer road to go down.

Just wanted to get the thoughts out for discussion, if any is merited.

Just because you can add a couple of capabilities less than --privileged does not mean that it is not "privileged". You will have to add the sys_admin cap plus potentially others, which allows you to mount/unmount file systems and makes it dirt simple to break out of the container. You would almost assuredly have to turn off MAC (SELinux/AppArmor) protection.

I don't see much bang for the buck here.

@brendandburns brendandburns added this to the v0.5 milestone Sep 11, 2014
brendandburns pushed a commit to brendandburns/kubernetes that referenced this issue Sep 12, 2014
Add a Privileged field to containers in a pod, in order to facilitate pods
performing administrative tasks such as builds via Docker-in-Docker.

Discussion: kubernetes#391
@brendandburns
Contributor

Remaining items:

  • Add the enable flag to the apiserver too, and reject containers that are privileged if privileged isn't active.
  • Add support in the cluster turn-up scripts for enabling privileged easily.

@brendandburns
Contributor

At this point, this is fixed.

@sebgoa
Contributor

sebgoa commented Apr 29, 2015

So how does this work? I am trying to test a pod that runs a KVM VM in a container. I need privileged mode or some way to specify devices that the pod can access (--device in docker run).

If I specify privileged: true in my pod.yaml I get an error:

<snip> is invalid: spec.containers[0].privileged: forbidden 'true'

How do I set the kubelet or apiserver to accept my pod?

@smarterclayton
Contributor

You need to pass --allow-privileged to your API server and also to your kubelets.


@sebgoa
Contributor

sebgoa commented May 4, 2015

Thanks, that works: --allow_privileged=true

@r-medina

r-medina commented Jun 1, 2015

How, exactly, does one pass --allow_privileged=true to the server and kubelets?

@thockin
Member

thockin commented Jun 1, 2015

Either edit the deployment files (salt on GCE, may be different on other platforms) or log into each machine and edit the startup files (/etc/default/{kubelet,kube-apiserver} on GCE Debian).


@sebgoa
Contributor

sebgoa commented Jun 2, 2015

if you use hyperkube to start the apiserver and the kubelet you can do something like:

$ hyperkube apiserver --allow_privileged=true ...
$ hyperkube kubelet --allow_privileged=true ....

@erictune
Member

erictune commented Jun 2, 2015

If you are using GCE or another distro that uses salt, you can edit cluster/saltbase/pillar/privilege.sls


@starsimpson

I've edited my cluster/saltbase/pillar/privilege.sls files to allow privileged and restarted kubelet on all my nodes. Now what - is there a flag to pass when I run kubectl run / kubectl exec that confers privileged status on my containers?

@erictune
Member

You set pod.spec.securityContext.privileged=true

http://kubernetes.io/v1.0/docs/api-reference/definitions.html#_v1_securitycontext
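
For example, a minimal pod manifest with the flag set (in the v1 API the privileged field sits on the container's securityContext; the pod name and image here are illustrative):

  apiVersion: v1
  kind: Pod
  metadata:
    name: privileged-example    # hypothetical name
  spec:
    containers:
      - name: shell
        image: fedora
        securityContext:
          privileged: true

Note that this still requires --allow-privileged=true on both the apiserver and the kubelets, as described above.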


@renewooller

renewooller commented Sep 11, 2017

RE " log into each machine and edit the startup files
(/etc/default/{kubelet,kube-apiserver} on GCE Debian)"

Does anyone know of a less labour intensive approach? I've been investigating kops and kubeadm but can't seem to find anything yet.

If not, does anyone know the equivalent on AWS?

I've tried kops edit cluster and adding:

  kubeAPIServer:
    allowPrivileged: true

--UPDATE: this works, I had a simple indentation problem for privileged: true in the deployment itself.

seans3 pushed a commit to seans3/kubernetes that referenced this issue Apr 10, 2019
First draft of SIG Cluster Lifecycle related relnotes.
linxiulei pushed a commit to linxiulei/kubernetes that referenced this issue Jan 18, 2024