
[Enhancement] Support privileged container #391

Closed
yugui opened this issue Jul 10, 2014 · 44 comments

@yugui
Contributor

yugui commented Jul 10, 2014

Sometimes containers need to run in privileged mode.
https://docs.docker.com/reference/run/#runtime-privilege-and-lxc-configuration

The container manifest schema should support a privileged flag so that users can deploy privileged containers.
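
For illustration, a minimal sketch of what such a manifest flag might look like (the field name and placement are assumptions; no such field existed when this was filed):

  containers:
    - name: privileged-task   # hypothetical container name
      image: example/task     # hypothetical image
      privileged: true        # the proposed flag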

@thockin
Member

thockin commented Jul 10, 2014

This is a dangerous opening. Can you detail why people should be able to use this? I'd rather pursue teaching Docker how to enable various privileges in a more granular fashion.

If anything, we should be convincing people to run containers with LESS privs (i.e. not root to start).


@bgrant0607
Member

Docker is adding capability whitelisting/blacklisting as we speak.


@yugui
Contributor Author

yugui commented Jul 11, 2014

I'm closing this issue because I discussed it with thockin and agreed that my original motivation for adding privileged containers can be covered by another approach. Containers should instead run with fewer privileges, so I agree that allowing privileged containers is the wrong direction.

@yugui yugui closed this as completed Jul 11, 2014
@thockin
Member

thockin commented Jul 11, 2014

If we can't find a better way to do this, we can revisit this idea.


@thockin
Member

thockin commented Aug 5, 2014

@danmace asked to reopen this

@thockin thockin reopened this Aug 5, 2014
@ironcladlou
Contributor

The fact remains that Docker exposes privileged containers as part of its core API. I don't believe it's the responsibility of Kubernetes to shield API consumers from lower-level capabilities in this regard. As a piece of infrastructure, how can Kube dictate what's secure or insecure in a given usage context? For instance, should Kube restrict host volume mounts in a container? How could it know what's secure given the overall deployment context (e.g. an on-premise deployment, a multi-tenant hosted solution)?

Because what's secure or not is defined by the administrator of the installation or product built on top of Kube, the capabilities of Docker should be available for use. Kube could delegate these decisions to pluggable strategies, or otherwise leave the input validation of a container manifest to a higher tier.

For a concrete use case of a privileged container, consider docker-in-docker support. For #662 we had to expose the Docker privileged container API through the Kube API in this manner.

@thockin
Member

thockin commented Aug 5, 2014

Yes, it should restrict host volumes! If you look, you will find a TODO to this effect with my name on it.

What we typically do internally is grant a specific prod role a capability (which we do not have yet in k8s) indicating "can mount host dirs". The user job asserts that capability. The master checks the asserted caps against the allowed caps before scheduling the job with the enabled caps. The equivalent of kubelet knows then to allow host dirs.

The problem here is that privileged containers are really root equivalent. Allowing them is a giant hole (for something that shouldn't need one - building code!). Worse, it introduces the idea of exclusions. Some jobs MUST NOT run on machines where there are root-equiv jobs (think privacy, compliance, etc).
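
To make the assert/allow flow concrete, here is a hypothetical sketch of the model described above (none of these fields existed in Kubernetes at the time; the names are invented for illustration):

  # The job asserts a capability it wants...
  job:
    assertedCapabilities:
      - mountHostDirs
  # ...and the master checks the asserted caps against the caps
  # the role is allowed before scheduling; the kubelet then knows
  # to permit host dirs for this job.
  role:
    allowedCapabilities:
      - mountHostDirs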

@ncdc
Member

ncdc commented Aug 5, 2014

@thockin I understand where you're coming from, and agree with most of what you're saying, but I also think it's reasonable to let an administrator configure the cluster to allow or disallow these things. Until docker supports non-root builds (if that's even possible), there's not really any other way to do builds using the cluster itself.

@pmorie
Member

pmorie commented Aug 5, 2014

+1 @ncdc @ironcladlou

@rhatdan

rhatdan commented Aug 6, 2014

Well, from a security point of view, --privileged versus --cap-add=all --label-opt=disabled is the same thing; --privileged is just well known. --privileged means disabling any attempt to separate a privileged process within the container from the host.

I actually argue right now that a privileged process within a container is the same as a privileged process outside of a container, whether you run with --privileged or not. There are too many holes which we are slowly trying to close.

http://opensource.com/business/14/7/docker-security-selinux

Or if you have an hour to spare you could watch:

https://www.youtube.com/watch?v=zWGFqMuEHdw

We actually want to design a host management container (Super Privileged) which runs with no security lockdown and only uses the mnt namespace.

docker run --privileged -ti --net=host -v /:/host fedora /bin/sh

gets you most of the way there. These types of containers would be for monitoring or debugging a system. With platforms like Project Atomic and CoreOS, we will need these for some management tasks. Imagine Nagios monitoring a host, or a log aggregator container that just talks to journald on the host and sends logging data to a central server.

I think what you will need is an RBAC mechanism that controls who can modify the security parameters of a container. You might want to allow people to start/stop containers but not upload new containers or run them with any kind of "priv" ops, like --selinux-opt, --cap-add, --privileged, --volume ...

@thockin
Member

thockin commented Aug 8, 2014

We run cAdvisor in a plain container (I think) with a couple of HostDir volumes.

I agree that privs inside a container is just a hair away from root. I want to take the position with k8s that jobs almost never need root or --privileged. But they can if needed, I guess. But I want to convince people it is not needed, first.


@ncdc
Member

ncdc commented Aug 8, 2014

@thockin I'm wondering what your take is on using a pod to do a build, where the result is a new Docker image? That's the direction we'd like to follow, which means we need to be able to do either docker build or docker run && docker commit. Both methods need access to a Docker socket, and our thinking right now is to use a local socket that is either bind-mounted from the host or created via Docker-in-Docker (which requires privileged mode).

@erictune
Member

erictune commented Aug 8, 2014

Sorry if this is obvious, but why does it have to be build-in-docker-in-docker? Why can't it be build-in-docker?

@ncdc
Member

ncdc commented Aug 8, 2014

@erictune we're hoping to apply resource constraints (cpu, memory, fs quota) and it will likely be easier to do this using docker-in-docker, where the initial container is managed by kubernetes, and whatever commands are executed to perform the build are scoped within that managed container's process tree and resource limits. Being under the docker-in-docker container's process tree is important to point out, as that means that processes spawned by the builder should live only as long as the docker-in-docker container is alive. If we're doing build-in-docker (i.e. using the host's docker socket in the build container), then I don't believe there's an easy way to apply resource constraints to the build instructions.
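
As a rough sketch of that idea in later pod-spec syntax (image, names, and limits are illustrative; the point is that the build commands run inside the dind container's process tree and therefore inherit its cgroup limits):

  containers:
    - name: dind-builder      # hypothetical docker-in-docker builder
      image: example/dind     # hypothetical image
      securityContext:
        privileged: true      # required to run a nested docker daemon
      resources:
        limits:
          cpu: "2"            # build steps are confined to these limits
          memory: 4Gi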

@thockin
Member

thockin commented Aug 8, 2014

I think that building docker images is the obvious case for privileges. It's INCREDIBLY unfortunate that docker requires privs here. It would be, IMO, worth a lot of eng time to fix that somehow, or to contain it better so that only VERY TRUSTED code runs inside the privileged containers, ssh into those containers is not allowed, and so on.

Otherwise I see docker builds as poisoning machines against running security-sensitive apps. We have a large number of apps internally that would not be willing to share a machine with an arbitrary privileged container.


@ncdc
Member

ncdc commented Aug 8, 2014

@thockin that makes sense. I could conceive of a cluster in which a set of nodes was partitioned off from the rest for use as "build nodes," and hopefully they could be locked down more than the other nodes to reduce the risk of a privilege escalation affecting the remainder of the cluster.

In the short term, are you open to adding the ability for a pod to specify that a container runs with privileges?

@erictune
Member

erictune commented Aug 8, 2014

@ncdc
I think you are saying that you want to do this:

  1. start Build controller: POST /pods { ... "containers" : [{ "name": "build_controller" }] ... }
  2. build controller does:
    • docker run blah/gcc 'gcc foo.c -o /volumes/build1234/foo.o'
    • docker run blah/gcc 'gcc bar.c -o /volumes/build1234/bar.o'
    • and so on

It seems more Kubernetic to me to do this:

  1. start Build controller: POST /pods { ... "containers" : [{ "name": "build_controller" }] ... }
  2. build controller starts build step pods
    • POST /pods { ... "containers" : [{ "name": "build_step_1" , "image":"blah/gcc" , ... "COMMAND": "gcc foo.c -o /volumes/build1234/foo.o"}] ... }
    • POST /pods { ... "containers" : [{ "name": "build_step_2" , "image":"blah/gcc",... "COMMAND": "gcc bar.c -o /volumes/build1234/bar.o" }] ... }
    • and so on

The advantages I see of the latter form are:

  • no docker-in-docker required.

  • build controller pod does not have to be sized for the resources of the max sized step. It can potentially allocate resources which are sized appropriately for each step. This means less time-integrated resource usage.

  • a single build controller process can build with more parallelism than would be possible by restricting build to one minion. You could run multiple build controllers, though you pay controller overhead for each instance and have to do more coordination.

    Drawbacks to the latter form could be that it takes longer when considering the time to set up each pod and to communicate across machines. But it seems like we could hide that with caching and smart scheduling.

@ncdc
Member

ncdc commented Aug 8, 2014

@erictune that's an interesting approach and I need to think about it some more, but it's not what we've been prototyping. Our current POC is #662. This is our current approach:

  1. Build controller is a component similar to the replication controller manager
  2. POST /builds { "sourceUri": "some git repo", "imageTag": "desired_tag", … }
  3. Build controller sees the new build and starts builder pod
    • POST /pods { … }
  4. Builder pod runs a specialized "builder image" that knows how to build the sourceUri and create a new Docker image (which is pushed to a registry)
    • this is where the container needs access to Docker
  5. Build controller checks the status of the pod and marks the build as succeeded/failed depending on exit code of pod's containers

With what you're suggesting, how would you commit, tag, and push a new image?

@erictune
Member

erictune commented Aug 8, 2014

Oh. So is it the "docker build" step that needs privileges? Is it because it needs to make a new namespace and the outer namespace has PR_SET_NO_NEW_PRIVS? If so, I don't know how to handle that.

@ncdc
Member

ncdc commented Aug 8, 2014

The container in the pod needs access to Docker (assuming we're running docker build), and the two options we've considered are bind-mounting the minion's docker socket into the container, or running a separate instance of the docker daemon inside the container. It's the latter case (docker daemon in container) that requires privileges.
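
In later pod-spec terms, the two options would look roughly like this (volume and container names and the images are illustrative):

  # Option 1: bind-mount the minion's docker socket (no privileged mode,
  # but the container can drive the host's docker daemon)
  containers:
    - name: builder
      image: example/builder    # hypothetical image
      volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock
  volumes:
    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock

  # Option 2: run a separate docker daemon inside the container
  # (docker-in-docker, which requires privileged mode)
  containers:
    - name: dind
      image: example/dind       # hypothetical image
      securityContext:
        privileged: true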

@thockin
Member

thockin commented Aug 8, 2014

I think I'm more comfortable with docker-in-docker than accessing the host docker. @erictune docker build does all the mounts/unmounts to set up and tear down the filesystem stack.


@smarterclayton
Contributor

The advantages of docker-in-docker are that today you get true process inheritance from the client (if you kill the parent container, the running build goes away too) and cgroups are nested. The advantages of host docker are performance (you don't have to download and recreate multi-gigabyte images on disk) and a simpler security model (you don't need to grant privileged; you just have to screen the Docker build commands from the API, although builds are still dangerous).

Another way privileged could be exposed is via a volume mount type plugin, although for docker-in-docker it's a bit of a stretch. For using host-docker it's a good model anyway since you need to bind a socket.

@smarterclayton
Contributor

We had discussed modeling the "commit", "tag", and "push" operations as primitives on the host available via some apis. But even in that model you have the same security risks, and you decouple the container execution (the build) from the actual commit. So if a cleanup task is implemented on the Kubelet it suddenly needs to be aware that some build containers might be in a pending state and can't be deleted until an arbitrary future point.

ironcladlou added a commit to ironcladlou/kubernetes that referenced this issue Aug 12, 2014
Add a Privileged field to containers in a pod, in order to facilitate pods
performing administrative tasks such as builds via Docker-in-Docker.

Discussion: kubernetes#391
@rhatdan

rhatdan commented Sep 3, 2014

On 09/02/2014 08:36 PM, Dan Mace wrote:

Something else to consider: Docker 1.2.0 features a granular capabilities API (https://blog.docker.com/2014/08/announcing-docker-1-2-0/). I haven't yet done any experimentation to see whether there exists a set of capabilities which would allow Docker-in-Docker in a non-privileged container.

My previous rationale (#391 (comment)) for supporting --privileged applies here as well, but the proliferation of flags to the kubelet and other components to toggle these two features seems undesirable.

This would need to be a scheduling constraint, which is a class of things we don't have yet.

This makes sense to me, but of course that's a longer road to go down.

Just wanted to get the thoughts out for discussion, if any is merited.

Just because you can add a couple of capabilities less than --privileged does not mean that it is not "privileged". You will have to add the sys_admin cap plus potentially others, which allows you to mount/unmount file systems and makes it dirt simple to break out of the container. You would almost assuredly have to turn off MAC (SELinux/AppArmor) protection.

I don't see much bang for the buck here.

@brendandburns brendandburns added this to the v0.5 milestone Sep 11, 2014
brendandburns pushed a commit to brendandburns/kubernetes that referenced this issue Sep 12, 2014
Add a Privileged field to containers in a pod, in order to facilitate pods
performing administrative tasks such as builds via Docker-in-Docker.

Discussion: kubernetes#391
@brendandburns
Contributor

Remaining items:

  • Add the enable flag to the apiserver too, and reject containers that are privileged if privileged isn't active.
  • Add support in the cluster turn-up scripts for enabling privileged easily.

@brendandburns
Contributor

At this point, this is fixed.

@sebgoa
Contributor

sebgoa commented Apr 29, 2015

So how does this work? I am trying to test a pod that runs a KVM VM in a container. I need privileged mode or some way to specify devices that the pod can access (--device in docker run).

If I specify privileged: true in my pod.yaml I get an error:

<snip> is invalid: spec.containers[0].privileged: forbidden 'true'

How do I set the kubelet or apiserver to accept my pod?

@smarterclayton
Contributor

You need to pass --allow-privileged to your API server and also to your kubelets.


@sebgoa
Contributor

sebgoa commented May 4, 2015

Thanks, that works: --allow_privileged=true

@r-medina

r-medina commented Jun 1, 2015

How, exactly, does one pass --allow_privileged=true to the server and kubelets?

@thockin
Member

thockin commented Jun 1, 2015

Either edit the deployment files (salt on GCE, may be different on other platforms) or log into each machine and edit the startup files (/etc/default/{kubelet,kube-apiserver} on GCE Debian).


@sebgoa
Contributor

sebgoa commented Jun 2, 2015

if you use hyperkube to start the apiserver and the kubelet you can do something like:

$ hyperkube apiserver --allow_privileged=true ...
$ hyperkube kubelet --allow_privileged=true ....

@erictune
Member

erictune commented Jun 2, 2015

If you are using GCE or another distro that uses salt, you can edit cluster/saltbase/pillar/privilege.sls


@starsimpson

I've edited my cluster/saltbase/pillar/privilege.sls files to allow privileged and restarted kubelet on all my nodes. Now what - is there a flag to pass when I run kubectl run / kubectl exec that confers privileged status on my containers?

@erictune
Member

You set pod.spec.securityContext.privileged=true

http://kubernetes.io/v1.0/docs/api-reference/definitions.html#_v1_securitycontext
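
For example, a minimal pod manifest with the flag set (in the v1 API the privileged field sits on the container's securityContext; the pod name and image here are illustrative):

  apiVersion: v1
  kind: Pod
  metadata:
    name: privileged-example    # hypothetical name
  spec:
    containers:
      - name: shell
        image: fedora
        securityContext:
          privileged: true

Note that this still requires --allow-privileged=true on both the apiserver and the kubelets, as described above.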


@renewooller

renewooller commented Sep 11, 2017

RE " log into each machine and edit the startup files
(/etc/default/{kubelet,kube-apiserver} on GCE Debian)"

Does anyone know of a less labour intensive approach? I've been investigating kops and kubeadm but can't seem to find anything yet.

If not, does anyone know the equivalent on AWS?

I've tried kops edit cluster and adding:

  kubeAPIServer:
    allowPrivileged: true

--UPDATE: this works, I had a simple indentation problem for privileged: true in the deployment itself.

seans3 pushed a commit to seans3/kubernetes that referenced this issue Apr 10, 2019
First draft of SIG Cluster Lifecycle related relnotes.
linxiulei pushed a commit to linxiulei/kubernetes that referenced this issue Jan 18, 2024