[Enhancement] Support privileged container #391
This is a dangerous opening. Can you detail why people should be able to do this? If anything, we should be convincing people to run containers with LESS privilege.
Docker is adding capability whitelisting/blacklisting as we speak.
I'm closing this issue because I discussed it with thockin and agreed that my original motivation for adding privileged containers can be covered by another approach. In any case, containers should be run with less privilege.
If we can't find a better way to do this, we can revisit this idea.
@danmace asked to reopen this.
The fact remains that Docker exposes privileged containers as part of its core API. I don't believe it's the responsibility of Kubernetes to shield API consumers from lower level capabilities in this regard. As a piece of infrastructure, how can Kube dictate what's secure or insecure in a given usage context? For instance, should Kube restrict host volume mounts in a container? How could it know what's secure given the overall deployment context (e.g. an on-premise deployment, a multi-tenant hosted solution)? Because what's secure or not is defined by the administrator of the installation or product built on top of Kube, the capabilities of Docker should be available for use. Kube could delegate these decisions to pluggable strategies, or otherwise leave the input validation of a container manifest to a higher tier.

For a concrete use case of a privileged container, consider docker-in-docker support. For #662 we had to expose the Docker privileged container API through the Kube API in this manner.
Yes, it should restrict host volumes! If you look, you will find a TODO to this effect with my name on it. What we typically do internally is grant a specific prod role a capability (which we do not have yet in k8s) indicating "can mount host dirs". The user job asserts that capability. The master checks the asserted caps against the allowed caps before scheduling the job with the enabled caps. The equivalent of kubelet then knows to allow host dirs.

The problem here is that privileged containers are really root-equivalent. Allowing them is a giant hole (for something that shouldn't need one: building code!). Worse, it introduces the idea of exclusions. Some jobs MUST NOT run on machines where there are root-equiv jobs (think privacy, compliance, etc.).
@thockin I understand where you're coming from, and agree with most of what you're saying, but I also think that it's reasonable to allow the cluster to be configured by an administrator to allow or disallow these things. Until docker supports non-root builds (if that's even possible), there's not really any other way to do builds using the cluster itself.
Well, from a security point of view, --privileged versus --cap-add=all --label-opt=disabled is the same thing; --privileged is just well known. --privileged means to disable any attempt at separating a privileged process within the container from the host. I actually argue right now that a privileged process within a container is the same as a privileged process outside of a container, whether you run with --privileged or not. There are too many holes which we are slowly trying to close. http://opensource.com/business/14/7/docker-security-selinux Or if you have an hour to spare you could watch: https://www.youtube.com/watch?v=zWGFqMuEHdw

We actually want to design a host management container (Super Privileged) which runs with no security lockdown and only uses the mnt namespace. docker run --privileged -ti --net=host -v /:/host fedora /bin/sh gets you most of the way there. These types of containers would be for monitoring or debugging a system. With platforms like Project Atomic and CoreOS, we will need these for some management tasks. Imagine nagios monitoring a host, or a log aggregator container that just talks to journald on the host and sends logging data to a central server.

I think what you will need is an RBAC mechanism that will control who can modify the security parameters of a container. You might want to allow people to start/stop containers but not upload new containers or run them with any kind of "priv" ops, like --selinux-opt, --cap-add, --privileged, --volume ...
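For readers skimming, the "Super Privileged Container" command above, annotated (a sketch only; the image and shell are the examples from this comment, and running it requires a Docker daemon and root):

```sh
# Super Privileged Container sketch:
#   --privileged : disable all security separation from the host
#   --net=host   : share the host's network namespace
#   -v /:/host   : mount the host's root filesystem at /host
docker run --privileged -ti --net=host -v /:/host fedora /bin/sh
```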
We run cAdvisor in a plain container (I think) with a couple of HostDir volumes. I agree that privs inside a container is just a hair away from root.
@thockin I'm wondering what your take is on using a pod to do a build, where the result is a new Docker image? That's the direction we'd like to follow, which means we need to be able to do either docker-in-docker or builds against the host's Docker.
Sorry if this is obvious, but why does it have to be build-in-docker-in-docker? Why can't it be build-in-docker? |
@erictune we're hoping to apply resource constraints (cpu, memory, fs quota) and it will likely be easier to do this using docker-in-docker, where the initial container is managed by kubernetes, and whatever commands are executed to perform the build are scoped within that managed container's process tree and resource limits. Being under the docker-in-docker container's process tree is important to point out, as that means that processes spawned by the builder should live only as long as the docker-in-docker container is alive. If we're doing build-in-docker (i.e. using the host's docker socket in the build container), then I don't believe there's an easy way to apply resource constraints to the build instructions. |
I think that building docker images is the obvious case for privileges. Otherwise I see docker builds as poisoning machines against running other workloads.
@thockin that makes sense. I could conceive of a cluster in which a set of nodes was partitioned off from the rest for use as "build nodes," and hopefully they could be locked down more so than the other nodes to reduce the risk of a privilege escalation affecting the remainder of the cluster. In the short term, are you open to adding the ability for a pod to specify that a container runs with privileges?
@ncdc
It seems more Kubernetic to me to do this:
The advantages I see of the latter form are:
@erictune that's an interesting approach and I need to think about it some more, but it's not what we've been prototyping. Our current POC is #662. This is our current approach:
With what you're suggesting, how would you commit, tag, and push a new image?
Oh. So is it the "docker build" step that needs privileges? Is it because it needs to make a new namespace and the outer namespace has PR_SET_NO_NEW_PRIVS? If so, I don't know how to handle that.
The container in the pod needs access to Docker (assuming we're running builds with Docker).
I think I'm more comfortable with docker-in-docker than accessing the host Docker.
The advantages of docker-in-docker are that today you get true process inheritance from the client (if you kill the parent container, the running build goes away too) and cgroups are nested. The advantage of host docker is performance (you don't have to download and recreate multiple gig images on disk) and a simpler security model (don't need to give privileged, you just have to screen the Docker build commands from the API, although builds are still dangerous).

Another way privileged could be exposed is via a volume mount type plugin, although for docker-in-docker it's a bit of a stretch. For using host-docker it's a good model anyway since you need to bind a socket.
We had discussed modeling the "commit", "tag", and "push" operations as primitives on the host available via some apis. But even in that model you have the same security risks, and you decouple the container execution (the build) from the actual commit. So if a cleanup task is implemented on the Kubelet, it suddenly needs to be aware that some build containers might be in a pending state and can't be deleted until an arbitrary future point.
Add a Privileged field to containers in a pod, in order to facilitate pods performing administrative tasks such as builds via Docker-in-Docker. Discussion: kubernetes#391
I don't see much bang for the buck here.
At this point, this is fixed.
So how does this work? I am trying to test a pod that runs a KVM VM in a container. I need privileged mode, or some way to specify devices that the pod can access (--device in docker run). If I specify privileged: true in my pod.yaml I get an error:
How do I set the kubelet or apiserver to accept my pod?
You need to pass --allow-privileged to your API server and also to your kubelets.
Thanks, that works: --allow_privileged=true
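For later readers, the flags from the answer above spelled out (a sketch; binary names and the remaining flags are assumptions, and underscores and dashes were interchangeable in flag names at the time):

```sh
# On the master:
kube-apiserver --allow-privileged=true ...
# On every node:
kubelet --allow-privileged=true ...
```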
How, exactly, does one pass --allow-privileged?
Either edit the deployment files (salt on GCE, may be different on other providers) or log into each machine and edit the startup files.
If you use hyperkube to start the apiserver and the kubelet, you can do something like:
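The elided invocation was presumably along these lines (a sketch only; it assumes the hyperkube multi-call binary with apiserver/kubelet subcommands, other flags omitted):

```sh
hyperkube apiserver --allow-privileged=true ...
hyperkube kubelet --allow-privileged=true ...
```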
If you are using GCE or another distro that uses salt, you can edit the salt pillar files.
I've edited my cluster/saltbase/pillar/privilege.sls files to allow privileged and restarted kubelet on all my nodes. Now what? Is there a flag to pass when I run kubectl run / kubectl exec that confers the privileged status on my containers?
You set securityContext.privileged=true on the container in the pod spec. http://kubernetes.io/v1.0/docs/api-reference/definitions.html#_v1_securitycontext
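Put into a manifest, a minimal sketch (pod and container names are placeholders; note that in the v1 API the privileged field lives on the container's securityContext, not the pod's):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: privileged-example      # placeholder name
spec:
  containers:
  - name: shell                 # placeholder name
    image: fedora               # example image from this thread
    command: ["/bin/sh", "-c", "sleep 3600"]
    securityContext:
      privileged: true          # rejected unless --allow-privileged is set
                                # on both the apiserver and the kubelet
```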
RE "log into each machine and edit the startup files": does anyone know of a less labour-intensive approach? I've been investigating kops and kubeadm but can't seem to find anything yet. If not, does anyone know the equivalent on AWS? I've tried
UPDATE: this works. I had a simple indentation problem with privileged: true in the deployment itself.
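For anyone hitting the same indentation problem: privileged must sit under a per-container securityContext, nested inside the container entry (a sketch; names are placeholders):

```yaml
# wrong: securityContext at the pod level has no privileged field (v1 API)
# right: nested under the individual container, as below
spec:
  containers:
  - name: app                 # placeholder
    image: fedora
    securityContext:
      privileged: true
```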
Sometimes containers need to run in privileged mode.
https://docs.docker.com/reference/run/#runtime-privilege-and-lxc-configuration
The container manifest schema should support a privileged flag so that users can deploy privileged containers.