[lxc-devel] how to do not allow to mount /cgroup inside container?

Tue Aug 25 14:05:37 UTC 2009

Quoting Daniel Lezcano (daniel.lezcano at free.fr):
> Krzysztof Taraszka wrote:
>> Hi,
>>
>> I was looking for a way to secure an lxc container so that the container's
>> root user cannot change its cgroup limits. Right now, without Smack
>> or SELinux, he can do this easily.
>> I read the http://www.ibm.com/developerworks/linux/library/l-lxc-security/ cookbook
>> and decided to use the Smack kernel mechanism.
>> Well... mounting cgroup inside the container fails (great, that is what I was
>> looking for ;)) but networking fails too: the interface comes up, sshd starts,
>> and there is a connection between host and container, but 'mtr', 'ping' and
>> even 'apt-get update' fail and I do not know why. I secured my container
>> exactly as in the cookbook.

Yeah, smack's use of cipso can make things tricky, and it's possible things
have changed a bit recently.  Although I'm currently running smack in my
everyday s390 kernel to test checkpointing of its labels, and networking
is working fine.

Can you give me a few details - what distro, smack policy, and precise kernel
version are you using, for starters?

>> Is there any other way to have a secure container without network
>> problems, or any hint on how to enable networking with Smack enabled? If so,
>> maybe the l-lxc-security cookbook has to be updated? Or maybe another kernel
>> patch that refuses to mount cgroup when the mount call comes
>> from a container?
>>
>> Any ideas?
>>   
> I think Serge can help you on this area (Cc'ed).

Well the idea is that user namespaces will provide this.  The files in
the cgroupfs will be labeled as being owned by users in the initial
user namespace.  The users in the container would be in a child user
namespace, the namespaces being hierarchical, so for instance the
container might have been created by uid 500 in the initial namespace
(ns=1), with the new namespace being (ns=2).  Then uid 0 in the container
is actually (1:500,2:0) and uid 1000 in the container is (1:500,2:1000).
Now tasks in the container will only own files which are owned by 1:500,
and root tasks in the container can only get privileges (CAP_DAC_OVERRIDE
etc) to files owned by 1:500.  So regular DAC permissions then suffice to
prevent containers from messing with their cgroup constraints.
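
To make the ownership rule above concrete, here is a rough Python model of
it. This is only a sketch of the idea, not kernel code and not an existing
API: the credential representation and the has_privilege_over() helper are
hypothetical names made up for illustration, and it assumes the cgroupfs
files are owned by uid 0 in the initial namespace.

```python
from typing import Tuple

# A credential is the chain of (namespace id, uid) pairs from the initial
# namespace down to the task's own namespace, e.g. ((1, 500), (2, 0)).
Cred = Tuple[Tuple[int, int], ...]


def has_privilege_over(task: Cred, file_owner: Cred) -> bool:
    """Hypothetical rule: uid 0 in a namespace gets capabilities (think
    CAP_DAC_OVERRIDE) only over files whose owner chain starts with the
    user who created that namespace."""
    _, uid = task[-1]
    if uid != 0:
        return False  # not root in its own namespace: no extra privilege
    creator = task[:-1]  # empty for root in the initial namespace
    return file_owner[:len(creator)] == creator


host_root = ((1, 0),)
container_root = ((1, 500), (2, 0))    # uid 0 in the container
container_user = ((1, 500), (2, 1000)) # uid 1000 in the container
cgroup_file = ((1, 0),)                # assumed owner of the cgroupfs files

print(has_privilege_over(container_root, cgroup_file))     # False
print(has_privilege_over(container_root, container_user))  # True
print(has_privilege_over(host_root, cgroup_file))          # True
```

Container root still has privilege over anything owned through 1:500, but
the cgroupfs files fall outside that chain, so plain DAC keeps them out of
reach.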

Unfortunately that is (still) a ways off.  If you have time to work
on that, I think the last time Eric and I discussed how to go about
introducing this functionality was in the thread starting here:
https://lists.linux-foundation.org/pipermail/containers/2008-August/thread.html#12675
and in particular https://lists.linux-foundation.org/pipermail/containers/2008-August/012691.html
where Eric suggests starting with sorting out 'capable' with respect to
namespaces first.

-serge