[lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces

Thu May 15 20:33:11 UTC 2014

Am 15.05.2014 22:26, schrieb Serge E. Hallyn:
> Quoting Richard Weinberger (richard at nod.at):
>> Am 15.05.2014 21:50, schrieb Serge Hallyn:
>>> Quoting Richard Weinberger (richard.weinberger at gmail.com):
>>>> On Thu, May 15, 2014 at 4:08 PM, Greg Kroah-Hartman
>>>> <gregkh at linuxfoundation.org> wrote:
>>>>> Then don't use a container to build such a thing, or fix the build
>>>>> scripts to not do that :)
>>>>
>>>> I second this.
>>>> To me it looks like some folks try to (ab)use Linux containers
>>>> for purposes where KVM would much better fit in.
>>>> Please don't put more complexity into containers. They are already
>>>> horrible complex
>>>> and error prone.
>>>
>>> I, naturally, disagree :)  The only use case which is inherently not
>>> valid for containers is running a kernel.  Practically speaking there
>>> are other things which likely will never be possible, but if someone
>>> offers a way to do something in containers, "you can't do that in
>>> containers" is not an apropos response.
>>>
>>> "That abstraction is wrong" is certainly valid, as when vpids were
>>> originally proposed and rejected, resulting in the development of
>>> pid namespaces.  "We have to work out (x) first" can be valid (and
>>> I can think of examples here), assuming it's not just trying to hide
>>> behind a catch-22/chicken-egg problem.
>>>
>>> Finally, saying "containers are complex and error prone" is conflating
>>> several large suites of userspace code and many kernel features which
>>> support them.  Being more precise would, if the argument is valid,
>>> lend it a lot more weight.
>>
>> We (my company) use Linux containers since 2011 in production. First LXC, now libvirt-lxc.
>> To understand the internals better I also wrote my own userspace to create/start
>> containers. There are so many things which can hurt you badly.
>> With user namespaces we expose a really big attack surface to regular users.
>> I.e. Suddenly a user is allowed to mount filesystems.
> 
> That is currently not the case.  They can mount some virtual filesystems
> and do bind mounts, but cannot mount most real filesystems.  This keeps
> us protected (for now) from potentially unsafe superblock readers in the
> kernel.

Yeah, I meant not only "real" filesystems.
I had VFS issues in mind where an attacker could do bad things
using bind mounts for example.

>> Ask Andy, he found already lots of nasty things...
> 
> Yes, of course, and there may be more to come...
> 
>> I agree that user namespaces are the way to go, all the papering with LSM
>> over security issues is much worse.
>> But we have to make sure that we don't add too much features too fast.
> 
> Agreed.  Like I said, 'we have to work (x) out first' could be valid,
> including 'we should wait (a year?) for user ns issues to fall out
> before relaxing any of the current user ns constraints." 
> 
> On the other hand, not exercising the new code may only mean that
> existing flaws stick around longer, undetected (by most).

Fair point.

>> That said, I like containers a lot because they are cheap but as they are lightweight
>> also therefore also isolation level is lightweight.
>> IMHO containers are not a cheap replacement for KVM.
> 
> The building blocks for containers can also be used for entirely
> new, simpler use cases - i.e. perhaps a new fakeroot alternative based
> on user namespace mappings.  Which is why "this is not a use case for
> containers" is not the right way to push back, whether or not the
> feature ends up being appropriate.

Agreed.

Maybe I'm too pessimistic.
We'll see. :-)

Thanks,
//richard