[lxc-devel] Detecting if you are running in a container

Tue Oct 11 12:53:46 UTC 2011

On Oct 11, 2011, at 2:42 AM, Eric W. Biederman wrote:

> I am totally in favor of not starting the entire world.  But just
> like I find it convienient to loopback mount an iso image to see
> what is on a disk image.  It would be handy to be able to just
> download a distro image and play with it, without doing anything
> special.

Agreed, but what's wrong with firing up KVM to play with a distro image?  Personally, I don't consider that "doing something special".

> 
>> Things should just work, except that
>> processes in one container can't use more than their fair share (as
>> dictated by policy) of memory, CPU, networking, and I/O bandwidth.
> 
> You have to be careful with the limiters.  The fundamental reason
> why containers are more efficient than hardware virtualization is
> that with containers we can do over commit of resources, especially
> memory.  I keep seeing implementations of resource limiters that want
> to do things in a heavy handed way that break resource over commit.

Oh, sure.   Resource limiting is something that should be done only when there are other demands on the resource in question.   Put another way, it should be considered more of a resource guarantee than a resource limit.   (You will have at least 10% of the CPU, not at most 10% of the CPU.)

> 
> I don't know what concern you have security wise, but the problem that
> wants to be solved with user namespaces is something you hit much
> earlier than when you worry about sharing a kernel between mutually
> distrusting users.  Right now root inside a container is root rout
> outside of a container just like in a chroot jail.  Where this becomes a
> problem is that people change things like like
> /proc/sys/kernel/print-fatal-signals expecting it to be a setting local
> to their sand box when in fact the global setting and things start
> behaving weirdly for other users.  Running sysctl -a during bootup 
> has that problem in spades.

The moment you start caring about global sysctl settings is the moment I start wondering whether or not VM and separate kernel images is the better solution.   Do we really want to add so much complexity that we are multiplexing different sysctl settings across containers?   To my mind, that way lies madness, and in some cases, it simply can't be done from a semantics perspective.

> 
> With my sysadmin hat on I would not want to touch two untrusting groups
> of users on the same machine.  Because of the probability there is at
> least one security hole that can be found and exploited to allow
> privilege escalation.
> 
> With my kernel developer hat on I can't just say surrender to the
> idea that there will in fact be a privilege escalation bug that
> is easy to exploit.  The code has to be built and designed so that
> privilege escalation is difficult.  Otherwise we might as well
> assume if you visit a website an stealthy worm has taken over your
> computer.

Oh, I agree that we should try to stop privilege escalation attacks.  And it will be a grand and glorious fight, like Leonidas and his 300 men at the pass at Thermopylae.   :-)   Or it will be like Steve Jobs struggling against cancer.  It's a fight that you know that you're going to lose, but it's not about winning or losing but how much you accomplish and how you fight that counts.

Personally, though, if the issue is worries about visiting a website, the primary protection against that has got to be done  at the browser level (i.e., the process level sandboxing done by Chrome).

-- Ted