[lxc-devel] [lxc-users] Container escape through open_by_handle_at (shocker exploit)

Fri Jun 20 15:47:04 UTC 2014

On Wed, 18 Jun 2014 21:03:05 -0400
Dwight Engen <dwight.engen at oracle.com> wrote:

> On Wed, 18 Jun 2014 14:11:49 -0400
> Stéphane Graber <stgraber at ubuntu.com> wrote:
> 
> > Just fixing lxc-devel's e-mail address, it turns out that e-mails
> > work better when you don't forget the tld :)
> > 
> > So, lxc-devel subscribers, see below:
> > 
> > On Wed, Jun 18, 2014 at 01:41:19PM -0400, Stéphane Graber wrote:
> > > TL;DR: As we've said a few times already, privileged containers
> > > shouldn't be considered root safe, here's one more example of
> > > that. Please use unprivileged containers whenever possible or if
> > > you can't, don't trust anyone with root in your containers!
> > > 
> > > 
> > > Hey everyone,
> > > 
> > > I'm sure some of you saw the exploit posted at:
> > > http://stealth.openwall.net/xSports/shocker.c
> > > 
> > > This was designed to show how to escape a standard docker
> > > container (running docker 0.11) with a standard kernel. It can be
> > > adapted to apply to LXC by changing the /.dockerinit to some
> > > other valid path inside your container.
> > > 
> > > 
> > > Now as for how this affects LXC 1.0 and higher:
> > >  - The exploit doesn't work with unprivileged containers, which
> > > are the only kind of containers which you should ever give root
> > > access to users you wouldn't trust with root access to the host.
> > > In those containers, the kernel returns EPERM as expected, so they
> > > are NOT AFFECTED.
> > > 
> > >  - The exploit will work with privileged containers if:
> > >    - There's any bind-mount set up from the partition the exploit
> > >      is trying to access. For example, if you have a separate /home
> > >      partition on your host and bind-mount /home/user inside the
> > >      container, this attack will only let you access files within
> > >      /home of the host.
> > > 
> > >    - The open_by_handle_at syscall isn't blocked by a seccomp
> > > policy.
> > > 
> > >    - The CAP_DAC_READ_SEARCH capability wasn't dropped.
> > > 
> > > 
> > > Due to the need to use Apparmor in disconnected mode to work around
> > > some limitations of its policies, there's currently no way for us
> > > to prevent this kind of access. However the Apparmor team has been
> > > contacted and they have work scheduled to address this kind of
> > > issue in the near future.
> > > 
> > > I haven't been able to check whether using SELinux prevents this
> > > attack.
> > > 
> > > 
> > > Recommended ways to mitigate this specific issue are:
> > >  - If at all possible, run your workloads in unprivileged
> > > containers.
> > > 
> > >  - If using privileged containers, assume root in the container is
> > > the same as root outside of it, so avoid running tasks as root.
> > > 
> > >  - If you need to run untrusted tasks as root in the container,
> > > either use seccomp to block open_by_handle_at (make a blacklist
> > > policy file and set lxc.seccomp to its path) or add lxc.cap.drop =
> > > dac_read_search to your config.
> > > 
> > >    Note that both of those options may cause some userspace
> > > failures. In my tests I didn't spot any obvious one but that was
> > > basically just creating, starting and stopping a container.
> > > 
> > > In general, if your distribution supports it, make sure to run
> > > privileged LXC containers under AppArmor as it does prevent all
> > > the other attacks we've been made aware of so far (though we
> > > expect there are a few more we haven't heard of yet...).
> > > 
> > > The same is probably true of SELinux, however my knowledge there
> > > is pretty limited, so maybe Dwight can give a quick update on the
> > > state of things.
> 
> I've confirmed that the exploit works under SELinux too, at least
> under the virtd_lxc_t context. I agree that it's best to run in
> unprivileged containers when possible.

Further testing shows that the exploit does not work under SELinux when
run under the svirt_lxc_net_t context. Also note that the exploit needs
to be modified to work on xfs (root inode 128 and 12 byte long handles).
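
For anyone wanting to try the two mitigations quoted above, here is a rough
Python sketch, not a tested recipe. It builds a seccomp blacklist policy for
open_by_handle_at, prints the matching config lines, and checks from inside
a container whether CAP_DAC_READ_SEARCH is still in the effective set. The
helper names and the /path/to/policy.seccomp placeholder are made up for
illustration, and the policy layout assumes an LXC that understands the
version 2 seccomp format.

```python
#!/usr/bin/env python3
"""Sketch: emit the mitigations discussed above and check one of them."""

# Capability number of CAP_DAC_READ_SEARCH in <linux/capability.h>.
CAP_DAC_READ_SEARCH = 2


def seccomp_blacklist_policy(syscalls):
    """Return the text of an LXC seccomp policy denying the given syscalls.

    ASSUMPTION: version 2 policy layout ("2", "blacklist", "[all]", then
    one "<syscall> errno 1" line per syscall). Older LXC releases may
    expect a different layout.
    """
    lines = ["2", "blacklist", "[all]"]
    lines += ["%s errno 1" % name for name in syscalls]
    return "\n".join(lines) + "\n"


def config_lines(policy_path):
    """Return container config lines for the seccomp and cap.drop options.

    The original advice is to use either one; both are listed here.
    """
    return [
        "lxc.seccomp = %s" % policy_path,
        "lxc.cap.drop = dac_read_search",
    ]


def has_capability(cap_eff_hex, cap_number):
    """Return True if bit cap_number is set in a hex CapEff mask."""
    return bool((int(cap_eff_hex, 16) >> cap_number) & 1)


def read_cap_eff(status_path="/proc/self/status"):
    """Read the effective capability mask of the current process."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("no CapEff line in %s" % status_path)


if __name__ == "__main__":
    print(seccomp_blacklist_policy(["open_by_handle_at"]), end="")
    # Hypothetical path; point this at wherever the policy file is saved.
    print("\n".join(config_lines("/path/to/policy.seccomp")))
    if has_capability(read_cap_eff(), CAP_DAC_READ_SEARCH):
        print("CAP_DAC_READ_SEARCH is still in the effective set")
    else:
        print("CAP_DAC_READ_SEARCH is not in the effective set")
```

Run as root inside the container after restarting it; if the capability
still shows up in the effective set, the cap.drop line did not take effect.
This only checks the capability, it does not tell you whether the seccomp
policy was loaded.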

> > > Additionally, Serge is currently working on a default seccomp
> > > profile which will block syscalls we know to be problematic in
> > > privileged containers. I'm planning on getting this changeset into
> > > the stable branch and tagging 1.0.5 once we're happy with it.
> > > 
> > > Unless template or distro maintainers object to it, I'd like that
> > > seccomp profile to be set by default by all templates (or for
> > > those using the new style configs, in their
> > > respective .common.conf). This would only apply to privileged
> > > containers.
> > > 
> > > -- 
> > > Stéphane Graber
> > > Ubuntu developer
> > > http://www.ubuntu.com
> > 
> > 
> > 
> > > _______________________________________________
> > > lxc-users mailing list
> > > lxc-users at lists.linuxcontainers.org
> > > http://lists.linuxcontainers.org/listinfo/lxc-users