[lxc-devel] [lxc/lxc] c74e92: lxc_clone: pass non-stack allocated stack to clone

Christian Brauner noreply at github.com
Wed May 29 15:14:02 UTC 2019


  Branch: refs/heads/master
  Home:   https://github.com/lxc/lxc
  Commit: c74e9217448743e9cdfc068568bf3d7c720ca21b
      https://github.com/lxc/lxc/commit/c74e9217448743e9cdfc068568bf3d7c720ca21b
  Author: Tycho Andersen <tycho at tycho.ws>
  Date:   2019-05-15 (Wed, 15 May 2019)

  Changed paths:
    M src/lxc/namespace.c

  Log Message:
  -----------
  lxc_clone: pass non-stack allocated stack to clone

There are two problems with this code:

1. The math is wrong. We allocate a char *foo[__LXC_STACK_SIZE]; which
   means it's really sizeof(char *) * __LXC_STACK_SIZE, instead of just
   __LXC_STACK SIZE.

2. We can't actually allocate it on our stack. When we use CLONE_VM (which
   we do in the shared ns case) that means that the new thread is just
   running one page lower on the stack, but anything that allocates a page
   on the stack may clobber data. This is a pretty short race window since
   we just do the shared ns stuff and then do a clone without CLONE_VM.

However, it does point out an interesting possible privilege escalation if
things aren't configured correctly: do_share_ns() sets up namespaces while
it shares the address space of the task that spawned it; once it enters the
pid ns of the thing it's sharing with, the thing it's sharing with can
ptrace it and write stuff into the host's address space. Since the function
that does the clone() is lxc_spawn(), it has a struct cgroup_ops* on the
stack, which itself has function pointers called later in the function, so
it's possible to allocate shellcode in the address space of the host and
run it fairly easily.

ASLR doesn't mitigate this since we know exactly the stack offsets; however
this patch has the kernel allocate a new stack, which will help. Of course,
the attacker could just check /proc/pid/maps to find the location of the
stack, but they'd still have to guess where to write stuff in.

The thing that does prevent this is the default configuration of apparmor.
Since the apparmor profile is set in the second clone, and apparmor
prevents ptracing things under a different profile, attackers confined by
apparmor can't do this. However, if users are using a custom configuration
with shared namespaces, care must be taken to avoid this race.

Shared namespaces aren't widely used now, so perhaps this isn't a problem,
but with the advent of crio-lxc for k8s, this functionality will be used
more.

Signed-off-by: Tycho Andersen <tycho at tycho.ws>


  Commit: 8de90384363fe01f5258d36724dd3eae55918b5b
      https://github.com/lxc/lxc/commit/8de90384363fe01f5258d36724dd3eae55918b5b
  Author: Tycho Andersen <tycho at tycho.ws>
  Date:   2019-05-15 (Wed, 15 May 2019)

  Changed paths:
    M doc/lxc.container.conf.sgml.in

  Log Message:
  -----------
  doc: add a little note about shared ns + LSMs

We should add a little not about the race in the previous patch.

Signed-off-by: Tycho Andersen <tycho at tycho.ws>


  Commit: 5e7b4b3c166873030d51dc725907351f19d7e0fd
      https://github.com/lxc/lxc/commit/5e7b4b3c166873030d51dc725907351f19d7e0fd
  Author: Tycho Andersen <tycho at tycho.ws>
  Date:   2019-05-15 (Wed, 15 May 2019)

  Changed paths:
    M src/lxc/namespace.c

  Log Message:
  -----------
  lxc_clone: get rid of some indirection

We have a do_clone(), which just calls a void f(void *) that it gets
passed. We build up a struct consisting of two args that are just the
actual arg and actual function. Let's just have the syscall do this for us.

Signed-off-by: Tycho Andersen <tycho at tycho.ws>


  Commit: 3df90604ec03a67791b26c94aa5592d127cb0914
      https://github.com/lxc/lxc/commit/3df90604ec03a67791b26c94aa5592d127cb0914
  Author: Tycho Andersen <tycho at tycho.ws>
  Date:   2019-05-29 (Wed, 29 May 2019)

  Changed paths:
    M src/lxc/namespace.c

  Log Message:
  -----------
  lxc_clone: bump stack size to 8MB

This is the default thread size for glibc, so it is reasonable to match
that when we clone().

Mostly this is a science experiment suggested by brauner, and who doesn't
love science?

Signed-off-by: Tycho Andersen <tycho at tycho.ws>


  Commit: 18a405ee88419e0799cf8849f1ad468c859615ba
      https://github.com/lxc/lxc/commit/18a405ee88419e0799cf8849f1ad468c859615ba
  Author: Christian Brauner <christian.brauner at ubuntu.com>
  Date:   2019-05-29 (Wed, 29 May 2019)

  Changed paths:
    M doc/lxc.container.conf.sgml.in
    M src/lxc/namespace.c

  Log Message:
  -----------
  Merge pull request #2987 from tych0/pass-zero-to-clone

Pass zero to clone


Compare: https://github.com/lxc/lxc/compare/0cfec4f757b5...18a405ee8841


More information about the lxc-devel mailing list