[lxc-devel] [lxc/lxc] 8deca6: start: intelligently use clone() on ns sharing

GitHub noreply at github.com
Mon Dec 11 19:52:07 UTC 2017


  Branch: refs/heads/master
  Home:   https://github.com/lxc/lxc
  Commit: 8deca6c986f1aaab084ab5bb6692c8a29068bba1
      https://github.com/lxc/lxc/commit/8deca6c986f1aaab084ab5bb6692c8a29068bba1
  Author: Christian Brauner <christian.brauner at ubuntu.com>
  Date:   2017-12-11 (Mon, 11 Dec 2017)

  Changed paths:
    M src/lxc/start.c
    M src/lxc/start.h

  Log Message:
  -----------
  start: intelligently use clone() on ns sharing

When I first solved this problem I went for a fork() + setns() + clone() model.
This works fine but has unnecessary overhead for a couple of reasons:

- doing a full fork() including copying file descriptor table and virtual
  memory
- using pipes to retrieve the pid of the second child (the actual container
  process)

This can all be avoided by being a little smart in how we employ the clone()
syscall:

- using CLONE_VM will let us get rid of using pipes since we can simply write
  to the handler because we share the memory with our parent
- using CLONE_VFORK will also let us get rid of using pipes since the execution
  of the parent is suspended until the child returns
- using CLONE_VM will not cause virtual memory to be copied
- using CLONE_FILES will not cause the file descriptor table to be copied
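
As a rough illustration of the flags listed above (not the code in
src/lxc/start.c), the sketch below clones a child with
CLONE_VM | CLONE_VFORK | CLONE_FILES and lets it report its pid through shared
memory instead of a pipe. The names demo_handler, demo_child and
demo_clone_shared are hypothetical, and the 64 KiB stack size and the
downward-growing stack are assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the handler; not the real struct lxc_handler. */
struct demo_handler {
	pid_t pid;
};

static int demo_child(void *arg)
{
	struct demo_handler *handler = arg;

	/*
	 * CLONE_VM: this store is visible to the parent, so no pipe is needed.
	 * The raw syscall avoids relying on glibc's getpid() wrapper.
	 */
	handler->pid = (pid_t)syscall(SYS_getpid);
	return 0;
}

/* Returns 0 if the pid stored by the child matches what clone() returned. */
static int demo_clone_shared(struct demo_handler *handler)
{
	const size_t stack_size = 64 * 1024; /* assumed to be large enough */
	char *stack;
	pid_t child;
	int status;

	assert(handler != NULL);
	handler->pid = -1;

	stack = malloc(stack_size);
	if (!stack)
		return -1;

	/*
	 * CLONE_VFORK suspends the caller until the child exits, so
	 * handler->pid is already filled in when clone() returns here.
	 * The top of the buffer is passed because the stack grows down.
	 */
	child = clone(demo_child, stack + stack_size,
		      CLONE_VM | CLONE_VFORK | CLONE_FILES | SIGCHLD, handler);
	if (child < 0) {
		free(stack);
		return -1;
	}

	/* SIGCHLD in the flags lets a plain waitpid() reap the child. */
	if (waitpid(child, &status, 0) != child) {
		free(stack);
		return -1;
	}
	free(stack);

	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;

	return handler->pid == child ? 0 : -1;
}
```

Calling demo_clone_shared() on a zeroed struct demo_handler should leave the
child's pid in handler->pid without any pipe or extra synchronization.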

Note that the intermediate clone() is used with CLONE_VM. Some glibc versions
used to reset the pid/tid to -1 when CLONE_VM was used without CLONE_THREAD.
But since the memory between parent and child is shared on CLONE_VM, this would
invalidate the getpid() cache that glibc used to maintain, and so getpid() in
the child would return the parent's pid. This is all fixed in newer glibc
versions where the getpid() cache is removed and the pid/tid is not reset
anymore. However, if for whatever reason you - dear committer - somehow need to
get the pid of the dummy intermediate process for do_share_ns() you need to
call syscall(__NR_getpid) directly. The next lxc_clone() call does not employ
CLONE_VM and will be fine.
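
For reference, a minimal sketch of the direct call mentioned above. The helper
name demo_raw_getpid is hypothetical; the call goes straight to the kernel and
so does not depend on any pid caching in the glibc getpid() wrapper.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for the pid directly, bypassing the getpid() wrapper. */
static pid_t demo_raw_getpid(void)
{
	return (pid_t)syscall(__NR_getpid);
}
```

In an ordinary process that was not created with CLONE_VM, this should return
the same value as getpid().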

Signed-off-by: Christian Brauner <christian.brauner at ubuntu.com>


  Commit: 7acb5ce30ddfa114e7fee62aaa5dfe84b8df1071
      https://github.com/lxc/lxc/commit/7acb5ce30ddfa114e7fee62aaa5dfe84b8df1071
  Author: Christian Brauner <christian.brauner at ubuntu.com>
  Date:   2017-12-11 (Mon, 11 Dec 2017)

  Changed paths:
    M src/tests/Makefile.am
    A src/tests/share_ns.c

  Log Message:
  -----------
  tests: add namespace sharing tests

This also ensures that the new, more efficient clone()-based way of sharing
namespaces is tested.

Signed-off-by: Christian Brauner <christian.brauner at ubuntu.com>


  Commit: f449521ce675ba1b5c9a8a8ffc559016844dcec6
      https://github.com/lxc/lxc/commit/f449521ce675ba1b5c9a8a8ffc559016844dcec6
  Author: Serge Hallyn <serge at hallyn.com>
  Date:   2017-12-11 (Mon, 11 Dec 2017)

  Changed paths:
    M src/lxc/start.c
    M src/lxc/start.h
    M src/tests/Makefile.am
    A src/tests/share_ns.c

  Log Message:
  -----------
  Merge pull request #2020 from brauner/2017-12-11/clone

start: intelligently use clone() on ns sharing


Compare: https://github.com/lxc/lxc/compare/e409b214020b...f449521ce675

