[lxc-devel] (Mount) namespaces cleanup

Tue Sep 1 07:59:25 UTC 2015

I can't seem to find much about how mount namespaces get cleaned up.
In fact, when I start a container, open /proc/$thepid/ns/mnt (with
$thepid being a PID inside the container) from another shell on the
host, stop the container (up to the point where lxc-info shows
STOPPED), and then enter the namespace via setns(2), I can still see
all the mounted filesystems.
(In case you're wondering: `exec 3</proc/$thepid/ns/mnt` before
lxc-stop, waiting for it to be stopped, and then running `tons
/dev/fd/3 /bin/bash` with `tons` being the example code from the
setns(2) manpage ;-) )
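
For anyone who wants to reproduce this without compiling the manpage
example, the steps above can be sketched roughly as the shell function
below. This is only a sketch under assumptions: it uses nsenter(1) from
util-linux as a stand-in for `tons`, needs root on the host, takes the
PID from the output of `lxc-info -p`, and the function name
`inspect_mntns_after_stop` is made up for illustration.

```shell
# Sketch only: hold a reference to the container's mount namespace
# across lxc-stop, then look inside it afterwards.
# Assumes root, the LXC tools, and nsenter(1) in place of `tons`.
inspect_mntns_after_stop() {
    name=$1
    # Host-side PID of the container's init, parsed from "PID: <n>".
    pid=$(lxc-info -n "$name" -p | awk '{ print $2 }')
    [ -n "$pid" ] || { echo "no PID for $name" >&2; return 1; }
    {
        lxc-stop -n "$name"
        lxc-wait -n "$name" -s STOPPED
        # fd 3 still refers to the namespace, so it can still be entered.
        nsenter --mount=/dev/fd/3 cat /proc/self/mounts
    } 3<"/proc/$pid/ns/mnt"   # fd 3 stays open for the whole block
}
```

Calling `inspect_mntns_after_stop mycontainer` (container name is a
placeholder) should print the mount table as seen from inside the
namespace after the container is reported as stopped.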

I'm assuming the cleanup is left to the kernel for when the last
reference to the namespace disappears. However, this can be
problematic in some cases, for instance with an NFS mount, which can
apparently hang indefinitely.

So I'd like to know what the recommended procedure is here. One thing
that came to mind is adding a `cleanup` hook. Currently there's only a
post-stop hook which is run in the host's namespace after stopping. A
cleanup hook would ideally be run after stopping the processes in the
container but *inside* the container's namespaces. (Maybe also a
pre-stop hook to be run inside the container before killing the
processes?)
The point is to be able to at least propagate an error to the user and
to give automation scripts the possibility to deal with this however
they choose, because right now lxc-stop would succeed, and the dead
mountpoint would be "hidden" from the host. Think about an offline
migration (not even online with CRIU) where only one host loses
connectivity to the NFS share in the middle of a write: the container
then gets started on the other host with access to broken files.
This might be expected, but I'd still like to be able to catch this case
when it's part of an automated procedure.
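
As a rough idea of what such a cleanup step could do if it ran inside
the container's mount namespace, here is a minimal sketch. It is not an
existing LXC feature: the function names and the 10 second default are
arbitrary, and it assumes timeout(1) from coreutils is available. The
point is only that an unmount that fails or hangs turns into a non-zero
exit status that a script can act on.

```shell
# Print NFS mount points from a mounts table in /proc/mounts format,
# deepest paths first so nested mounts are handled before their parents.
# $1 is the table to read; defaults to /proc/self/mounts.
# Note: paths containing spaces appear escaped (\040) and are not decoded.
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' \
        "${1:-/proc/self/mounts}" | sort -r
}

# Try to unmount every NFS mount, giving up on each after $2 seconds
# (default 10). Returns non-zero if any unmount failed or timed out.
# A process blocked uninterruptibly in the kernel may not react even
# to SIGKILL, so the timeout is best effort.
umount_nfs_or_fail() {
    failed=0
    for mp in $(list_nfs_mounts "$1"); do
        if ! timeout -s KILL "${2:-10}" umount "$mp"; then
            echo "umount failed or timed out: $mp" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Run inside the namespace (for example through nsenter on a held file
descriptor), `umount_nfs_or_fail` would let the caller see the problem
instead of having it disappear together with the namespace.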