[lxc-devel] (Mount) namespaces cleanup

Mon Sep 7 15:44:48 UTC 2015

Quoting Wolfgang Bumiller (w.bumiller at proxmox.com):
> On Fri, Sep 04, 2015 at 06:09:36PM +0000, Serge Hallyn wrote:
> > > I'm assuming the cleanup is left to the kernel for when the last
> > > reference to the namespace disappears. However, this can be
> > 
> > Yes.
> > 
> > > problematic in some cases. For instance with an NFS mount, which can
> > > apparently hang indefinitely.
> > > 
> > > So I'd like to know what the recommended procedure there is. One thing
> > > that came to mind is adding a `cleanup` hook. Currently there's only a
> > > post-stop hook which is run in the host's namespace after stopping. A
> > > cleanup hook would ideally be run after stopping the processes in the
> > > container but *inside* the container's namespaces. (Maybe also a
> > > pre-stop hook to be run inside the container before killing the
> > > processes?)
> > 
> > cleanup hook could be useful.  What exactly would you do there to solve
> > your problem?
> 
> The idea is to manually unmount the storages we want to manage from the
> outside, so that if an unmount hangs we actually notice it (because the
> unmount-process would hang with it, and we'd see that as we'd be waiting
> for it to finish.)
> 
> Dead mounts aren't the only issue though, timing is, too. Here's an
> example: We have two nodes in a cluster and want to migrate a container
> from one to the other, so we stop the container. Assuming the data is
> on shared storage, we have (up until now) assumed we can simply start it
> on the other node. However, there's no guarantee that we can already do
> this: if the network filesystem is slow and still in the process of
> syncing, we'd be trying to mount the data on node B before node A has
> finished syncing.
> 
> In the cleanup hook we'd basically just wait for the mounts, and thus
> either delay the start of the container on the other node, or throw an
> error after a time out and prevent the start on the other node.
> 
> The other idea we had would be to keep the filesystems visible on the
> host. That way the host can make sure they're unmounted after the
> container is stopped. Since the container's mounts are AFAIK MS_SLAVE,
> the unmount would propagate to the container if it's still alive for
> some other reason.
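
For illustration, the "wait for the unmounts, error out after a timeout"
idea quoted above could look roughly like the sketch below. It assumes a
hypothetical cleanup hook that runs inside the container's mount
namespace (no such hook exists at this point in the thread), and the
names wait_for_command and unmount_all are made up for this example. The
child is deliberately not killed on timeout: a umount stuck on a dead NFS
server may be in uninterruptible sleep, so waiting for it to die could
hang the caller as well.

```python
import subprocess


def wait_for_command(cmd, timeout_s):
    """Start cmd and wait up to timeout_s seconds for it to exit.

    Returns (status, proc), where status is "ok", "failed" or "timeout".
    On timeout the child is left running so the caller never blocks on a
    process that cannot be killed.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout", proc
    return ("ok" if rc == 0 else "failed"), proc


def unmount_all(mountpoints, timeout_s=30.0):
    """Try to unmount each mount point; report a status per mount point.

    Hypothetical helper: the list of mount points and the timeout are
    whatever the management layer decides to track.
    """
    results = {}
    for mp in mountpoints:
        status, _ = wait_for_command(["umount", mp], timeout_s)
        results[mp] = status
    return results
```

A caller would only allow the start on node B once every entry returned
by unmount_all() is "ok", and treat "timeout" as the error case.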

The container can easily undo the MS_SLAVE, so while this may be good
enough to be useful, it's not bullet-proof.
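
Since the slave relationship can be undone from inside, the host may want
to check it rather than rely on it. A minimal sketch, assuming the host
reads /proc/<pid>/mountinfo for a process in the container (the optional
"shared:N" / "master:N" tags there are described in proc(5)); the
function names are made up for this example:

```python
def propagation_of(mountinfo_text, mount_point):
    """Return the optional propagation tags for mount_point.

    mountinfo_text is the content of a /proc/<pid>/mountinfo file.
    Returns a list such as ["master:1"], an empty list for a private
    mount, or None if the mount point is not listed. If the mount point
    appears several times, the last entry wins.
    """
    result = None
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[4] != mount_point:
            continue
        try:
            # Optional fields sit between field 6 and the "-" separator.
            sep = fields.index("-", 6)
        except ValueError:
            continue
        result = fields[6:sep]
    return result


def is_slave(tags):
    """True if the tags show the mount receives propagation from a master."""
    return any(tag.startswith("master:") for tag in tags)
```

If a mount the host expects to be a slave shows no "master:" tag any
more, a host-side unmount will not propagate into the container.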