[lxc-devel] (Mount) namespaces cleanup

Wolfgang Bumiller w.bumiller at proxmox.com
Tue Sep 8 09:10:50 UTC 2015



> On September 7, 2015 at 5:44 PM Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> Quoting Wolfgang Bumiller (w.bumiller at proxmox.com):
> > On Fri, Sep 04, 2015 at 06:09:36PM +0000, Serge Hallyn wrote:
> > > > I'm assuming the cleanup is left to the kernel for when the last
> > > > reference to the namespace disappears. However, this can be
> > > 
> > > Yes.
> > > 
> > > > problematic in some cases. For instance with an NFS mount, which can
> > > > apparently hang indefinitely.
> > > > 
> > > > So I'd like to know what the recommended procedure there is. One thing
> > > > that came to mind is adding a `cleanup` hook. Currently there's only a
> > > > post-stop hook which is run in the host's namespace after stopping. A
> > > > cleanup hook would ideally be run after stopping the processes in the
> > > > container but *inside* the container's namespaces. (Maybe also a
> > > > pre-stop hook to be run inside the container before killing the
> > > > processes?)
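
[Editor's note: for reference, hooks are wired up in the container configuration. The existing post-stop key looks like the first line below; the proposed cleanup hook would presumably get a similar key. `lxc.hook.cleanup` and the script paths are purely illustrative, not existing options:]

```
# Existing: runs in the host's namespaces after the container has stopped.
lxc.hook.post-stop = /usr/local/bin/ct-post-stop.sh
# Hypothetical key for the proposed hook: would run inside the
# container's namespaces after its processes have been killed.
lxc.hook.cleanup = /usr/local/bin/ct-cleanup.sh
```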
> > > 
> > > cleanup hook could be useful.  What exactly would you do there to solve
> > > your problem?
> > 
> > The idea is to manually unmount, from the outside, the storages we want
> > to manage, so that if an unmount hangs we actually notice it (because
> > the unmount process would hang with it, and we'd see that while waiting
> > for it to finish).
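
[Editor's note: a minimal sketch of such a hook, assuming a POSIX shell and coreutils `timeout`. The function name, script layout, and the 30-second bound are all made up for illustration:]

```shell
#!/bin/sh
# Hypothetical cleanup-hook sketch: detach the given mountpoints, but
# bound each unmount so a dead NFS server shows up as an error instead
# of hanging the hook forever.
cleanup_mounts() {
    rc=0
    for mp in "$@"; do
        # `timeout` kills umount if it blocks (e.g. on an unreachable
        # NFS server); 30 seconds is an arbitrary example value.
        if ! timeout 30 umount "$mp"; then
            echo "unmounting $mp failed or timed out" >&2
            rc=1
        fi
    done
    return $rc
}

cleanup_mounts "$@"
```

A nonzero exit status is the point: the supervising process sees the failure instead of silently losing the mount to the kernel's lazy teardown.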
> > 
> > Dead mounts aren't the only issue, though; timing is, too. Here's an
> > example: we have two nodes in a cluster and want to migrate a container
> > from one to the other. So we stop the container. Now assume the data is
> > on shared storage, so we (up until now) assumed we could simply start
> > the container on the other node. However, there's no guarantee that we
> > can do that yet: if the network filesystem is slow and thus still in
> > the process of syncing, we'd be trying to mount the data on node B
> > before node A has finished syncing.
> > 
> > In the cleanup hook we'd basically just wait for the unmounts to
> > complete, and thus either delay the start of the container on the other
> > node, or throw an error after a timeout and prevent the start on the
> > other node.
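
[Editor's note: that wait could be sketched as a deadline-bounded poll of /proc/mounts. The helper name and the 60-second default are invented for illustration; assumes a POSIX shell on Linux:]

```shell
#!/bin/sh
# Hypothetical helper: block until none of the given paths are mounted
# any more, or give up after $TIMEOUT seconds (default 60) so a hung
# unmount turns into an error rather than an indefinite wait.
wait_unmounted() {
    deadline=$(( $(date +%s) + ${TIMEOUT:-60} ))
    for mp in "$@"; do
        # The mountpoint is the space-delimited second field in
        # /proc/mounts, hence the surrounding spaces in the pattern.
        while grep -q " $mp " /proc/mounts; do
            if [ "$(date +%s)" -ge "$deadline" ]; then
                echo "timed out waiting for $mp to be unmounted" >&2
                return 1
            fi
            sleep 1
        done
    done
}
```

The migration logic on node B would then only proceed once this returns success for node A's mounts.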
> > 
> > The other idea we had would be to keep the filesystems visible on the
> > host. That way the host can make sure they're unmounted after the
> > container is stopped. Since the container's mount tree is (AFAIK) an
> > MS_SLAVE mount, unmounting on the host would propagate into the
> > container if it's still alive for some other reason.
> 
> The container can easily undo the MS_SLAVE, so while this may be good
> enough to be useful, it's not bullet-proof.

Yes, you can always shoot yourself in the foot ;-).
But it would at least cover the normal use case: safely unmounting network
filesystems, where possible, before starting the migrated container on another
host.
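
[Editor's note: Serge's caveat is just a propagation-flag change away. A sketch, runnable unprivileged via a user namespace (assumes util-linux `unshare` with `--map-root-user` support and that the kernel permits unprivileged user namespaces):]

```shell
# Inside a new user+mount namespace we are "root", much like a
# container's init, so we can flip the propagation back ourselves.
unshare --user --map-root-user --mount sh -c '
    mount --make-rslave /    # what the host sets up for the container
    mount --make-rshared /   # ...and how the container can undo it
    echo "propagation re-enabled"
'
```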

I'm on vacation for the rest of this week, so I probably won't work on a patch
before next week.


