[lxc-devel] (Mount) namespaces cleanup

Serge Hallyn serge.hallyn at ubuntu.com
Tue Sep 8 14:15:10 UTC 2015


Quoting Wolfgang Bumiller (w.bumiller at proxmox.com):
> 
> 
> > On September 7, 2015 at 5:44 PM Serge Hallyn <serge.hallyn at ubuntu.com> wrote:
> > Quoting Wolfgang Bumiller (w.bumiller at proxmox.com):
> > > On Fri, Sep 04, 2015 at 06:09:36PM +0000, Serge Hallyn wrote:
> > > > > I'm assuming the cleanup is left to the kernel for when the last
> > > > > reference to the namespace disappears. However, this can be
> > > > 
> > > > Yes.
> > > > 
> > > > > problematic in some cases. For instance with an NFS mount, which can
> > > > > apparently hang indefinitely.
> > > > > 
> > > > > So I'd like to know what the recommended procedure there is. One thing
> > > > > that came to mind is adding a `cleanup` hook. Currently there's only a
> > > > > post-stop hook which is run in the host's namespace after stopping. A
> > > > > cleanup hook would ideally be run after stopping the processes in the
> > > > > container but *inside* the container's namespaces. (Maybe also a
> > > > > pre-stop hook to be run inside the container before killing the
> > > > > processes?)
> > > > 
> > > > cleanup hook could be useful.  What exactly would you do there to solve
> > > > your problem?
> > > 
> > > The idea is to manually unmount the storage we want to manage from the
> > > outside, so that if an unmount hangs we actually notice it: the
> > > unmounting process would hang with it, and we would see that because
> > > we would be waiting for it to finish.
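A minimal sketch of the unmount step such a hook might perform, assuming a hypothetical cleanup hook that LXC would run inside the container's mount namespace after its processes are stopped (the hook interface, function name, and timeout value are illustrative, not an existing LXC feature):

```shell
#!/bin/sh
# Hypothetical cleanup-hook body: unmount the managed filesystems with a
# timeout, so a hung NFS unmount is reported instead of being silently
# left to the kernel when the namespace is torn down.
cleanup_mounts() {
    for mp in "$@"; do
        # timeout(1) kills umount if it blocks, e.g. on a dead NFS server
        if ! timeout 30 umount "$mp"; then
            echo "cleanup: unmounting $mp hung or failed" >&2
            return 1
        fi
    done
}
```

A container manager calling this could then refuse to treat the container as cleanly stopped whenever the function returns non-zero.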
> > > 
> > > Dead mounts aren't the only issue, though; timing is, too. Here's an
> > > example: we have two nodes in a cluster and want to migrate a container
> > > from one to the other. We stop the container and, since the data is on
> > > shared storage, we have (up until now) assumed we can simply start it
> > > on the other node. However, there's no guarantee that we can do so yet:
> > > if the network filesystem is slow and still in the process of syncing,
> > > we would be trying to mount the data on node B before node A has
> > > finished syncing.
> > > 
> > > In the cleanup hook we'd basically just wait for the mounts to go
> > > away, and thus either delay the start of the container on the other
> > > node, or throw an error after a timeout and prevent the start on the
> > > other node.
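Waiting for the mounts could be as simple as polling /proc/mounts against a deadline; a sketch, with the helper name and default timeout made up for illustration:

```shell
#!/bin/sh
# Poll until the given mountpoint disappears from /proc/mounts, or give
# up after a deadline. A non-zero return would let the manager refuse to
# start the container on the other node. (Illustrative helper only.)
wait_for_unmount() {
    mp=$1
    deadline=$(( $(date +%s) + ${2:-30} ))   # optional 2nd arg: seconds
    while grep -q " $mp " /proc/mounts; do
        [ "$(date +%s)" -ge "$deadline" ] && return 1
        sleep 1
    done
}
```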
> > > 
> > > The other idea we had would be to keep the filesystems visible on the
> > > host. That way the host can make sure they're unmounted after the
> > > container is stopped. Since the container's mount tree is, AFAIK, an
> > > MS_SLAVE mount, the unmount would propagate into the container if it's
> > > still alive for some other reason.
> > 
> > The container can easily undo the MS_SLAVE, so while this may be good
> > enough to be useful, it's not bullet-proof.
> 
> Yes, you can always shoot yourself in the foot ;-).
> But it would at least cover the normal use-case: safely unmounting network
> filesystems before starting the migrated container on another host, if possible.
> 
> I'm on vacation for the rest of this week, so I probably won't work on a patch
> before next week.

Ok, thank you very much.  BTW, if there is an easy way to test this with,
say, a tiny custom-made fusefs (or maybe an easier, purely local test),
that'd be great.
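One purely local stand-in, assuming that a mount which refuses to go away is close enough to a hung unmount for testing: keep a file open under the mountpoint so that umount fails with EBUSY until the holder is killed. The helper below is hypothetical and not part of LXC:

```shell
#!/bin/sh
# Keep a file open under the given directory so that, if the directory is
# a mountpoint, umount fails with EBUSY until the holder process exits.
# A cheap local simulation of an unmount that cannot complete.
hold_busy() {
    dir=$1
    # sleep holds the redirected file descriptor open while it runs
    sleep 300 > "$dir/.holder" &
    echo $!   # caller kills this PID (and removes .holder) to release it
}
```

For example, running `hold_busy` against a scratch tmpfs mount and then attempting to unmount it should exercise the same error path as a busy network filesystem, without needing any network at all.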

-serge

