[lxc-devel] Potential deadlock with lxcfs and lxc-freeze

Fabian Grünbichler f.gruenbichler at proxmox.com
Fri Feb 12 07:42:14 UTC 2016


> Fabian Grünbichler <f.gruenbichler at proxmox.com> wrote on 11 February 2016
> at 10:47:
> Hello,
> 
> some of our users encounter a strange issue when using lxc-freeze on a
> container that uses lxcfs. Sometimes lxc-freeze is unable to freeze a process
> inside the container that is accessing files in /proc provided by lxcfs. The
> process(es) in question hang in FUSE's request_wait_answer(), and the
> associated lxcfs process hangs in futex_wait_queue_me (according to ps faxl).
> 
> This is quite surprising, because lxcfs is not part of the cgroup that is
> frozen and should thus not be affected by a call to lxc-freeze. A similar, but
> NOT surprising, behaviour can be observed when mounting a FUSE file system in
> the container itself (e.g., create /dev/fuse and mount an sshfs inside the
> CT), running find in a loop on the mounted FUSE fs in the container, and then
> trying to lxc-freeze the container. In that case, the problem is that the
> kernel freezer does not know in which order the processes would need to be
> frozen in order to avoid a deadlock. I don't see how this would apply to lxcfs
> (running on the host) and a process accessing it (in the container), though.

Hello again,

I tweaked our test setup a little and managed to trigger the issue again (using
uptime).

I patched lxc-freeze to sleep for 0.1 seconds instead of 1 second, in order to
allow for more freeze attempts in a shorter period of time, and ran multiple
uptime processes in parallel in the container to increase the number of
/proc/uptime accesses. After more than 22k attempts, two uptime processes became
unfreezable on the same lxc-freeze attempt, again waiting in
request_wait_answer. The associated lxcfs processes showed the same backtraces
(minus memory addresses and socket fds) as before.
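
For anyone who wants to try a similar stress test, the loop described above
could be approximated with a small Python harness. This is only a sketch, not
the patched lxc-freeze used here: it assumes that an unpatched lxc-freeze blocks
while the freeze is incomplete, so a timeout is treated as the hang signal. The
function names, the worker count and the timeout value are all made up for
illustration.

```python
import subprocess


def start_load(name, workers=4, popen=subprocess.Popen):
    """Start `workers` shell loops in the container that keep running uptime."""
    loop = "while :; do uptime >/dev/null; done"
    return [
        popen(["lxc-attach", "-n", name, "--", "sh", "-c", loop])
        for _ in range(workers)
    ]


def freeze_attempt(name, timeout=10.0, run=subprocess.run):
    """Freeze and unfreeze once; return False if lxc-freeze timed out."""
    try:
        # Assumption: lxc-freeze does not return until the container is frozen.
        run(["lxc-freeze", "-n", name], check=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The container is deliberately left as-is so it can be inspected.
        return False
    run(["lxc-unfreeze", "-n", name], check=True)
    return True


def hammer(name, max_attempts, timeout=10.0, run=subprocess.run):
    """Return the 1-based number of the attempt that hung, or None."""
    for attempt in range(1, max_attempts + 1):
        if not freeze_attempt(name, timeout=timeout, run=run):
            return attempt
    return None
```

A run might look like start_load("ct1") followed by hammer("ct1", 50000), where
"ct1" is a placeholder container name. When hammer() returns a number, the
container has not been unfrozen, so ps and gdb can still be used on the stuck
processes.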

Running lxc-unfreeze on the container worked, but the waiting processes (2x
uptime, 2x lxcfs) continued to wait, with no change whatsoever in the
backtraces (gdb was detached before unfreezing and reattached afterwards). Only
rebooting the container seems to help.
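
To check whether processes are still waiting without attaching gdb, one option
is to read the kernel wait channel from /proc/<pid>/wchan, which is roughly what
the WCHAN column of ps faxl reports. The sketch below is an illustration under
that assumption; the helper names are made up, and on some kernels or without
sufficient privileges the file may only contain 0. Note that
futex_wait_queue_me is also a normal idle state for many threaded programs, so
a match on its own does not prove a hang.

```python
from pathlib import Path

# The two wait symbols mentioned in this thread.
WAIT_SYMBOLS = frozenset({"request_wait_answer", "futex_wait_queue_me"})


def read_wchan(pid, proc_root="/proc"):
    """Return the wait channel of `pid`, or None if it cannot be read."""
    try:
        return Path(proc_root, str(pid), "wchan").read_text().strip()
    except OSError:
        # The process may have exited, or the file may not be readable.
        return None


def waiting_pids(symbols=WAIT_SYMBOLS, proc_root="/proc"):
    """Map each PID whose wait channel is in `symbols` to that symbol."""
    result = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as "self"
        wchan = read_wchan(entry.name, proc_root)
        if wchan in symbols:
            result[int(entry.name)] = wchan
    return result
```

Calling waiting_pids() before and after lxc-unfreeze and comparing the two
results would show which PIDs stayed in the same wait channel.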

Detailed ps output and backtraces are at
https://gist.github.com/anonymous/c8c65043af505c6813b4

I'll try to reproduce the issue with something other than (/proc/)uptime.

Regards,
Fabian


