[lxc-devel] Potential deadlock with lxcfs and lxc-freeze

Fabian Grünbichler f.gruenbichler at proxmox.com
Thu Feb 11 09:47:00 UTC 2016


Hello,

some of our users encounter a strange issue when using lxc-freeze on a container
using lxcfs. Sometimes, lxc-freeze is unable to freeze a process inside the
container that is accessing files in /proc that are provided by lxcfs. The
process(es) in question hang in FUSE's request_wait_answer(), and the associated
lxcfs process in futex_wait_queue_me (according to ps faxl).

This is quite surprising, because lxcfs is not part of the cgroup that is
frozen, and should thus not be affected by a call to lxc-freeze. A similar, but
NOT surprising, behaviour can be observed when mounting a FUSE file system in
the container itself (e.g., create /dev/fuse and mount an sshfs inside the CT),
running find in a loop on the mounted FUSE fs in the container and trying to
lxc-freeze the container. In that case, the problem is that the kernel freezer
does not know in which order the processes would need to be frozen in order to
avoid a deadlock. I don't see how this would apply to lxcfs (running on the
host) and a process accessing it (in the container) though.

A test setup that seems to work (but takes a while to trigger):

1) Log into container and do:
$ while : ; do uptime; done

2) On host do:
$ i=0; while : ; do let i++; echo freeze $i && lxc-freeze -n NAME; \
      echo unfreeze && lxc-unfreeze -n NAME; done

At some point, the output of step 2 will stop, and 'ps faxl' will show
something like this:

# ps faxl |grep lxcfs
4     0  3774     1  20   0 527956  2132 futex_wait_queue_me Ssl  ?
         0:10 /usr/bin/lxcfs -f -s -o allow_other /var/lib/lxcfs/
5     0 22927  3774  20   0 380220   788 wait                S    ?
         0:00  \_ /usr/bin/lxcfs -f -s -o allow_other /var/lib/lxcfs/
1     0 22928 22927  20   0 380352   788 futex_wait_queue_me S    ?
         0:00      \_ /usr/bin/lxcfs -f -s -o allow_other /var/lib/lxcfs/


# (ps faxl portion for the container; no lxc-attach was used, so this
   includes the container's entire process tree)

5     0 12569     1  20   0  38768  3448 ep_poll             Ss   ?
         0:02 [lxc monitor] /var/lib/lxc 104
4     0 12651 12569  20   0  34080  4492 refrigerator        Ds   ?
         0:00  \_ /sbin/init
4     0 12815 12651  20   0  30488  5436 refrigerator        Ds   ?
         0:00      \_ /usr/lib/systemd/systemd-journald
4    81 12981 12651  20   0  34748  3444 refrigerator        Ds   ?
         0:00      \_ /usr/bin/dbus-daemon --system --address=systemd: --nofork
--nopidfile --systemd-activation
4     0 13016 12651  20   0  15292  2424 refrigerator        Ds   ?
         0:00      \_ /usr/lib/systemd/systemd-logind
4   193 13033 12651  20   0  19792  2688 refrigerator        Ds   ?
         0:00      \_ /usr/lib/systemd/systemd-networkd
4     0 13052 12651  20   0   6348  1664 refrigerator        Ds+  pts/7
     0:00      \_ /sbin/agetty --noclear --keep-baud console 115200 38400 9600
vt220
4     0 13055 12651  20   0   6348  1544 refrigerator        Ds+  pts/1
     0:00      \_ /sbin/agetty --noclear --keep-baud pts/1 115200 38400 9600
vt220
4     0 13058 12651  20   0  89728  4128 refrigerator        Ds   ?
         0:00      \_ login -- root
4     0 30296 13058  20   0  14408  3356 refrigerator        Ds   pts/0
     0:01      |   \_ -bash
0     0 22921 30296  20   0  31980  2380 request_wait_answer D+   pts/0
     0:00      |       \_ uptime
4     0 30127 12651  20   0  33752  4128 refrigerator        Ds   ?
         0:00      \_ /usr/lib/systemd/systemd --user
5     0 30159 30127  20   0  96432  1316 sigtimedwait        S    ?
         0:00          \_ (sd-pam)

Attaching gdb to the lxcfs process in question (22928 in this case) gives the
following (trimmed) backtrace:
#0  __lll_lock_wait_private () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007f9552b816db in _L_lock_11305 () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f9552b7f838 in __GI___libc_realloc (oldmem=0x7f9552ea8620
<main_arena>, bytes=bytes at entry=567) at malloc.c:3025

See [1] for the full backtrace. It seems that a fork() gone wrong fails an
assertion, and the malloc() needed by asprintf() to format the error message
then waits forever on a lock (most likely a malloc arena lock that another
thread held at fork() time, which the forked child can never release)? Calling
lxc-unfreeze -n NAME makes both the container and lxcfs continue without
problems, and a subsequent lxc-freeze -n NAME works (not surprising, since it
took >6000 freeze attempts to trigger the issue with this setup).

While it takes a while to reproduce this in this test setting, our users report
that it occurs quite often in a "real" environment. Some common factors seem to
be: running multiple containers, running some kind of monitoring software
accessing various /proc files in the container (we have reports concerning
piwik, splunkd and monit). See [2] for a support forum thread with reports of
varying detail, and hopefully more backtraces soon. Note that Proxmox VE calls
lxc-freeze for both snapshot and suspend mode backups, so this issue affects
both modes.

Thanks in advance for checking this out,
Fabian

1:
https://gist.githubusercontent.com/Blub/72a7f432fcf8f6513919/raw/cbc22497abd95746dbb426b0674572c7ffef6a07/lxc-err1.txt
2: https://forum.proxmox.com/threads/lxc-backup-randomly-hangs-at-suspend.25345/
