[Lxc-users] notes on the /var/lib/lxc-becomes-readonly problem
Serge Hallyn
serge.hallyn at canonical.com
Thu Feb 9 23:41:29 UTC 2012
During my testing I ran back into the issue of lxc-stop marking
/var/lib/lxc read-only.
So here is the deal. When a container shuts down, it tries to remount its
/ read-only. That fails if the mount is busy (e.g. a file is held
open for write). If /var/lib/lxc is on the same fs as the host's '/', or if
a second container is running, you'll see
    mount: / is busy
on the console, and /var/lib/lxc won't be set read-only. But if you
create a new fs and mount it onto /var/lib/lxc, and start only a single
container there, then /var/lib/lxc is marked read-only after shutdown (and
the '/ is busy' message doesn't show up).
Now, as Dave has helped us remember several times, this happens because
    mount --bind -o remount,ro /
sets the mount's read-only flag, but
    mount -o remount,ro /
sets the superblock's read-only flag. And there is only one sb for all the
bind mounts.
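
One way to see which of the two flags is actually set is to look at
/proc/self/mountinfo, which lists the per-mount options and the superblock
options separately. A minimal sketch, assuming the standard mountinfo field
layout; show_ro_flags is a made-up helper name, not part of lxc:

```shell
# Report the per-mount and superblock read-only state for one mount point.
# Reads lines in /proc/self/mountinfo format from stdin.
# show_ro_flags is a hypothetical helper name used only for illustration.
show_ro_flags() {
    awk -v mp="$1" '
    $5 == mp {
        # Field 5 is the mount point and field 6 the per-mount options.
        # The superblock options are the last field, after the "-" separator.
        mnt = ($6 ~ /(^|,)ro(,|$)/) ? "ro" : "rw"
        sb  = ($NF ~ /(^|,)ro(,|$)/) ? "ro" : "rw"
        print "mount=" mnt " sb=" sb
    }'
}

# Usage on a live system:
#   show_ro_flags /var/lib/lxc < /proc/self/mountinfo
```

If only the bind remount happened you would expect the mount column to
say ro while sb stays rw; after the plain remount the sb column flips to ro
for every mount sharing that superblock.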
This gets particularly nasty when you develop dreams of using btrfs
snapshots for containers, because all the subvolumes will share a sb.
So - a workaround, for now, is to have /etc/init.d/lxc on the host make
sure that a file under /var/lib/lxc is always held open :)
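
A minimal sketch of that workaround, assuming a POSIX shell; hold_open and
the .keep-rw file name are made up for illustration, and the long sleep is
just one way to keep the descriptor open:

```shell
# hold_open DIR: keep a file under DIR open for writing for as long as a
# background process lives, and print that process's PID.
# hold_open and the .keep-rw file name are hypothetical.
hold_open() {
    # Create the file up front so a failure (e.g. DIR already read-only)
    # is reported to the caller right away.
    : > "$1/.keep-rw" || return 1
    # The redirection opens the file for write; the sleep keeps it open.
    sleep 2147483647 >> "$1/.keep-rw" &
    echo $!
}

# Usage in an init script's start action (run as root on the host):
#   pid=$(hold_open /var/lib/lxc)
# and in the stop action:
#   kill "$pid"
```

While that process is alive the fs counts as busy for a superblock
remount, so the container's shutdown should just print the
'/ is busy' message instead.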
A proper fix is possible though. Thanks again to Dave for thinking of it.
In the kernel source, at fs/namespace.c:do_remount(), there is:
    if (flags & MS_BIND)
        err = change_mount_flags(path->mnt, flags);
    else
        err = do_remount_sb(sb, flags, data, 0);
I think it would be conceptually clean to do something like:
    if ((flags & MS_BIND) || !devcgroup_write_allowed(sb))
        err = change_mount_flags(path->mnt, flags);
    else
        err = do_remount_sb(sb, flags, data, 0);
where devcgroup_write_allowed() would be much like
security/device_cgroup.c:__devcgroup_inode_permission(), but using the
sb->s_dev.
The idea would be, the devices cgroup isn't letting you mount that
major:minor, so why would you be able to change an existing mount?
If someone cares to work on the proper kernel patch, please send an
email to make sure there's no duplicate effort. I don't expect to do
it this week though.
-serge