[lxc-devel] [RFC PATCH 0/2] Loop device psuedo filesystem

Andy Lutomirski luto at amacapital.net
Tue May 27 22:19:15 UTC 2014


On Tue, May 27, 2014 at 2:58 PM, Seth Forshee
<seth.forshee at canonical.com> wrote:
> I'm posting these patches in response to the ongoing discussion of loop
> devices in containers at [1].
>
> The patches implement a pseudo filesystem for loop devices, which will
> allow use of loop devices in containers using standard utilities. Under
> normal use a loopfs mount will initially contain a single device node
> for loop-control which can be used to request and release loop devices.
> Any devices allocated via this node will automatically appear in that
> loopfs mount (and in devtmpfs) but not in any other loopfs mounts.
> CAP_SYS_ADMIN in the userns of the process which performed the mount is
> allowed to perform privileged loop ioctls on these devices.
>
> Alternately loopfs can be mounted with the hostmount option, intended
> for mounting /dev/loop in the host. This is the default mount for any
> devices not created via loop-control in a loopfs mount (e.g. devices
> created during driver init, devices created via /dev/loop-control, etc).
> This is only available to system-wide CAP_SYS_ADMIN.
>
> I still have some testing to do on these patches, but they work at
> minimum for simple use cases. It's possible to use an unmodified losetup
> if it's new enough to know about loop-control, with a couple of caveats:
>
>  * /dev/loop-control must be symlinked to /dev/loop/loop-control
>  * In some cases losetup attempts to use /dev/loopN when the device node
>    is at /dev/loop/N. For example, 'losetup -f disk.img' fails.
>
> Device nodes for loop partitions are not created in loopfs. These
> devices are created by the generic block layer, and the loop driver has
> no way of knowing when they are created, so some kind of hook into the
> driver will be needed to support this.

This is entertaining and a bit terrifying :)

ISTM that what you've done is to create a way for per-userns devices
to live in a special filesystem and for userns containers to
instantiate those devices by offloading all the hard work to the
kernel.

What if we generalized this?

For example, we could add a concept of ephemeral devices.  An
ephemeral device is a device that can be referenced by an inode with a
guarantee that the inode will *never* accidentally point to a
different device [1].  Then we add a concept of the userns that owns a
struct device.

To make this safe, we'll need to make sure that old host udev will not
see non-init-userns devices, ever.  This is easy enough to do, but
doing it elegantly might take some design work.

To make this useful, we'll need a way for things inside user
namespaces to create the device nodes.  I can imagine at least three
ways to make this work.

a) Allow mknod on a tmpfs created by a particular userns to succeed if
the target struct device is owned by that userns or a child and if
the caller is ns_capable(CAP_MKNOD).
b) Create a new filesystem that has some special ioctl or whatever to do it.
c) Have real per-user-ns devtmpfs.
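
To make the rule in (a) concrete, here is a rough toy model in Python.
It is a userspace sketch and not kernel code: UserNS, Device, Caller,
ns_capable_mknod and may_mknod are all hypothetical names invented for
illustration, and the capability check is a simplified assumption (a
caller counts as capable in a namespace if it holds CAP_MKNOD in that
namespace or in one of its ancestors).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class UserNS:
    """Toy stand-in for a user namespace; parent is None for the initial one."""
    name: str
    parent: Optional["UserNS"] = None

    def is_same_or_descendant_of(self, other: "UserNS") -> bool:
        # Walk up the parent chain looking for `other`.
        ns = self
        while ns is not None:
            if ns is other:
                return True
            ns = ns.parent
        return False


@dataclass(eq=False)
class Device:
    """Toy stand-in for a struct device with an owning userns."""
    name: str
    owner: UserNS


@dataclass
class Caller:
    # Namespaces in which this caller holds CAP_MKNOD (assumed model).
    mknod_capable_in: set = field(default_factory=set)


def ns_capable_mknod(caller: Caller, ns: UserNS) -> bool:
    # Simplified assumption: capability in `ns` or any ancestor suffices.
    while ns is not None:
        if ns in caller.mknod_capable_in:
            return True
        ns = ns.parent
    return False


def may_mknod(caller: Caller, tmpfs_ns: UserNS, dev: Device) -> bool:
    # Option (a): the device must be owned by the userns that created the
    # tmpfs (or by a child of it), and the caller must be capable there.
    return (dev.owner.is_same_or_descendant_of(tmpfs_ns)
            and ns_capable_mknod(caller, tmpfs_ns))
```

With this model, a caller that is capable in a container's userns could
create a node for a loop device owned by that container or by a nested
one, but not for a device owned by the host.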

Now, to get loop working in a userns, we need a way for the userns (or
the host!) to create a new loop-control device owned by that userns
and we need to tweak the loop driver to make the created loop devices
be owned by the userns.

(Note: I'm deliberately ignoring the fact that just doing this for
loop seems to be almost entirely useless right now: you still can't
mount the things.)

Thoughts?


[1]  For example, there could be a special set of device numbers that
are not reused until reboot.  Ephemeral device nodes point to these
devices by number.  Alternatively, the inodes could keep references to
the struct device.
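
The first variant (numbers that are not reused until reboot) can be
sketched as a monotonic allocator. Again this is only a toy Python
model under my own assumptions: the class name, the numeric range and
the methods are hypothetical and do not correspond to any existing
kernel interface.

```python
class EphemeralDevNumbers:
    """Hands out device numbers that are never reused within one 'boot'."""

    def __init__(self, first: int, last: int) -> None:
        self._next = first      # next never-used number
        self._last = last       # inclusive upper bound of the range
        self._live = set()      # numbers whose device still exists

    def allocate(self) -> int:
        if self._next > self._last:
            # No recycling: once the range is used up, only a reboot helps.
            raise RuntimeError("ephemeral device numbers exhausted")
        number = self._next
        self._next += 1
        self._live.add(number)
        return number

    def release(self, number: int) -> None:
        # The device goes away, but its number is retired, not recycled.
        self._live.discard(number)

    def is_live(self, number: int) -> bool:
        # A stale inode holding a released number resolves to nothing,
        # never to a different device.
        return number in self._live
```

The point of the design is that a device node left behind after its
device is destroyed can only fail to open; it cannot start referring to
some later, unrelated device.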

