[lxc-devel] [Not A Patch] [POC] Proof of concept code for using devtmpfs for autodev and more...

Serge Hallyn serge.hallyn at ubuntu.com
Fri Nov 1 15:16:08 UTC 2013


Quoting Michael H. Warfield (mhw at WittsEnd.com):
> On Thu, 2013-10-31 at 13:00 -0500, Serge Hallyn wrote: 
> > Quoting Michael H. Warfield (mhw at WittsEnd.com):
> > > I did incorporate your suggestion of using the hash of the rootfs path
> > > as the subdirectory under the host's /dev/ for the container.  I also
> 
> > (Printed this out to look it over, just putting all my comments together
> > here) :
> 
> > 1. I think if /dev is not devtmpfs, we should just bail on this.
> 
> Sort of concur and I think I even made a remark in some of the code
> about checking for that.
> 
> I'm of two schools of thought here.
> 
> 1) Mount our own instance of devtmpfs in a private area under our
> control.

My problem with this is that the devtmpfs mount is something of which
there can only be one instance, and I don't think lxc should be
usurping that from some potential other (certainly fugly, but it's their
machine) use.

Also, AIUI the main motivation for this is to have udev rules
eventually know how to forward devices into containers?  That won't
be happening in this case.  Well, I guess it will still give you
the persistent devices you want.

Anyway - these are the things I'm considering, but you're the one
experimenting so do what you feel will be most useful :)

> 2) Bail entirely.  This would be a fall back, in any case, if we didn't
> have devtmpfs available to us (is that possible with modern kernels?).

Yes, CONFIG_DEVTMPFS still exists and doesn't appear to get
automatically set, so it can be turned off.
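
For what it's worth, the "is /dev devtmpfs?" check can be done from the
mount table.  Below is a minimal Python sketch, not the lxc code: it
assumes the text comes from /proc/mounts (device, mount point, fstype,
...), and the function names are made up for illustration.

```python
def dev_fstype(mounts_text, mountpoint="/dev"):
    """Return the fstype of the last mount on `mountpoint`, or None.

    `mounts_text` is assumed to be in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mountpoint:
            # A later entry for the same mount point shadows earlier ones.
            fstype = fields[2]
    return fstype


def dev_is_devtmpfs(mounts_text):
    """True only if the topmost mount on /dev is a devtmpfs."""
    return dev_fstype(mounts_text) == "devtmpfs"
```

On a live system one would pass in open("/proc/mounts").read() and bail
(or fall back) when dev_is_devtmpfs() returns False.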

> > 2. You say in comments that you're using the cgroup name, but it seems
> >    you're actually just using the container name?
> 
> I thought I was.  Maybe I misunderstood...
> 
> > 3. The cgroup name used to be unique, but now each mounted cgroupfs
> >    can actually have a different name for the same container (if some
> >    of them didn't get cleaned out well).
> 
> Ok...  One of my problems in that particular area of code is knowing
> where to get at some things that are not in the lxc_conf.  I thought the
> "name" parameter was the cgroup name but apparently not.  I could use
> some guidance there.
> 
> > I'm just thinking out loud here, so this may not be better, but how
> > about
> 
> > 1. create /dev/.lxc as you're doing
> 
> > 2. (if container is going to use this) create /dev/.lxc/$nonce.
> >    We can use hash("$lxcpath/$lxcname"), or just mkstemp(), or
> >    just an increasing integer.
> 
> Well, I was ok with what you said about using the hash of the rootfs
> real path, which is what corresponded to what you had in container.c for

Ok - I'm good with that.

> the monitor socket.  All things being equal, I'd like to stick with
> that.  As a convention, I also like sticking in a symlink for the
> container name pointing at that hash name.  That has some advantages for
> diagnostic purposes to poke around in the container's /dev without having
> to go through headstands figuring out where it is.
> 
> > 3. Create $lxcpath/$lxcname/.dev (if the container needs it) and
> >    shared-bind-mount /dev/.lxc/$nonce onto it.  Now we can tell
> >    which /dev/.lxc/* is mounted by looking at the mount table.
> 
> Hmmm...  Ok...  I think I see where you're going with that.  I'll have
> to think on that one.

Weeell, I guess it's not necessary if you use
/dev/.lxc/$(sha1sum "$lxcpath/$lxcname").
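
To be concrete about what I mean there, here is a rough Python sketch.
It hashes the path string itself (a literal sha1sum with a path argument
would hash the contents of that file, which is not the intent).  The
function name and the /dev/.lxc default are only illustrative, and the
real code may well hash something else, such as the rootfs real path.

```python
import hashlib
import os


def container_dev_dir(lxcpath, lxcname, base="/dev/.lxc"):
    """Derive a per-container directory under `base`.

    The directory name is the SHA-1 hex digest of the string
    "<lxcpath>/<lxcname>", so it is stable across restarts and
    distinct for containers with different paths or names.
    """
    key = os.path.join(lxcpath, lxcname)
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return os.path.join(base, digest)
```

The friendly symlink you mentioned would then just be
/dev/.lxc/<name> pointing at the basename of that directory.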

> > 4. slave-bind mount $lxcpath/$lxcname/.dev into the starting
> >    container's /dev.
> 
> > Not sure whether we should have lxc.autodev = 2 mean use this scheme,
> > but I'd be fine with basically always doing this so long as /dev/ is
> > devtmpfs and lxc.autodev is set for the container.  (So making
> > the container's /dev a tmpfs would just be a fallback).
> 
> > Thoughts?
> 
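
For reference, steps 2 through 4 above could be sketched roughly as
follows.  This is only a Python illustration that builds the command
lines and does not run anything.  The function name, the `nonce`
argument and `container_dev` (the container's /dev as seen from the
host) are assumptions for the sketch, and lxc itself would use mount(2)
directly rather than the mount(8) options shown.

```python
def dev_mount_plan(lxcpath, lxcname, nonce, container_dev):
    """Return the ordered commands for the shared/slave bind-mount scheme."""
    host_dir = "/dev/.lxc/" + nonce              # step 2: per-container dir
    stage = lxcpath + "/" + lxcname + "/.dev"    # step 3: staging mount point
    return [
        ["mkdir", "-p", host_dir, stage],
        # Step 3: bind the host-side directory onto the staging point
        # and mark that mount shared so later changes propagate.
        ["mount", "--bind", host_dir, stage],
        ["mount", "--make-shared", stage],
        # Step 4: bind the staging point onto the container's /dev and
        # make it a slave, so events flow host -> container only.
        ["mount", "--bind", stage, container_dev],
        ["mount", "--make-slave", container_dev],
    ]
```
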
> Definitely 3 and 4 are worth doing.  I'm not so sure about 2.  Since
> we're already using the hash of the rootfs path for the monitor socket,
> I don't see a problem keeping that here, at least for now.  But there is
> the little detail of having that hashing code in two source files now.
> Should that be moved to a common source file?
> 
> I do have one other niggle, and I'm surprised you didn't ding me on that
> (since you expressed concerns earlier).  The automatic autodev detection

I didn't look closely enough :)  But if we can make this good enough,
then perhaps it'll be ok to make it the default behavior whenever
devtmpfs is available.  (In that case, using a single tmpfs mounted
onto /dev/.lxc if /dev is not devtmpfs may be the best backup solution).
If we do that, we'll need to consider what to do about templates that
want to create specific devices.
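
Spelled out as a decision rule, that would look something like the
Python sketch below.  The function and the returned labels are purely
hypothetical names for illustration; nothing like this exists in the
tree.

```python
def choose_dev_backend(host_dev_fstype, autodev):
    """Pick how the container's /dev would be populated.

    host_dev_fstype: filesystem type of the host's /dev (e.g. "devtmpfs")
    autodev:         whether lxc.autodev is set for the container
    """
    if not autodev:
        # Container manages its own /dev; nothing for us to set up.
        return "none"
    if host_dev_fstype == "devtmpfs":
        # Preferred: per-container subdirectory under the host's /dev/.lxc.
        return "devtmpfs-subdir"
    # Backup: a single tmpfs mounted onto /dev/.lxc, shared by all containers.
    return "tmpfs-on-dev-lxc"
```
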

Right now I'm feeling like I'd rather go whole hog after your solution
rather than have 30 separate possible cases for /dev setup.  Yours is
also the only design with a possibility for user-space solution to the
devices namespace problem.  That's worth pursuing.

> is in there.  I did see in at least one other spot where we detect a
> potentially hazardous condition and bail.  So there's some reasonable
> precedent for some safety checking.
> 
> Someone in another thread suggested checking the symlinks for anything
> with "systemd" and autosetting autodev if detected.  It's better than
> what I have now but I'm concerned about just looking at /sbin/init,
> especially in the case where a process is specified on the command line.
> I'd like to implement it that way but I don't see where I have the
> "command to execute" available to me.  Is it in the config structure
> somewhere and I'm just blind?
> 
> I did see that my autodev detect logic was triggering on Ubuntu.  Seems
> that Ubuntu has systemd available but not enabled?  IAC, all my Ubuntu

LOL!  If you have a few hours or days, there are "a few" emails on
debian-devel relevant to that :)

> containers are perfectly happy with autodev enabled, even if upstart
> doesn't need it.
> 
> I'll work on this over the next couple of days and do this additional
> redirection.

Great, thanks.

-serge



