[lxc-devel] mounts...

Sat Nov 14 15:17:17 UTC 2009

On Sat, Nov 14, 2009 at 02:54:20PM +0300, Michael Tokarev wrote:
> Hello!
> 
> Several questions here if I can... ;)
> 
> Why mountpoints in the per-container fstab can't be relative
> to the container's rootfs?  It's trivial to implement by
> allowing non-absolute pathnames in there and chdir'ing into
> the rootfs prior to mounting.  That already works when running
> lxc-start from within the container's rootfs.

That's not so trivial, imo. I'm working on 'config variables' with this
problem in mind. This will allow you to specify something like
${lxc.rootfs}/... inside your fstab.
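The variable feature is still in progress, so this is only a hypothetical sketch, not the planned implementation. The helper name `expand_config_vars` is made up for illustration. It substitutes `${name}` references in an fstab line from a dictionary of config values and fails loudly on unknown names rather than leaving them in place.

```python
import re

# ${name} where name may contain dots, e.g. ${lxc.rootfs}
_VAR = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def expand_config_vars(line: str, values: dict) -> str:
    """Hypothetical helper: replace ${key} references with config values."""

    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            # An undefined variable is almost certainly a typo in the fstab.
            raise KeyError(f"undefined config variable: {name}")
        return values[name]

    return _VAR.sub(lookup, line)
```

For example, `expand_config_vars("proc ${lxc.rootfs}/proc proc defaults 0 0", {"lxc.rootfs": "/var/lib/lxc/c1/rootfs"})` yields an fstab line with the absolute path filled in.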

> I think it's the way to go really - to _require_ non-absolute
> mountpoints in the container's mount file.  Partly because it's
> not a good idea to mount to a directory which is not visible from
> within a container (but it might be useful still to grant access
> to only part of the filesystem to a given container).  And partly
> because it's just somewhat ugly.

There are many situations where you may need to mount something on the
host node (hn) when a container is started: for example, the container's
rootfs on some network file system, or other resources.

Anyway, this is a more flexible approach and, imo, there is no great
need to make it less flexible. Variables should be enough for lazy
maintenance =) (until then, you can write a trivial script that uses
'sed' to fill in values from some template file, whatever you need).
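For comparison, the relative-mountpoint idea from the quoted message need not exclude absolute paths. This is a minimal sketch under assumptions: the helper name `resolve_mountpoint` is hypothetical, and it joins paths instead of chdir'ing into the rootfs as the quoted message proposes. Absolute mount points pass through unchanged, relative ones are resolved against the rootfs, and relative paths that climb out of it are rejected.

```python
import os.path


def resolve_mountpoint(mountpoint: str, rootfs: str) -> str:
    """Hypothetical helper: absolute paths as-is, relative ones under rootfs."""
    if os.path.isabs(mountpoint):
        # Keep the current behaviour: absolute paths refer to the host tree.
        return mountpoint
    root = os.path.normpath(rootfs)
    target = os.path.normpath(os.path.join(root, mountpoint))
    # Purely lexical check: symlinks inside the rootfs are NOT resolved here.
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"mount point {mountpoint!r} escapes rootfs {rootfs!r}")
    return target
```

With this, `proc` and `/var/lib/lxc/c1/rootfs/proc` name the same place, while `/srv/nfs` can still point at a host-side mount.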

> Second question is about the "other" mountpoints that exist on
> the host system when starting a container.  Is it a good idea to
> umount "unrelated" filesystems that are not used in a container
> but are still shown in /proc/mounts?  I mean, is there a way to
> access these from within a container somehow, bypassing the
> "container barrier"?

No. This is the whole concept of 'isolation'. Why would you need it? If
you need some partition mounted inside the container, mount it when
starting. If you need some part of the hn's filesystem, bind-mount it when
starting.

> That's just... nonsense ;)

Yes, yes. This is a kernel issue, I suppose, something to do with procfs
not being fully virtualized. But I don't know of any userspace utility that
uses /proc/mounts directly; they all use /etc/mtab. Just clear the bad
entries from it and all should be ok.
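Clearing those entries is easy to script. This is a minimal sketch, assuming "bad" means any entry whose mount point is not on an explicit keep-list chosen by the admin. The helper `filter_mtab` and that policy are illustrative, not part of lxc.

```python
def filter_mtab(text: str, keep: set) -> str:
    """Return mtab-format text with only the entries whose mount point is in keep."""
    kept = []
    for line in text.splitlines():
        # Fields: device, mount point, fs type, options, dump, pass.
        # Spaces inside paths are octal-escaped (\040), so split() is safe.
        fields = line.split()
        if len(fields) >= 2 and fields[1] in keep:
            kept.append(line)
    return "".join(entry + "\n" for entry in kept)
```

Inside the container one would read /etc/mtab, pass its contents through the filter with e.g. `{"/", "/proc", "/dev/pts"}`, and write the result back.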

> Where does that /tmp/lxc-rC7sKKP come from?  What's the reason to
> create a separate mount to start with, why not use rootfs directly?
> I _think_ I don't understand something here and a separate mount is
> actually required to be a rootfs for a container, in a way similar
> to somewhat-fake (in a sense that on normal system it contains nothing)
> rootfs on real host system.

I don't know exactly why it's done, but if you try to ls -l inside this
dir, you will find that it's empty; it seems to be some kind of
side effect of unsharing the mount namespace. Probably this approach is
used to keep access to the container's rootfs from the hn (there is still
another way to access it, through procfs, but it's much more complicated, imo).

> 
> But maybe /var/lib/lxc/rootfs is better suited for that than a random
> name in /tmp?  And maybe it's a good idea to actually show the whole mount tree
> (at least as long as it's not modified in a container) on the host system?

Mounts are done after 'cloning', i.e. in another namespace. They are
inaccessible from outside the container, so, to use your own
words, this is nonsense =)

> And finally, isn't it simpler to run a script (or an external command) to
> prepare the container's namespace (and do other necessary things) than to try
> to do everything from within the conffile?  I mean, instead of stuff like
> the mounting (processing mounts file or conffile entries), setting up
> cgroups(*), hostname, mounting consoles etc, there might be a place to call
> a specified shell script that does all that and other things.

For me a conffile is a much simpler approach to maintain than writing custom scripts.

I'll post my reworked patch really soon; it adds a config.include option
to the conffile, so you will be able to keep all common config entries in
one shared file, while each container's config contains only
container-specific options.
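The include patch had not been posted yet, so its exact semantics are unknown. This is a hypothetical sketch of how a `config.include = <file>` line could be expanded in place. It assumes plain `key = value` lines, `#` comments, and include paths relative to the including file. Entries are kept as an ordered list rather than a dict because keys such as `lxc.cgroup.devices.allow` may repeat.

```python
import os


def load_config(path: str, _stack: tuple = ()) -> list:
    """Hypothetical loader: read key = value lines, expanding config.include in place."""
    path = os.path.abspath(path)
    if path in _stack:
        raise ValueError(f"include loop involving {path}")
    entries = []
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"{path}: expected 'key = value', got {line!r}")
            key, value = key.strip(), value.strip()
            if key == "config.include":
                # Assumption: relative include paths are relative to the including
                # file; os.path.join keeps an absolute value unchanged.
                target = os.path.join(os.path.dirname(path), value)
                entries.extend(load_config(target, _stack + (path,)))
            else:
                entries.append((key, value))
    return entries
```

Including the common file before the container-specific lines (as above) lets later, per-container entries follow the shared ones.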

> (*) for cgroups, especially for devices, it's quite ugly to specify things
> by device numbers, having in mind the dynamic nature of devices nowadays.

Device numbers change really rarely. Actually, I don't think the device
numbers for /dev/null, zero or tty* have ever changed =)

> It should be easy to let things like:
>   lxc.cgroup.devices.allow = /dev/null rwm
> so that it gets translated to "c 1:3" at invocation time.  That can be done
> in a mentioned shell script just fine.

Yeah. And from what source would the shell script take the values it
should apply? You will need some config structure anyway. For me, a shell
script is something redundant. If you have some examples of why it's
better than a conffile, post them, please. (Actually, most of the work that
lxc does could be done from a shell script, but how fast would it be? =))
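Wherever it ends up living (a shell script or lxc's own config parser), the translation suggested in the quoted message is small. This is a minimal sketch; the function name `translate_device_rule` is hypothetical. It stat()s the device node and rewrites a `/dev/... access` rule into the `type major:minor access` form the devices cgroup expects (e.g. `c 1:3 rwm`), leaving rules that are already in that form untouched.

```python
import os
import stat


def translate_device_rule(rule: str) -> str:
    """Hypothetical helper: '/dev/null rwm' -> 'c 1:3 rwm' (numbers read via stat)."""
    parts = rule.split(None, 1)
    if not parts or not parts[0].startswith("/"):
        return rule  # already 'type major:minor access' (or empty): leave as-is
    path = parts[0]
    access = parts[1].strip() if len(parts) == 2 else ""
    st = os.stat(path)  # follows symlinks to the real device node
    if stat.S_ISCHR(st.st_mode):
        kind = "c"
    elif stat.S_ISBLK(st.st_mode):
        kind = "b"
    else:
        raise ValueError(f"{path} is not a device node")
    return f"{kind} {os.major(st.st_rdev)}:{os.minor(st.st_rdev)} {access}".rstrip()
```

Doing the lookup at container start time, rather than hard-coding numbers in the config, is what makes the rule robust against dynamically assigned devices.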