[lxc-devel] mounts...

Sat Nov 14 11:54:20 UTC 2009

Hello!

Several questions here if I can... ;)

Why can't mountpoints in the per-container fstab be relative
to the container's rootfs?  It's trivial to implement by
allowing non-absolute pathnames in there and chdir'ing into
the rootfs prior to mounting.  That already works when running
lxc-start from within the container's rootfs.

I think it's the way to go really - to _require_ non-absolute
mountpoints in the container's mount file.  Partly because it's
not a good idea to mount to a directory which is not visible from
within a container (but it might be useful still to grant access
to only part of the filesystem to a given container).  And partly
because it's just somewhat ugly.
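
As a rough illustration of the relative-mountpoint idea, here is a
minimal Python sketch of resolving an fstab entry against the
container's rootfs.  The function name resolve_mountpoint is
hypothetical and not part of lxc; the check is purely lexical and
does not follow symlinks, so treat it as a sketch rather than a
complete implementation.

```python
import os


def resolve_mountpoint(rootfs, mountpoint):
    """Return the host path for a mountpoint given relative to rootfs.

    Hypothetical helper: absolute mountpoints are rejected, as are
    entries that would climb out of the rootfs via "..".
    """
    if os.path.isabs(mountpoint):
        raise ValueError("mountpoint must be relative: %r" % mountpoint)

    rootfs = os.path.normpath(rootfs)
    target = os.path.normpath(os.path.join(rootfs, mountpoint))

    # Lexical containment check only; symlinks are not resolved here.
    prefix = rootfs.rstrip("/") + "/"
    if target != rootfs and not target.startswith(prefix):
        raise ValueError("mountpoint escapes rootfs: %r" % mountpoint)
    return target
```

With a rootfs of /guest/lenny/t0, an entry such as dev/shm would
resolve to /guest/lenny/t0/dev/shm, while /dev/shm or ../dev would
be refused.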

The second question is about the "other" mountpoints that exist on
the host system when starting a container.  Is it a good idea to
umount "unrelated" filesystems that are not used in a container
but are still shown in /proc/mounts?  I mean, is there a way to
access these from within a container somehow, bypassing the
"container barrier"?

The whole mount tree in a container is quite ugly too.  Just try
to run df(1) from a container to see why.  Ideally the whole namespace
tree should be cleared to remove all the unrelated mounts, but it
isn't quite possible as long as some ttys or the console are mounted from
the host's /dev or as long as we're using /var/lib/lxc/$name as a starting
point and as long as the container's rootfs itself is not on a dedicated
partition.  But seriously, being able to run df(1) normally inside a
container is - IMHO - worth some effort.  Here's an example from my
test system:

Filesystem           1K-blocks      Used Available Use% Mounted on
rootfs                67076224  17666768  49409456  27% /
/dev/md1              67076224  17666768  49409456  27% /
tmpfs                 67076224  17666768  49409456  27% /lib/init/rw
sysfs                 67076224  17666768  49409456  27% /sys
devfs                     1024         0      1024   0% /dev
tmpfs                     1024         0      1024   0% /dev/shm
cgroup                    1024         0      1024   0% /dev/cgroup
/dev/md2              67076224  17666768  49409456  27% /usr
/dev/md3              67076224  17666768  49409456  27% /var
varrun                67076224  17666768  49409456  27% /var/run
varloc                67076224  17666768  49409456  27% /var/lock
df: `/guest': No such file or directory
df: `/stage': No such file or directory
/tmp                  67076224  17666768  49409456  27% /tmp
df: `/guest/lenny/t0/dev': No such file or directory
df: `/guest/lenny/t0/dev/console': No such file or directory
df: `/guest/lenny/t0/dev/tty1': No such file or directory
df: `/guest/lenny/t0/dev/tty2': No such file or directory
df: `/guest/lenny/t0/dev/tty3': No such file or directory
df: `/guest/lenny/t0/dev/tty4': No such file or directory
/dev/md8p11           67076224  17666768  49409456  27% /
devfs                     1024         0      1024   0% /dev

That's just... nonsense ;)
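
For illustration, picking the "unrelated" mounts out of /proc/mounts
could look roughly like the Python sketch below.  The function name
unrelated_mounts is hypothetical, and the sketch assumes it runs in
the container's new mount namespace before the root is switched, with
the rootfs path still expressed in host terms.  It only computes the
list; actually unmounting is left out.

```python
import os
import re


def _unescape(field):
    # /proc/mounts encodes spaces, tabs etc. as octal escapes like \040.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def _is_under(path, parent):
    # True if path equals parent or lies beneath it (lexical check).
    if parent == "/":
        return True
    return path == parent or path.startswith(parent + "/")


def unrelated_mounts(mounts_text, rootfs):
    """List mountpoints that are neither inside rootfs nor hold it.

    The result is ordered deepest first, so that nested mounts would
    be detached before their parents.
    """
    rootfs = os.path.normpath(rootfs)
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        mountpoint = _unescape(fields[1])
        if _is_under(mountpoint, rootfs):
            continue  # part of the container's own tree
        if _is_under(rootfs, mountpoint):
            continue  # the rootfs lives on this mount, cannot drop it yet
        result.append(mountpoint)
    result.sort(key=lambda p: p.count("/"), reverse=True)
    return result
```

Mounts that contain the rootfs (such as / on the host) are skipped
on purpose, since they can only go away once the container has
switched to its own root.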

Another question is about what a container looks like from within the
host system.  For example, here's lsof(1) output for pid 1 of
the container above (bash):

COMMAND   PID USER   FD   TYPE DEVICE   SIZE      NODE NAME
bash    32164 root  cwd    DIR 259,12   4096      2061 /tmp/lxc-rC7sKKP
bash    32164 root  rtd    DIR 259,12   4096      2061 /tmp/lxc-rC7sKKP
bash    32164 root  txt    REG 259,12 700492 268582935 /tmp/lxc-rC7sKKP/bin/bash
bash    32164 root  mem    REG 259,12           245904 /tmp/lxc-rC7sKKP/lib/libnss_files-2.7.so (stat: No such file or directory)
bash    32164 root  mem    REG 259,12           245906 /tmp/lxc-rC7sKKP/lib/libnss_nis-2.7.so (stat: No such file or directory)
bash    32164 root  mem    REG 259,12           245901 /tmp/lxc-rC7sKKP/lib/libnsl-2.7.so (stat: No such file or directory)
bash    32164 root  mem    REG 259,12           245902 /tmp/lxc-rC7sKKP/lib/libnss_compat-2.7.so (stat: No such file or directory)
bash    32164 root  mem    REG 259,12           245895 /tmp/lxc-rC7sKKP/lib/libc-2.7.so (stat: No such file or directory)
bash    32164 root  mem    REG 259,12           245898 /tmp/lxc-rC7sKKP/lib/libdl-2.7.so (stat: No such file or directory)
bash    32164 root  mem    REG 259,12          2924566 /tmp/lxc-rC7sKKP/lib/libncurses.so.5.7 (stat: No such file or directory)
bash    32164 root  mem    REG 259,12           245892 /tmp/lxc-rC7sKKP/lib/ld-2.7.so (stat: No such file or directory)
bash    32164 root    0u   CHR  136,8               11 /dev/pts/8
bash    32164 root    1u   CHR  136,8               11 /dev/pts/8
bash    32164 root    2u   CHR  136,8               11 /dev/pts/8
bash    32164 root  255u   CHR  136,8               11 /dev/pts/8

Where does that /tmp/lxc-rC7sKKP come from?  What's the reason to
create a separate mount to start with, why not use the rootfs directly?
I _think_ I'm missing something here and a separate mount is
actually required to serve as the rootfs for a container, in a way
similar to the somewhat-fake rootfs (fake in the sense that on a normal
system it contains nothing) on a real host system.

But maybe /var/lib/lxc/rootfs is better suited for that than a random
name in /tmp?  And maybe it's a good idea to actually show the whole mount
tree (at least as long as it's not modified in a container) on the host system?

And finally, isn't it simpler to run a script (or an external command) to
prepare the container's namespace (and do other necessary things) than to try
to do everything from within the conffile?  I mean, instead of handling
things like mounting (processing the mounts file or conffile entries),
setting up cgroups(*), the hostname, mounting consoles etc. internally,
there might be a place to call a specified shell script that does all
that and other things.

(*) for cgroups, especially for devices, it's quite ugly to specify things
by device numbers, bearing in mind the dynamic nature of devices nowadays.
It should be easy to allow things like:
  lxc.cgroup.devices.allow = /dev/null rwm
so that it gets translated to "c 1:3" at invocation time.  That can be done
in the shell script mentioned above just fine.

Thanks!

/mjt