[lxc-devel] read-only container root

Mon Feb 15 21:15:56 UTC 2010

Michael Tokarev wrote:
> I'm experimenting with a read-only root fs in the container.
> So far it does not work.
>
> First of all, when trying to start a container in a read-only root
> lxc-start complains:
>   lxc-start: Read-only file system - can't make temporary mountpoint
>
> This is in conf.c:setup_rootfs_pivot_root() function.  That function
> uses optional parameter "lxc.pivotdir", or creates (and later removes)
> a temporary directory for pivot_root.  Obviously there's no way to
> create a directory in a read-only filesystem.
>   
Why do you need a read-only root fs?

> But lxc.pivotdir does not work either. In the function mentioned above
> it is used with leading dot (eg. if I specify "lxc.pivotdir=pivot" in
> the config file the pivot_root() syscall will be made to ".pivot" with
> leading dot, not to "pivot"), but later on it is used without that dot,
> and fails:
>
>   lxc-start: No such file or directory - failed to open /pivot/proc/mounts
>   lxc-start: No such file or directory - failed to read or parse mount list '/pivot/proc/mounts'
>   lxc-start: failed to pivot_root to '/stage/t'
>
> (that's with "lxc.pivotdir = pivot" in the config file).  After symlinking
> pivot to .pivot it still fails:
>
>   lxc-start: Device or resource busy - could not unmount old rootfs
>   lxc-start: failed to pivot_root to '/stage/t'
>   
It's a bug introduced with the pivot_root feature. Investigation is under way.

> Ok, so far so "good".
>
> Next thing is the /dev directory.  I prefer to have it in a tmpfs, because
> of several reasons (one is that the root is mounted with -o nodev), but that
> fails too unless the directory is pre-populated:
>
>   lxc-start: No such file or directory - failed to mount a new instance of '/dev/pts'
>   lxc-start: failed to setup the new pts instance
>
> That's when specifying:
>
>    lxc.mount.entry = /dev dev tmpfs noexec,nosuid,mode=0755
>
> in the config file.  That creates an empty directory for container's /dev,
> which is populated later in the startup script.
>
> Similar thing happens when I pre-create dev/pts - it fails to bind-mount
> tty1..tty4.
>   
OK, so what you need is to call a script between:

lxc.mount.entry = /dev dev tmpfs noexec,nosuid,mode=0755

...
lxc.tty = 4

where the script will populate /dev, right?

Hmm, not obvious.
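
To make the "populate /dev" step concrete, here is a rough Python sketch of
what such a script might do. It is only an illustration, not part of lxc:
the function name populate_dev, the choice of nodes and the permission modes
are assumptions. The major/minor numbers are the conventional Linux ones.
It creates dev/pts and the tty1..tty4 nodes because, as reported above, the
pts mount and the tty bind-mounts fail when their targets are missing.

```python
import os
import stat

# Conventional Linux character devices: name -> (major, minor, mode).
# Which nodes to create is an assumption; a real container may need others.
STANDARD_NODES = {
    "null":    (1, 3, 0o666),
    "zero":    (1, 5, 0o666),
    "full":    (1, 7, 0o666),
    "random":  (1, 8, 0o666),
    "urandom": (1, 9, 0o666),
    "tty":     (5, 0, 0o666),
    "console": (5, 1, 0o600),
    "ptmx":    (5, 2, 0o666),
}


def populate_dev(dev_dir, nodes=STANDARD_NODES, tty_count=4):
    """Create the mount point and device nodes under dev_dir.

    Creating device nodes needs root (CAP_MKNOD), and the nodes are only
    usable if dev_dir is not on a filesystem mounted with nodev.
    """
    # dev/pts must exist before a devpts instance can be mounted on it.
    os.makedirs(os.path.join(dev_dir, "pts"), mode=0o755, exist_ok=True)
    old_umask = os.umask(0)  # apply the modes below exactly as written
    try:
        for name, (major, minor, mode) in nodes.items():
            path = os.path.join(dev_dir, name)
            if not os.path.exists(path):
                os.mknod(path, mode | stat.S_IFCHR, os.makedev(major, minor))
        # tty1..ttyN serve as bind-mount targets, so they must exist too.
        for n in range(1, tty_count + 1):
            path = os.path.join(dev_dir, "tty%d" % n)
            if not os.path.exists(path):
                os.mknod(path, 0o620 | stat.S_IFCHR, os.makedev(4, n))
    finally:
        os.umask(old_umask)
```

Called as populate_dev("/path/to/rootfs/dev") after the tmpfs is mounted and
before the pts and tty setup, this would leave the targets in place.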

> So far it works by using a wrapper around lxc-start which mounts tmpfs
> over dev, fills it with a bunch of standard entries, and executes lxc-start.
>
> But this is really getting quite ugly.  And the only solution to all this
> mess is to let the setup be performed from a shell script/command which is
> called after "forking" the (filesystem) namespace but before entering the
> container "for real", or _instead_ of entering the container.  As was
> discussed previously.
>   

What about the lxc.script configuration line, which calls a script at the
point where it appears in the configuration file?

> The whole mess started when I realized that bind-mounting host's /dev
> works perfectly _except_ the syslogging, -- /dev/log does not work with
> multiple containers, only the container where syslogd (re)started last
> works, all the rest gives "ECONNREFUSED" when trying to send any message
> to /dev/log.
>   
/dev/log is an AF_UNIX socket; the network is isolated, and AF_UNIX sockets
belong to the network namespace.
Most likely syslogd unlinks /dev/log, creates it again and binds to it.
So, as /dev is shared between the containers, the last one gets the socket.
Any process outside of the container trying to access this socket won't
be able to.
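
The unlink-and-rebind part can be reproduced without any containers. The
Python sketch below is only an illustration: the function name rebind_demo
and the temporary path standing in for /dev/log are assumptions. It shows
that once a second datagram socket is bound to the same path, messages sent
to that path reach only the second socket. It does not reproduce the
network-namespace check that produces ECONNREFUSED.

```python
import os
import socket
import tempfile


def rebind_demo():
    """Bind two datagram sockets to one path in turn; report who gets data."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "log")  # stand-in for /dev/log

    first = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    first.bind(path)

    # A second daemon starting up removes the old name and binds afresh.
    os.unlink(path)
    second = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    second.bind(path)

    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    client.sendto(b"hello", path)

    first.setblocking(False)
    second.setblocking(False)
    try:
        first.recv(64)
        first_got = True
    except BlockingIOError:
        first_got = False  # nothing was queued for the first socket
    second_got = second.recv(64) == b"hello"

    for s in (first, second, client):
        s.close()
    os.unlink(path)
    os.rmdir(workdir)
    return first_got, second_got
```

The first socket stays open but is no longer reachable by name, which
matches the "last syslogd to start wins" behaviour described above.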