[Lxc-users] Sharing container rootfs

Mon Jun 10 04:08:44 UTC 2013

On Fri, 2013-06-07 at 08:45 +0000, Purcareata Bogdan-B43198 wrote: 
> Hello,

> I have a question regarding containers and their supporting rootfs. Is
> there an option for lxc-create that will use a default path (or other
> backing store) as rootfs?

It can be done, but it must be done very carefully.  No, there is no
straightforward "out of the box" option to do this, and you have to know
what you're doing and what you're getting yourself into.

> I understand that by specifying -B ... it will try to _create_ the
> rootfs in an alternate place. In my scenario, I plan to configure the
> rootfs externally - there will be a single host rootfs image which
> will be bind-mounted --make-slave for several containers. I just want
> to know if there is an option for lxc-create that will skip creating
> the rootfs and just use one as is. I didn't see any such option for
> lxc-clone either.

If I were to do this, I would create a "template" container containing
the rootfs and then copy the config file from the /var/lib/lxc directory
into another directory and make copies of it for my individual
containers (leaving the rootfs the same).  Then use lxc-create with the
config file option pointing at the new config files.  That is almost the
reverse of what you describe next.

Literally...

lxc-create -n container_name -f config_file

Where each unique config file contains the same lxc.rootfs value while
other values, such as networking and additional mounts, vary.
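
A minimal sketch of the copy-and-modify approach described above,
assuming the template container's config already contains lxc.utsname
and lxc.network.hwaddr lines.  The helper name make_config, the
container name web1, and the paths are hypothetical; the key names
follow the LXC config format current at the time of this message and
may differ in newer releases.

```shell
#!/bin/sh
# Sketch only: derive a per-container config from a template config.
# make_config is a hypothetical helper, not part of LXC.

# make_config TEMPLATE_CONFIG NAME HWADDR
# Prints the template config with the hostname and MAC address replaced.
# Every other line, including lxc.rootfs, is passed through unchanged,
# so all generated configs point at the same shared rootfs.
# Assumption: the template already has lxc.utsname and
# lxc.network.hwaddr lines; if not, nothing is added for them.
make_config() {
    template=$1
    name=$2
    hwaddr=$3
    sed -e "s|^lxc\.utsname[[:space:]]*=.*|lxc.utsname = $name|" \
        -e "s|^lxc\.network\.hwaddr[[:space:]]*=.*|lxc.network.hwaddr = $hwaddr|" \
        "$template"
}
```

The output could be saved and fed to lxc-create, for example
"make_config /var/lib/lxc/template/config web1 00:16:3e:00:00:01 > web1.conf"
followed by "lxc-create -n web1 -f web1.conf".  Newer LXC releases
reportedly require a template argument, where "-t none" skips rootfs
creation; check the man page for your version.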

> Assuming there isn't such an option, my setup has the following steps:
> - create a container with default options
> - delete the ${rootfs}/
> - update the config file to point to the new rootfs path - which I
> configured with bind-mounts

It's unnecessary to create a whole new container.  You can just copy and
modify the config to create a new config to feed to lxc-create.

> Does this look ok for my scenario? Is this the right setup?

Suboptimal in my mind, but yes.  But here there be dragons (with both
methods).  If you do this, one container modifying the root fs (like
changing the root password) will change it for every other container.
Unless you have a highly specialized application for a general os
container or you are using specialized application containers, I would
advise strongly against it.  The side effects will be a nightmare to
debug and you won't get much help straying that far afield.

I used to do something similar a lot under the old linux-vserver project
(now defunct for several years - mailing list is now dead).  They used a
COW (Copy On Write) system to maintain a common READ ONLY root system
and per-vserver modified layers of changes each server made while
running.  It was quite a nice feature.

In theory, this is the idea of using a rootfs image with a unionfs rw
layer on top of that for the running container.  That way, you only have
one copy of a binary on disk and only one copy of the shared executable
code in memory, yet the containers all have unique modifiable root file
systems.  So it works in principle.  Implementation can be another
matter.
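
A rough sketch of the read-only image plus per-container rw layer idea,
using the kernel's overlay filesystem as a stand-in for unionfs.  This
is an assumption on several counts: overlay only reached the mainline
kernel in Linux 3.18, after this message was written, and the helper
name and directory layout (upper, work, rootfs under a per-container
directory) are hypothetical.  The function only prints the mount
command so it can be inspected before anything is run as root.

```shell
#!/bin/sh
# Sketch only: build (but do not run) an overlay mount command that
# stacks a per-container writable layer on top of a shared rootfs.
# overlay_mount_cmd is a hypothetical helper, not part of LXC.

# overlay_mount_cmd SHARED_ROOTFS CONTAINER_DIR
#   SHARED_ROOTFS  read-only lower layer shared by all containers
#   CONTAINER_DIR  per-container directory holding:
#                    upper/   changes made by this container
#                    work/    scratch directory required by overlay
#                    rootfs/  merged view to use as lxc.rootfs
overlay_mount_cmd() {
    lower=$1
    base=$2
    printf 'mount -t overlay overlay -o lowerdir=%s,upperdir=%s/upper,workdir=%s/work %s/rootfs\n' \
        "$lower" "$base" "$base" "$base"
}
```

For example, "overlay_mount_cmd /srv/rootfs /var/lib/lxc/web1" prints a
command that, once the three directories exist and it is run as root,
would give web1 its own modifiable view while /srv/rootfs itself stays
untouched.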

I think I recall having done this with OpenVZ (after linux-vserver's
failure to maintain IPv6 support forced me over to OpenVZ), but that
would also have been a long time ago.  More recently (but still more
than a year ago) I tried the same technique using unionfs with LXC,
which failed horribly.  Functionally, it should appear similar to a bind
mount, but bind mounts are currently problematic with some of the hacks
we've had to implement to work around systemd conventions.  I haven't
tried it in well over a year.  I suppose I should try that again.  Maybe
it would work now...

This is actually what is done in many live distros, CDs, and bootable
USB keys with root file system overlays.  It works very nicely but is
not currently (last I looked) well integrated into the kernel.  IIRC,
unionfs should be making its appearance in the mainline upstream kernel
as union mounts but I don't know what the current state of affairs is
there.

Another method is a variation on the old Unix paradigm used for some
"headless tube" scenarios where you have a RO / and /usr and provide
individualized rw /var and /home.  The /etc directory was specialized
and could be maintained RO through the use of netgroups and things like
yp (yellow pages) / NIS / NIS+ for some per system configuration data.
That seems to have fallen out of favor ages ago, though.  That would
take a lot of custom mounts and custom configurations for whatever it is
you want to do (which you weren't real clear on).
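
The per-container rw /var and /home from that scheme could be expressed
as lxc.mount.entry lines in each container's config.  A minimal sketch,
assuming each container has its own state directory on the host and
that the LXC version in use accepts mount targets relative to the
container rootfs (older versions may need the full path to the rootfs
instead).  The helper name and the /srv/state layout are hypothetical,
and making the shared / and /usr read-only is not covered here.

```shell
#!/bin/sh
# Sketch only: emit config lines that bind-mount per-container writable
# /var and /home over a shared root.
# rw_mount_entries is a hypothetical helper, not part of LXC.

# rw_mount_entries STATE_DIR
#   STATE_DIR  host directory holding this container's own var/ and home/
# Prints one lxc.mount.entry line (fstab format) per writable directory.
rw_mount_entries() {
    state=$1
    for dir in var home; do
        printf 'lxc.mount.entry = %s/%s %s none bind,rw 0 0\n' \
            "$state" "$dir" "$dir"
    done
}
```

The output of "rw_mount_entries /srv/state/web1" could be appended to
web1's config file, with a different state directory for each container.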

It can be done but you'll be out on the cutting edge.  To blindly share
a rootfs is going to be difficult to manage and debug unless you know
exactly what you are doing with a very specialized application and
environment.  I wouldn't personally attempt it without unionfs, union
mounts, or another COW method to protect one container from random acts
of terrorism in another.  I would DEFINITELY not deploy it in a
production environment until it had proven itself in a realistic test
environment first.

> Thank you,
> Bogdan P.

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!