[Lxc-users] Can't start a 2nd container with 0.6.5
Michael H. Warfield
mhw at WittsEnd.com
Sun Jan 24 19:48:55 UTC 2010
On Sun, 2010-01-24 at 13:27 -0500, Brian K. White wrote:
> Daniel Lezcano wrote:
> > Brian K. White wrote:
> >> Brian K. White wrote:
> >>> However, now when I go to make a 2nd container, I can't start it.
> >>> I can create it, but not execute or start.
> >> Well I'm more boggled now.
> >> I stopped my first container nj12.
> >> lxc-ls shows nothing, screen -ls shows nothing, mount shows nothing
> >> extra, yet trying to start nj13 still fails, and trying to start nj12
> >> still succeeds.
> >> I can't find anything functionally different between nj12 and nj13...
> >> What could I be missing???
> > Yep, there is a problem with the pivot root and the umount of the
> > different mount points in the old rootfs.
> > This problem appears with some configurations (I haven't figured out
> > which ones yet). I did a hot fix, which is more a workaround than a
> > real fix (I didn't understand where the real problem is coming from).
> > As soon as I find the culprit, I will release a 0.6.6 version to fix
> > that as the 0.6.5 is bogus.
> > In the meantime, if you wish to test, I attached the patch to this email.
> > Thanks for reporting the problem.
> > -- Daniel
> Thanks. Seems to be working. Both containers start up, run, and shut
> down OK. I'm still working on my install/setup procedures so I don't
> have a 3rd or more containers to test with yet.
> Downgrading to 0.6.3 worked also. I didn't try 0.6.4, no particular
> reason, just 0.6.3 is what comes stock with my OS.
Oh, 0.6.4 will "work", for some value of work, just like 0.6.3 does.
Neither of them contained the problematical pivot root operation at all.
But... They have other problems for which the introduction of the pivot
root was required. They will have bogus mount points in
their /proc/mounts table and there is a security problem where a guest
could break out of the chrooted jail. Both of these are fixed in 0.6.5
but it's that fix that's causing this problem.
It's very interesting that you ran into this on a second container. I
ran into it on a system with a large number of containers (>30) but only
if I had tmpfs mounted on /dev/shm in the host. In the process of
debugging it, I moved all but 4 of those containers to other hosts and
then strangely discovered I could no longer reproduce the problem with
or without /dev/shm mounted. I have one other way in which I can
reproduce the problem and that's where I have the rootfs mounted through
a bind mount from another directory AND the rootfs is also listed in the
container's fstab for the same ordered pair. That was an accidental
mistake I made in migrating a container (I've been testing both methods
as a compatibility / migration mode from OpenVZ to LXC). Either method
alone works (and I'm settling on using the latter instead of the
former, since the former clutters the host's mount table horribly) but
together they reproduce this exact same problem. I'm going to retest
with 0.6.5 both with and without the patch on that improper
configuration and see what happens. 0.6.5 actually contained fixes for
that problem, along with a very good explanation of WHY it happens that
accounts for both of my failure conditions, but it didn't seem to fix
all the cases.
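For anyone trying to reproduce this, the misconfiguration I described
looks roughly like the following. The container name, paths, and exact
0.6.x config keys here are illustrative from memory, not copied from a
working setup:

```
# /var/lib/lxc/nj13/config (sketch)
lxc.rootfs = /var/lib/lxc/nj13/rootfs
lxc.mount  = /var/lib/lxc/nj13/fstab

# Method 1 -- bind mount done on the host before lxc-start:
#   mount --bind /srv/containers/nj13 /var/lib/lxc/nj13/rootfs

# Method 2 -- the same ordered pair in the container's fstab:
#   /srv/containers/nj13  /var/lib/lxc/nj13/rootfs  none  bind  0 0
#
# Either method alone is fine; having BOTH in place for the same
# source/target pair is what reproduces the pivot root failure.
```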
> I'm running 0.6.5 for now as long as it works.
> Thanks much!
Michael H. Warfield (AI4NB) | (770) 985-6132 | mhw at WittsEnd.com
/\/\|=mhw=|\/\/ | (678) 463-0932 | http://www.wittsend.com/mhw/
NIC whois: MHW9 | An optimist believes we live in the best of all
PGP Key: 0x674627FF | possible worlds. A pessimist is sure of it!