[lxc-users] lxc progress and a few questions

Tycho Andersen tycho.andersen at canonical.com
Mon Apr 11 15:25:31 UTC 2016


hi,

On Fri, Apr 08, 2016 at 04:40:43PM -0700, jjs - mainphrame wrote:
> Ah, never mind - it doesn't appear to be solely a criu issue - even
> migration of stopped containers hangs forever now.

Yeah, I think there is some behavior either with rsync or our
websocket libraries that's causing this hang. I've looked into it but
haven't had any luck. Try the zfs backend; that should work better.
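
Until the hang is tracked down, it may help to put a time limit on the
client call so a stuck migration doesn't block a shell for an hour. The
following is only a rough sketch, not anything LXD ships: the helper
name run_with_timeout and the 600 second limit are made up for
illustration. It only stops the client from waiting; it does not deal
with whatever state the daemon is left in.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and wait at most timeout_s seconds.

    Returns (finished, returncode). finished is False if the time limit
    was hit, in which case the child process is killed and returncode
    is None.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        print("gave up after %s seconds: %s" % (timeout_s, " ".join(cmd)),
              file=sys.stderr)
        return False, None
    return True, proc.returncode


# Hypothetical usage with the command from this thread (not run here):
#   run_with_timeout(["lxc", "move", "third", "lxd2:"], 600)
```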

Also, there is a bug in xenial kernels -16 and later that prevents CRIU
from working (you'll get an EBUSY trying to mount sysfs), so if you have
-15 or below, that should work.
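
To check which side of that line a host is on, something like the
sketch below could work. It assumes the usual Ubuntu release string
layout (for example "4.4.0-18-generic", as printed by uname -r), and
the -16 threshold is taken only from what is said in this thread; a
later kernel may well carry a fix, so treat the result as a hint. The
function names are invented for this example.

```python
import re


def ubuntu_abi(release):
    """Return the ABI number from a string like '4.4.0-18-generic', or None."""
    m = re.match(r"^\d+\.\d+\.\d+-(\d+)", release)
    return int(m.group(1)) if m else None


def criu_likely_affected(release):
    """True if the release is a 4.4.0 kernel at -16 or later, False if it is
    -15 or below, None if the string is not a 4.4.0-N release at all."""
    abi = ubuntu_abi(release)
    if abi is None or not release.startswith("4.4.0-"):
        return None
    return abi >= 16


# On a live host the string would come from platform.release(), e.g.:
#   import platform
#   criu_likely_affected(platform.release())
```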

Sorry for the delay, been dealing with a family emergency.

Tycho

> Jake
> 
> On Fri, Apr 8, 2016 at 4:23 PM, jjs - mainphrame <jjs at mainphrame.com> wrote:
> > Ubuntu 16.04, up to date -
> >
> > After today's updates, including a kernel upgrade to 4.4.0-18, I tried
> > live migration again:
> >
> > root@raskolnikov:~# lxc move third lxd2:
> >
> > One hour later:
> >
> > root@raskolnikov:~# lxc move third lxd2:
> >
> > Still stuck, and the migration file in /var/log/lxd/third has not been created.
> >
> > Tycho said on Mar 30 that the situation should be sorted soon, but
> > mentioned the git repo:
> > https://github.com/tych0/criu/tree/cgroup-root-mount
> >
> > Should live migration work with criu from git?
> >
> > Feel free to advise me on what information I can supply, not only for
> > the ct migration issues, but also for the new dhcp issue
> >
> > Regards,
> >
> > Jake
> >
> >
> > On Thu, Apr 7, 2016 at 11:01 PM, jjs - mainphrame <jjs at mainphrame.com> wrote:
> >> (Bump) -
> >>
> >> Any thoughts on what to try for the CT migration and dhcp issues?
> >> Running up to date ubuntu 16.04 beta -
> >>
> >> Regards,
> >>
> >> Jake
> >>
> >> On Wed, Apr 6, 2016 at 3:18 PM, jjs - mainphrame <jjs at mainphrame.com> wrote:
> >>> Greetings -
> >>>
> >>> I've not yet been able to reproduce that one shining moment from Mar
> >>> 29 when live migration of privileged containers was working, under
> >>> kernel 4.4.0-15
> >>>
> >>> To recap: live container migration broke with 4.4.0-16, and is still
> >>> broken in 4.4.0-17 - but now, instead of producing an error message,
> >>> an attempt to live migrate a container merely hangs forever. Is that
> >>> expected, or should I be seeing something more? BTW - the migration
> >>> dump log for that container hasn't been touched for a week. I'll be
> >>> glad to supply more info if this is not a known issue.
> >>>
> >>> Recent updates seem to have created a new problem: the CTs which
> >>> configure their own network settings work (aside from migration) but
> >>> none of the CTs which depend on dhcp are getting IPs. BTW I'm using a
> >>> bridge connected to my local network and dhcp, not the default lxc
> >>> dhcp server. I see the packets on the host bridge, but they don't
> >>> reach the dhcp server. I'd be curious to know if there have been any
> >>> dhcp issues since recent updates. If not, I'll need to troubleshoot
> >>> other causes, but it's odd that dhcp simply stops working for all CTs
> >>> on both lxd hosts after updates.
> >>>
> >>> Jake
> >>>
> >>>
> >>> On Wed, Mar 30, 2016 at 6:27 AM, Tycho Andersen
> >>> <tycho.andersen at canonical.com> wrote:
> >>>> On Tue, Mar 29, 2016 at 11:17:26PM -0700, jjs - mainphrame wrote:
> >>>>> Well, I've found some interesting things here today. I created a couple of
> >>>>> privileged xenial containers, and sure enough, I was able to live migrate
> >>>>> them back and forth between the 2 lxd hosts.
> >>>>>
> >>>>> So far, so good.
> >>>>>
> >>>>> Then I did an apt upgrade - among the changes was a kernel change from
> >>>>> 4.4.0-15 to 4.4.0-16 - and live migration stopped working.
> >>>>>
> >>>>> Here are the failure messages that resulted from attempting the very same
> >>>>> live migrations that worked before the upgrade and reboot into 4.4.0-16:
> >>>>>
> >>>>> root@raskolnikov:~# lxc move akira lxd2:
> >>>>> error: Error transferring container data: checkpoint failed:
> >>>>> (00.092234) Error (mount.c:740): mnt: 83:./sys/fs/cgroup/devices doesn't
> >>>>> have a proper root mount
> >>>>> (00.098187) Error (cr-dump.c:1600): Dumping FAILED.
> >>>>>
> >>>>>
> >>>>> root@ronnie:~# lxc move third lxd:
> >>>>> error: Error transferring container data: checkpoint failed:
> >>>>> (00.076107) Error (mount.c:740): mnt: 326:./sys/fs/cgroup/perf_event
> >>>>> doesn't have a proper root mount
> >>>>> (00.080388) Error (cr-dump.c:1600): Dumping FAILED.
> >>>>
> >>>> Yep, this is a known issue with -16. We need both a kernel patch and a
> >>>> patch to CRIU before it will start working again. I have a branch at:
> >>>>
> >>>> https://github.com/tych0/criu/tree/cgroup-root-mount
> >>>>
> >>>> which should work if you want to keep playing with it, but hopefully
> >>>> we'll have the situation sorted out in the next few days.
> >>>>
> >>>> Tycho
> >>>>
> >>>>> Jake
> >>>>>
> >>>>> PS - Thanks for the html mail heads-up - I've been using google mail
> >>>>> services for this domain. I'll have to look into the config options, and
> >>>>> see if I can do the needful.
> >>>>
> >>>>>
> >>>>> On Tue, Mar 29, 2016 at 12:45 PM, Andrey Repin <anrdaemon at yandex.ru> wrote:
> >>>>>
> >>>>> > Greetings, jjs - mainphrame!
> >>>>> >
> >>>>> > >> On Mon, Mar 28, 2016 at 08:47:24PM -0700, jjs - mainphrame wrote:
> >>>>> > >>> I've looked at ct migration between 2 ubuntu 16.04 hosts today, and had
> >>>>> > >>> some interesting problems; I find that migration of stopped containers
> >>>>> > >>> works fairly reliably; but live migration, well, it transfers a lot of
> >>>>> > >>> data, then exits with a failure message. I can then move the same
> >>>>> > >>> container, stopped, with no problem.
> >>>>> > >>>
> >>>>> > >>> The error is the same every time, a failure of "mkdtemp" -
> >>>>> > >>
> >>>>> > >> It looks like your host /tmp isn't writable by the uid map that the
> >>>>> > >> container is being restored as?
> >>>>> >
> >>>>> >
> >>>>> > > Which is odd, since /tmp has 1777 perms on both hosts, so I don't see how
> >>>>> > > it could be a permissions problem. Surely the default apparmor profile is
> >>>>> > > not the cause? You did give me a new idea though, and I'll set up a test
> >>>>> > > with privileged containers for comparison. Is there a switch to enable
> >>>>> > > verbose logging?
> >>>>> >
> >>>>> > I've run into the same issue once. I struggled with it for nearly a month,
> >>>>> > falsely blaming LXC.
> >>>>> > Recreating the container's rootfs from scratch resolved the issue.
> >>>>> > I don't know what caused it to begin with; it must have been some kind of
> >>>>> > glitch.
> >>>>> >
> >>>>> > P.S.
> >>>>> > It would be great if you could configure your mail client to not use HTML
> >>>>> > format for lists.
> >>>>> >
> >>>>> >
> >>>>> > --
> >>>>> > With best regards,
> >>>>> > Andrey Repin
> >>>>> > Tuesday, March 29, 2016 22:43:04
> >>>>> >
> >>>>> > Sorry for my terrible english...
> >>>>> > _______________________________________________
> >>>>> > lxc-users mailing list
> >>>>> > lxc-users at lists.linuxcontainers.org
> >>>>> > http://lists.linuxcontainers.org/listinfo/lxc-users
> >>>>> >
> >>>>

