[lxc-users] LXD Live Migration

Saint Michael venefax at gmail.com
Mon Nov 9 15:37:42 UTC 2015


I must assume that LXC is not ready for production yet. Am I wrong?

On Mon, Nov 9, 2015 at 9:54 AM, Tycho Andersen <tycho.andersen at canonical.com> wrote:

> On Fri, Nov 06, 2015 at 08:43:33AM +0000, Jamie Brown wrote:
> > I’ve just discovered a new failure on a different container too:
> >
> > # lxc move host2:nexus host1:nexus
> > error: Error transferring container data: checkpoint failed:
> > (00.355457) Error (files-reg.c:422): Can't dump ghost file /usr/local/sonatype-work/nexus/tmp/jar_cache5838699621686145685.tmp of 1177738 size, increase limit
> > (00.355477) Error (cr-dump.c:1255): Dump files (pid: 22072) failed with -1
> > (00.357100) Error (cr-dump.c:1617): Dumping FAILED.
>
> So this error is actually caused by a default limit in CRIU not being
> high enough. You can raise it with CRIU's --ghost-limit option, but LXC
> currently exposes no way to set it, although I'm hoping to add a new
> API call to let people set options like this in the near future.
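To see why this particular dump failed: CRIU refuses to dump a deleted-but-still-open ("ghost") file larger than its ghost limit. The sketch below assumes the default limit is 1 MiB (1048576 bytes), which is the default CRIU documents for --ghost-limit; the helper name exceeds_ghost_limit is hypothetical and not part of LXD or CRIU.

```shell
#!/bin/sh
# Assumption: CRIU's default ghost limit is 1 MiB (1 << 20 bytes).
# Check the documentation of your CRIU version before relying on this.
DEFAULT_GHOST_LIMIT=$((1 << 20))

# exceeds_ghost_limit SIZE [LIMIT]  (hypothetical helper)
# Prints "yes" if SIZE (in bytes) is larger than LIMIT, otherwise "no".
exceeds_ghost_limit() {
    size=$1
    limit=${2:-$DEFAULT_GHOST_LIMIT}
    if [ "$size" -gt "$limit" ]; then
        echo yes
    else
        echo no
    fi
}

# Size of the ghost file reported in the failed dump above.
exceeds_ghost_limit 1177738
```

For the 1177738-byte jar cache file in the log this prints yes, which is consistent with the "increase limit" error. When running criu directly, passing a larger value to --ghost-limit would be one workaround; as noted above, LXC offered no way to set it at the time.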
>
> Thanks,
>
> Tycho
>
> >
> >
> >
> > On 06/11/2015, 08:40, "lxc-users on behalf of Jamie Brown"
> > <lxc-users-bounces at lists.linuxcontainers.org on behalf of
> > Jamie.Brown at mpec.co.uk> wrote:
> >
> > >Tycho,
> > >
> > >Thanks for your help.
> > >
> > >The kernels were in fact different versions, though I’m not sure how
> > >I got into that state! They’re now both running 3.19.0.
> > >
> > >Now I at least receive the same error when migrating in both
> > >directions:
> > ># lxc move host2:test host1:test2
> > >error: Error transferring container data: restore failed:
> > >(00.008103)      1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: No such file or directory
> > >
> > ># lxc move host1:test1 host2:test1
> > >error: Error transferring container data: restore failed:
> > >(00.008103) 1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: No such file or directory
> > >
> > >
> > >
> > >
> > >The backing store is the default (directory based). However, on host2
> > >the /var/lib/lxd/containers directory is a symlink to an ext3 mount,
> > >whereas on host1 the containers are on ext4. Is that likely to cause
> > >any issues?
> > >
> > >The strange thing is that, [randomly], the live move DOES succeed.
> > >I’ve definitely migrated a clean [running] container from host2 to
> > >host1 about 3 times, but when I try again with a new container it
> > >fails. This even worked before I updated the kernel. However, I can’t
> > >find specific steps to reproduce the successful move. I’ve never
> > >succeeded in migrating the same container back from host1 to host2
> > >without stopping it. This is what concerns me most: I would expect
> > >either consistent failure or consistent success. I keep gaining false
> > >hope because the first time I migrated a container after updating the
> > >kernel it worked, so I thought, problem solved! But then I couldn’t
> > >migrate another :(
> > >
> > >-- Jamie
> > >
> > >
> > >
> > >On 05/11/2015, 16:58, "lxc-users on behalf of Tycho Andersen"
> > ><lxc-users-bounces at lists.linuxcontainers.org on behalf of
> > >tycho.andersen at canonical.com> wrote:
> > >
> > >>Hi Jamie,
> > >>
> > >>Thanks for trying it out.
> > >>
> > >>On Thu, Nov 05, 2015 at 11:39:43AM +0000, Jamie Brown wrote:
> > >>> Hello again,
> > >>>
> > >>> Oddly, I've now re-installed the old server, configured it
> > >>> identically to before (except now using RAID) and tried migrating a
> > >>> container back, and I am getting a different failure:
> > >>>
> > >>> # lxc move host2:test host1:test
> > >>>
> > >>> error: Error transferring container data: restore failed:
> > >>> (00.007414)      1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: No such file or directory
> > >>> (00.026443) Error (cr-restore.c:1939): Restoring FAILED.
> > >>>
> > >>> The container appears in the remote container list whilst moving,
> > >>> but after the failure it is deleted from the remote host and left in
> > >>> the STOPPED state on the source host.
> > >>
> > >>Right, the restore failed, and the dump had already stopped the
> > >>container, so it ended up stopped. What we should really do is leave
> > >>it in a frozen state after the dump, and kill it only once the restore
> > >>succeeds. Hopefully that's something I can implement this cycle.
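The ordering proposed here (freeze the source after the dump and only kill it once the restore has succeeded) can be sketched as a small shell function. All four step functions are hypothetical stand-ins that only print what they would do; they are not LXD or CRIU commands, and RESTORE_STATUS is an illustrative variable for simulating a failed restore.

```shell
#!/bin/sh
# Hypothetical stand-ins for the real migration steps (not LXD/CRIU commands).
checkpoint_and_freeze() { echo "source: dump + freeze"; }
restore_on_target()     { echo "target: restore"; return "${RESTORE_STATUS:-0}"; }
kill_source()           { echo "source: kill frozen container"; }
thaw_source()           { echo "source: thaw (container keeps running)"; }

# migrate: destroy the source container only after the target has it.
migrate() {
    checkpoint_and_freeze || return 1
    if restore_on_target; then
        kill_source        # restore succeeded: the frozen copy is no longer needed
    else
        thaw_source        # restore failed: resume the original instead of losing it
        return 1
    fi
}

migrate
```

With this ordering a failed restore leaves the source container running rather than STOPPED, which is the behaviour described in the message above.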
> > >>
> > >>As for the actual error, it sounds like the target LXD didn't have
> > >>shmounts but the source one did. Are they using different backing
> > >>stores? Which versions of LXD are they running?
> > >>
> > >>>
> > >>> Here's the output from the log; I'm not sure how much of it is
> > >>> relevant to the migration attempt.
> > >>>
> > >>> # lxc info --show-log test
> > >>> ...
> > >>> lxc 1446723150.396 DEBUG    lxc_start - start.c:__lxc_start:1210 - unknown exit status for init: 9
> > >>> lxc 1446723150.396 DEBUG    lxc_start - start.c:__lxc_start:1215 - Pushing physical nics back to host namespace
> > >>> lxc 1446723150.396 DEBUG    lxc_start - start.c:__lxc_start:1218 - Tearing down virtual network devices used by container
> > >>> lxc 1446723150.396 WARN     lxc_conf - conf.c:lxc_delete_network:2939 - failed to remove interface '(null)'
> > >>> lxc 1446723150.396 INFO     lxc_error - error.c:lxc_error_set_and_log:55 - child <10499> ended on signal (9)
> > >>> lxc 1446723150.396 WARN     lxc_conf - conf.c:lxc_delete_network:2939 - failed to remove interface '(null)'
> > >>> lxc 1446723295.520 WARN     lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error
> > >>> lxc 1446723295.522 WARN     lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error
> > >>>
> > >>>
> > >>> If I try to migrate a container in the reverse direction, I get a
> > >>> similar error:
> > >>>
> > >>> # lxc move host1:test1 host2:test1
> > >>> error: Error transferring container data: restore failed:
> > >>> (00.001093) Error (cgroup.c:1204): cg:    Can't mount controller dir .criu.cgyard.aOuQtF/net_cls: No such file or directory
> > >>
> > >>This is probably because the kernel on host1 is newer than the
> > >>kernel on host2 and has net_cls cgroup support, whereas host2's
> > >>doesn't.
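One way to test this theory before migrating is to compare which cgroup controllers each host's kernel exposes. The sketch below reads /proc/cgroups (columns: subsys_name, hierarchy, num_cgroups, enabled) and reports whether net_cls is listed and enabled; run it on both hosts and compare the output. The function name has_cgroup_controller is hypothetical, and a missing controller is only one possible cause of the CRIU error above.

```shell
#!/bin/sh
# has_cgroup_controller NAME [FILE]  (hypothetical helper)
# Returns 0 if controller NAME appears in FILE (default /proc/cgroups)
# with its "enabled" column (4th field) set to 1; returns non-zero otherwise,
# including when FILE cannot be read.
has_cgroup_controller() {
    awk -v n="$1" '$1 == n && $4 == 1 { found = 1 } END { exit !found }' \
        "${2:-/proc/cgroups}" 2>/dev/null
}

# Run this on each host and compare the results.
if has_cgroup_controller net_cls; then
    echo "net_cls: available"
else
    echo "net_cls: missing or disabled"
fi
```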
> > >>
> > >>Tycho
> > >>
> > >>>
> > >>>
> > >>>
> > >>> Any ideas?
> > >>>
> > >>> -- Jamie
> > >>>
> > >>>
> > >>>
> > >>> On 05/11/2015, 08:05, "lxc-users on behalf of Jamie Brown"
> > >>> <lxc-users-bounces at lists.linuxcontainers.org on behalf of
> > >>> Jamie.Brown at mpec.co.uk> wrote:
> > >>>
> > >>> >Thanks Tycho, installing CRIU solved the problem:
> > >>> >
> > >>> ># apt-get install criu
> > >>> >
> > >>> >Shouldn't this package be included as a dependency of LXD, or
> > >>> >shouldn't LXD at least give a meaningful warning if the package isn’t
> > >>> >available? It seems odd to advertise out-of-the-box live migration in
> > >>> >LXD but then have to install another package to provide it.
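Until LXD warns about this itself, a pre-flight check along the following lines could catch a missing CRIU before lxc move is attempted. This is a sketch assuming CRIU is installed as a binary named criu on PATH; require_cmd is a hypothetical helper, not an LXD feature.

```shell
#!/bin/sh
# require_cmd NAME HINT  (hypothetical helper)
# Returns 0 if NAME is found on PATH; otherwise prints a warning containing
# HINT to stderr and returns 1.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "warning: '$1' not found on PATH; $2" >&2
    return 1
}

# Pre-flight check to run on both hosts before "lxc move".
if require_cmd criu "live migration needs CRIU (e.g. apt-get install criu)"; then
    echo "criu found: $(command -v criu)"
else
    echo "live migration is likely to fail on this host"
fi
```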
> > >>> >
> > >>> >Is this in the documentation anywhere?
> > >>> >
> > >>> >Thanks again.
> > >>> >
> > >>> >-- Jamie
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> >On 04/11/2015, 16:47, "lxc-users on behalf of Tycho Andersen"
> > >>> ><lxc-users-bounces at lists.linuxcontainers.org on behalf of
> > >>> >tycho.andersen at canonical.com> wrote:
> > >>> >
> > >>> >>On Wed, Nov 04, 2015 at 01:48:44PM +0000, Jamie Brown wrote:
> > >>> >>> Greetings all.
> > >>> >>>
> > >>> >>> I’ve been using LXD in a development environment for a few weeks
> > >>> >>> and so far I'm very impressed; I can see a really bright future
> > >>> >>> for this technology!
> > >>> >>>
> > >>> >>> However, today I thought I’d try out live migration, following
> > >>> >>> this guide:
> > >>> >>> https://insights.ubuntu.com/2015/05/06/live-migration-in-lxd/
> > >>> >>>
> > >>> >>> I believe I have followed the steps correctly; however, when I
> > >>> >>> run the move command, I receive the following output:
> > >>> >>>
> > >>> >>> # lxc move host1:test host2:test
> > >>> >>> error: Error transferring container data: checkpoint failed:
> > >>> >>> Problem accessing CRIU log: open /tmp/lxd_migration_899480871/dump.log: no such file or directory
> > >>> >>>
> > >>> >>> The file it refers to above doesn't exist. However, there are
> > >>> >>> other lxd_migration_* directories with different numbers
> > >>> >>> appended. Each time I attempt the migration a new directory is
> > >>> >>> created (e.g. lxd_migration_192965652), but there is no dump.log
> > >>> >>> in it.
> > >>> >>>
> > >>> >>> The migration also doesn't create a log file in the location
> > >>> >>> given in the guide above:
> > >>> >>> /var/log/lxd/test/migration_{dump|restore}_.log
> > >>> >>>
> > >>> >>> Steps I've taken:
> > >>> >>>
> > >>> >>> - Copied all profiles from host1 to host2
> > >>> >>> - Added the migratable profile to the container
> > >>> >>> - Removed the lxcfs package (on both hosts)
> > >>> >>> - Added the remote HTTPS hosts for both the local and remote hosts
> > >>> >>>
> > >>> >>> Both hosts are running Ubuntu 14.04.3 LTS (x64) with LXD version
> > >>> >>> 0.21.
> > >>> >>>
> > >>> >>> The only difference I can see between my hosts and the guide is
> > >>> >>> that the 'migratable' profile (which came out of the box with my
> > >>> >>> LXD installation) doesn't contain the autostart entries shown in
> > >>> >>> the guide:
> > >>> >>>
> > >>> >>> # lxc profile show migratable
> > >>> >>> name: migratable
> > >>> >>> config:
> > >>> >>>   raw.lxc: |-
> > >>> >>>     lxc.console = none
> > >>> >>>     lxc.cgroup.devices.deny = c 5:1 rwm
> > >>> >>>     lxc.seccomp =
> > >>> >>>   security.privileged: "true"
> > >>> >>> devices: {}
> > >>> >>>
> > >>> >>>
> > >>> >>> Any help would be much appreciated!
> > >>> >>
> > >>> >>Have you installed CRIU? lxc info --show-log test probably has more
> > >>> >>info about what failed, but my guess is that it can't find CRIU if
> > >>> >>you haven't installed it.
> > >>> >>
> > >>> >>Tycho
> > >>> >>
> > >>> >>> Thank you,
> > >>> >>>
> > >>> >>> Jamie
> > >>> >>>
> > >>> >>> _______________________________________________
> > >>> >>> lxc-users mailing list
> > >>> >>> lxc-users at lists.linuxcontainers.org
> > >>> >>> http://lists.linuxcontainers.org/listinfo/lxc-users
>