[lxc-users] LXD Live Migration

Jamie Brown Jamie.Brown at mpec.co.uk
Fri Nov 6 08:43:33 UTC 2015


I’ve just discovered a new failure on a different container, too:

# lxc move host2:nexus host1:nexus
error: Error transferring container data: checkpoint failed:
(00.355457) Error (files-reg.c:422): Can't dump ghost file /usr/local/sonatype-work/nexus/tmp/jar_cache5838699621686145685.tmp of 1177738 size, increase limit
(00.355477) Error (cr-dump.c:1255): Dump files (pid: 22072) failed with -1
(00.357100) Error (cr-dump.c:1617): Dumping FAILED.
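CRIU calls a file that a process still holds open after it has been deleted a "ghost" file, and refuses to carry one inside the image if it exceeds its ghost limit. The 1177738-byte jar cache file above is larger than CRIU's default limit of 1 MiB (`criu dump --ghost-limit` raises it; whether LXD lets you pass that option is not covered in this thread). The Python sketch below lists such files for a process before a move is attempted. It assumes a Linux host with /proc and root access to the container's processes, and the function names are hypothetical:

```python
import os
import sys

# Assumption: CRIU's default ghost-file limit is 1 MiB; `criu dump --ghost-limit`
# raises it, but whether LXD passes that option through is not covered here.
DEFAULT_GHOST_LIMIT = 1 << 20


def deleted_open_files(pid):
    """Return (fd_path, target, size) for each unlinked file the process still holds open."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for fd in os.listdir(fd_dir):
        fd_path = os.path.join(fd_dir, fd)
        try:
            target = os.readlink(fd_path)
            if not target.endswith(" (deleted)"):
                continue
            # stat() follows the /proc magic link to the still-open inode.
            size = os.stat(fd_path).st_size
        except OSError:
            continue  # fd closed in the meantime, or no permission to inspect it
        found.append((fd_path, target, size))
    return found


def oversized_ghost_files(pid, limit=DEFAULT_GHOST_LIMIT):
    """Ghost files larger than `limit`, i.e. those CRIU would refuse at that limit."""
    return [entry for entry in deleted_open_files(pid) if entry[2] > limit]


if __name__ == "__main__":
    # Usage: python3 ghost_check.py PID [PID ...]  (host-side PIDs, run as root)
    for pid in sys.argv[1:]:
        for fd_path, target, size in oversized_ghost_files(pid):
            print(f"{pid}: {target} ({size} bytes) via {fd_path}")
```

Running it against the container's processes (for example, the PID reported in the "Dump files (pid: ...)" line) should flag files like the jar cache above before a migration is attempted.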




On 06/11/2015, 08:40, "lxc-users on behalf of Jamie Brown" <lxc-users-bounces at lists.linuxcontainers.org on behalf of Jamie.Brown at mpec.co.uk> wrote:

>Tycho,
>
>Thanks for your help.
>
>The kernels were in fact different versions, though I’m not sure how I got into that state! So they’re now both running 3.19.0.
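Because a kernel mismatch turned out to matter here, comparing each host's `uname -r` output before a move is a cheap first check. The Python sketch below does that comparison; the helper names are hypothetical, and it assumes that anything short of an identical release string is worth investigating:

```python
import re


def kernel_version(release):
    """Extract (major, minor, patch) from a `uname -r` string such as '3.19.0-31-generic'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release.strip())
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def compare_kernels(source_release, target_release):
    """Describe how the source and target hosts' kernel releases differ."""
    if source_release.strip() == target_release.strip():
        return "identical"
    src, dst = kernel_version(source_release), kernel_version(target_release)
    if src == dst:
        return "same base version, different build suffix"
    return "source newer" if src > dst else "target newer"
```

On each host, `platform.release()` returns the same string as `uname -r`, so the two values can be collected and passed to `compare_kernels`.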
>
>Now I at least receive the same error when migrating in either direction:
># lxc move host2:test host1:test2
>error: Error transferring container data: restore failed:
>(00.008103)      1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: No such file or directory
>
># lxc move host1:test1 host2:test1
>error: Error transferring container data: restore failed:
>(00.008103) 1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: No such file or directory
>
>
>
>
>The backing store is the default (directory based). However, on host2 the /var/lib/lxd/containers directory is a symlink to an ext3 mount, whereas on host1 it’s on ext4. Is that likely to cause any issues?
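To confirm which filesystem actually backs the container directory once symlinks are followed, you can consult the host's /proc/mounts. The Python sketch below does this. It assumes a Linux host; `mount_for` and `describe` are hypothetical helper names, and escaped characters in /proc/mounts (such as `\040` for a space) are not decoded:

```python
import os


def mount_for(path, mounts_text):
    """Return (mount_point, fs_type) of the /proc/mounts entry that contains `path`."""
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # A mount contains `path` if it is the same directory or one of its parents.
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        # Prefer the longest match; on ties, later entries win (they mount over earlier ones).
        if inside and len(mount_point) >= len(best[0]):
            best = (mount_point, fs_type)
    return best


def describe(path):
    """Report whether `path` is a symlink and which mount/filesystem really backs it."""
    resolved = os.path.realpath(path)  # follows symlinks such as host2's containers link
    with open("/proc/mounts") as f:
        mount_point, fs_type = mount_for(resolved, f.read())
    return {
        "path": path,
        "is_symlink": os.path.islink(path),
        "resolved": resolved,
        "mount_point": mount_point,
        "fs_type": fs_type,
    }
```

Calling `describe("/var/lib/lxd/containers")` on both hosts puts the symlink and filesystem difference side by side.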
>
>The strange thing is that, [randomly], the live move DOES succeed. I’ve definitely migrated a clean [running] container about 3 times from host2 to host1, but when I then try again with a new container it fails. This even worked before I updated the kernel. However, I can’t find specific steps that reliably reproduce the successful move. I’ve never succeeded in migrating the same container back from host1 to host2 without stopping it. This is what concerns me most; I would expect either consistent failure or consistent success. I keep gaining false hope: the first time I migrated a container after updating the kernel it worked, so I thought, problem solved! But then I couldn’t migrate another :(
>
>-- Jamie
>
>
>
>On 05/11/2015, 16:58, "lxc-users on behalf of Tycho Andersen" <lxc-users-bounces at lists.linuxcontainers.org on behalf of tycho.andersen at canonical.com> wrote:
>
>>Hi Jamie,
>>
>>Thanks for trying it out.
>>
>>On Thu, Nov 05, 2015 at 11:39:43AM +0000, Jamie Brown wrote:
>>> Hello again,
>>> 
>>> Oddly, I've now re-installed the old server and configured it identically to before (except that it now uses RAID), and when I try migrating a container back I get a different failure:
>>> 
>>> # lxc move host2:test host1:test
>>> 
>>> error: Error transferring container data: restore failed:
>>> (00.007414)      1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: No such file or directory
>>> (00.026443) Error (cr-restore.c:1939): Restoring FAILED.
>>> 
>>> The container appears in the remote container list whilst moving, but then after failure it is deleted and it is in the STOPPED state on the source host.
>>
>>Right, the restore failed. The container had already been stopped
>>by the dump, so it was stopped on the target. What we should really
>>do is leave it in a frozen state after the dump, and once the restore
>>succeeds then we can kill it. Hopefully that's something I can
>>implement this cycle.
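Tycho proposes freezing the source, dumping it, and only killing it once the restore on the target succeeds, so a failed restore leaves the original container intact. The Python sketch below shows that control flow. The five callables are hypothetical stand-ins, not LXD or CRIU APIs:

```python
def migrate(dump, restore, freeze, thaw, kill):
    """Hypothetical ordering: keep the source frozen until the target restore succeeds."""
    freeze()
    try:
        dump()     # checkpoint while the source stays frozen rather than stopped
        restore()  # start the container on the target from the checkpoint
    except Exception:
        thaw()     # dump or restore failed: resume the original container
        raise
    kill()         # only now is the source copy discarded
```

With this ordering, the failure Jamie saw (container deleted on the target and left STOPPED on the source) would instead leave the source running again.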
>>
>>As for the actual error, sounds like the target LXD didn't have
>>shmounts but the source one did. Are they using different backing
>>stores? What version of LXD are they?
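The CRIU errors quoted in this thread share a common shape: a timestamp, an optional PID, then `Error (file.c:line): message`. The Python sketch below extracts those fields and attaches a hint based only on the diagnoses offered in this thread. The regex and the `HINTS` mapping are assumptions for illustration, not an exhaustive or official classification:

```python
import re

# Matches CRIU log errors such as
# "(00.007414)      1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: ..."
CRIU_ERROR = re.compile(
    r"\((?P<time>\d+\.\d+)\)\s+(?:(?P<pid>\d+):\s+)?"
    r"Error \((?P<file>[^:()]+):(?P<line>\d+)\):\s*(?P<msg>.*)"
)

# Hints drawn only from the diagnoses in this thread; not an exhaustive mapping.
HINTS = {
    "files-reg.c": "deleted-but-open file exceeds CRIU's ghost-file limit",
    "mount.c": "mount point present on the source is missing on the target (e.g. shmounts)",
    "cgroup.c": "cgroup controller enabled on the source is missing on the target kernel",
}


def parse_criu_errors(log_text):
    """Return a dict per CRIU error line, with a hint where this thread offers one."""
    errors = []
    for line in log_text.splitlines():
        m = CRIU_ERROR.search(line)
        if m:
            entry = m.groupdict()
            entry["hint"] = HINTS.get(entry["file"], "")
            errors.append(entry)
    return errors
```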
>>
>>> 
>>> Here's the output from the log, not sure how much is relevant to the migration attempt.
>>> 
>>> # lxc info --show-log test
>>> ...
>>> lxc 1446723150.396 DEBUG    lxc_start - start.c:__lxc_start:1210 - unknown exit status for init: 9
>>> lxc 1446723150.396 DEBUG    lxc_start - start.c:__lxc_start:1215 - Pushing physical nics back to host namespace
>>> lxc 1446723150.396 DEBUG    lxc_start - start.c:__lxc_start:1218 - Tearing down virtual network devices used by container
>>> lxc 1446723150.396 WARN     lxc_conf - conf.c:lxc_delete_network:2939 - failed to remove interface '(null)'
>>> lxc 1446723150.396 INFO     lxc_error - error.c:lxc_error_set_and_log:55 - child <10499> ended on signal (9)
>>> lxc 1446723150.396 WARN     lxc_conf - conf.c:lxc_delete_network:2939 - failed to remove interface '(null)'
>>> lxc 1446723295.520 WARN     lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error
>>> lxc 1446723295.522 WARN     lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error
>>> 
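Each line of `lxc info --show-log` output carries an epoch timestamp and a level, so the warnings can be pulled out and lined up against the time of a migration attempt. The Python sketch below does this. The line format is inferred from the excerpt above, and `notable_lines` is a hypothetical helper:

```python
import re
from datetime import datetime, timezone

# Format inferred from the excerpt: "lxc <epoch> <LEVEL> <source> - <where> - <message>"
LXC_LINE = re.compile(
    r"lxc\s+(?P<ts>\d+\.\d+)\s+(?P<level>[A-Z]+)\s+(?P<source>\S+)"
    r"\s+-\s+(?P<where>\S+)\s+-\s+(?P<msg>.*)"
)


def notable_lines(log_text, levels=("WARN", "ERROR")):
    """Return (UTC time, level, where, message) for log lines at the given levels."""
    out = []
    for line in log_text.splitlines():
        m = LXC_LINE.search(line)
        if m and m["level"] in levels:
            when = datetime.fromtimestamp(float(m["ts"]), tz=timezone.utc)
            out.append((when, m["level"], m["where"], m["msg"]))
    return out
```

For example, the `lxc_delete_network` warnings above carry the timestamp 1446723150, which is 11:32:30 UTC on 5 November 2015.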
>>> 
>>> If I try to migrate a container in the reverse direction, I get a similar error:
>>> 
>>> # lxc move host1:test1 host2:test1
>>> error: Error transferring container data: restore failed:
>>> (00.001093) Error (cgroup.c:1204): cg: 	Can't mount controller dir .criu.cgyard.aOuQtF/net_cls: No such file or directory
>>
>>This is probably because the kernel on host1 is newer than the
>>kernel on host2 and has net_cls cgroup support, whereas host2's
>>doesn't.
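A controller mismatch like the missing net_cls can be detected before migrating by comparing the `enabled` column of /proc/cgroups on the two hosts. The Python sketch below does that. The helper names are hypothetical, and it assumes the file contents have already been collected from each host (for example by reading /proc/cgroups locally on each one):

```python
def enabled_controllers(proc_cgroups_text):
    """Parse the contents of /proc/cgroups into the set of enabled controller names."""
    enabled = set()
    for line in proc_cgroups_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the "#subsys_name ..." header
        fields = line.split()
        # Columns: subsys_name, hierarchy, num_cgroups, enabled
        if len(fields) >= 4 and fields[3] == "1":
            enabled.add(fields[0])
    return enabled


def missing_on_target(source_text, target_text):
    """Controllers enabled on the source host but absent or disabled on the target."""
    return enabled_controllers(source_text) - enabled_controllers(target_text)
```

In the host1 to host2 case above, the sketch would be expected to report `{"net_cls"}`.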
>>
>>Tycho
>>
>>> 
>>> 
>>> 
>>> Any ideas?
>>> 
>>> -- Jamie
>>> 
>>> 
>>> 
>>> On 05/11/2015, 08:05, "lxc-users on behalf of Jamie Brown" <lxc-users-bounces at lists.linuxcontainers.org on behalf of Jamie.Brown at mpec.co.uk> wrote:
>>> 
>>> >Thanks Tycho, installing CRIU solved the problem:
>>> >
>>> ># apt-get install criu
>>> >
>>> >Shouldn’t this package be included as a dependency of LXD, or shouldn’t LXD at least give a meaningful warning if it isn’t available? It seems odd to advertise out-of-the-box live migration in LXD but then have to install another package to provide it.
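Until LXD warns about this itself, you can run a pre-flight check on each host before `lxc move`. The Python sketch below looks for the `criu` binary on PATH and, if it is found, runs `criu check`, which probes kernel support and generally needs root. `criu_status` is a hypothetical helper, and the install hint assumes an apt-based host like the Ubuntu 14.04 machines in this thread:

```python
import shutil
import subprocess


def criu_status():
    """Return (path, ok, output): where criu is, and whether `criu check` passed."""
    path = shutil.which("criu")
    if path is None:
        return None, False, "criu not found on PATH (e.g. apt-get install criu)"
    # `criu check` probes kernel support for checkpoint/restore; usually needs root.
    result = subprocess.run([path, "check"], capture_output=True, text=True)
    return path, result.returncode == 0, (result.stdout + result.stderr).strip()
```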
>>> >
>>> >Is this in the documentation anywhere?
>>> >
>>> >Thanks again.
>>> >
>>> >-- Jamie
>>> >
>>> >
>>> >
>>> >
>>> >On 04/11/2015, 16:47, "lxc-users on behalf of Tycho Andersen" <lxc-users-bounces at lists.linuxcontainers.org on behalf of tycho.andersen at canonical.com> wrote:
>>> >
>>> >>On Wed, Nov 04, 2015 at 01:48:44PM +0000, Jamie Brown wrote:
>>> >>> Greetings all.
>>> >>> 
>>> >>> I’ve been using LXD in a development environment for a few weeks and so far I’m very impressed;
>>> >>> I can see a really bright future for this technology!
>>> >>> 
>>> >>> However, today I thought I’d try out live migration, based on the following guide:
>>> >>> https://insights.ubuntu.com/2015/05/06/live-migration-in-lxd/
>>> >>> 
>>> >>> I believe I have followed the steps correctly; however, when I run the move command, I 
>>> >>> receive the following output:
>>> >>> 
>>> >>> # lxc move host1:test host2:test
>>> >>> error: Error transferring container data: checkpoint failed:
>>> >>> Problem accessing CRIU log: open /tmp/lxd_migration_899480871/dump.log: no such file or directory
>>> >>> 
>>> >>> The file it is referring to above doesn't exist. However, there are other lxd_migration_* 
>>> >>> directories with different numbers appended. Each time I attempt the migration a new directory 
>>> >>> is created (e.g. lxd_migration_192965652), but there is no dump.log in there.
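The manual check described here (one new `lxd_migration_*` directory per attempt, none containing `dump.log`) can be scripted. A missing `dump.log` is consistent with CRIU never having run, which is what Tycho suspects below. The Python sketch below lists the directories newest first. The directory name pattern comes from the error message above, and `migration_dirs` is a hypothetical helper:

```python
import glob
import os


def migration_dirs(tmp_dir="/tmp"):
    """List lxd_migration_* directories, newest first, and whether each holds a dump.log."""
    dirs = [d for d in glob.glob(os.path.join(tmp_dir, "lxd_migration_*")) if os.path.isdir(d)]
    dirs.sort(key=os.path.getmtime, reverse=True)
    return [(d, os.path.exists(os.path.join(d, "dump.log"))) for d in dirs]
```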
>>> >>> 
>>> >>> The migration doesn't create a log file in the location given by the guide above:
>>> >>> /var/log/lxd/test/migration_{dump|restore}_.log
>>> >>> 
>>> >>> Steps I've taken:
>>> >>> 
>>> >>> - Copied all profiles from host1 to host2
>>> >>> - Added the migratable profile to the container
>>> >>> - Removed lxcfs package (on both hosts)
>>> >>> - Added the remote HTTPS hosts for both the local and remote hosts
>>> >>> 
>>> >>> Both hosts are running Ubuntu 14.04.3 LTS (x64) with LXD version 0.21.
>>> >>> 
>>> >>> The only difference I can tell between my hosts and the guide is that the 'migratable'
>>> >>> profile (which came out of the box with my LXD installation) doesn't contain the autostart
>>> >>> entries shown in the guide above:
>>> >>> 
>>> >>> # lxc profile show migratable
>>> >>> name: migratable
>>> >>> config:
>>> >>>   raw.lxc: |-
>>> >>>     lxc.console = none
>>> >>>     lxc.cgroup.devices.deny = c 5:1 rwm
>>> >>>     lxc.seccomp =
>>> >>>   security.privileged: "true"
>>> >>> devices: {}
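Because the profiles were copied between hosts by hand, it is worth confirming that both copies really match, and checking them against the guide's version. The Python sketch below produces a unified diff of two `lxc profile show migratable` outputs. It assumes the text has been captured on each host (for example by running that command locally on each one), and `profile_diff` is a hypothetical helper:

```python
import difflib


def profile_diff(source_yaml, target_yaml, source_name="host1", target_name="host2"):
    """Unified diff of two `lxc profile show` outputs; empty string means identical."""
    return "".join(difflib.unified_diff(
        source_yaml.splitlines(keepends=True),
        target_yaml.splitlines(keepends=True),
        fromfile=source_name,
        tofile=target_name,
    ))
```

A plain text diff is used rather than parsing the YAML so that the sketch needs only the standard library. As a side effect, whitespace and ordering differences also show up.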
>>> >>> 
>>> >>> 
>>> >>> Any help would be much appreciated!
>>> >>
>>> >>Have you installed CRIU? lxc info --show-log test probably has more
>>> >>info about what failed, but my guess is that it can't find CRIU if you
>>> >>haven't installed it.
>>> >>
>>> >>Tycho
>>> >>
>>> >>> Thank you,
>>> >>> 
>>> >>> Jamie
>>> >>> 
>>> >>> _______________________________________________
>>> >>> lxc-users mailing list
>>> >>> lxc-users at lists.linuxcontainers.org
>>> >>> http://lists.linuxcontainers.org/listinfo/lxc-users

