[lxc-users] LXD Live Migration

Bostjan Skufca bostjan at a2o.si
Mon Nov 9 15:55:03 UTC 2015


Depends on what your requirements for "production" are. Live migration? I
guess not. Environment isolation for more-or-less trusted containers? Yes.
I have been using it here for quite a while (since 1.0.6 of LXC, not LXD),
preferably with unprivileged containers, since that removes a lot of attack
surface for the host.


b.


On 9 November 2015 at 16:37, Saint Michael <venefax at gmail.com> wrote:

> I must assume that LXC is not ready for production yet. Am I wrong?
>
> On Mon, Nov 9, 2015 at 9:54 AM, Tycho Andersen <
> tycho.andersen at canonical.com> wrote:
>
>> On Fri, Nov 06, 2015 at 08:43:33AM +0000, Jamie Brown wrote:
>> > I’ve just discovered a new failure on a different container too;
>> >
>> > # lxc move host2:nexus host1:nexus
>> > error: Error transferring container data: checkpoint failed:
>> > (00.355457) Error (files-reg.c:422): Can't dump ghost file /usr/local/sonatype-work/nexus/tmp/jar_cache5838699621686145685.tmp of 1177738 size, increase limit
>> > (00.355477) Error (cr-dump.c:1255): Dump files (pid: 22072) failed with -1
>> > (00.357100) Error (cr-dump.c:1617): Dumping FAILED.
>>
>> So this is actually an error because a default limit in criu is not
>> high enough. You can set this via the --ghost-limit option in criu, but
>> LXC currently exposes no way to set this, although I'm hoping to add a
>> new API call to allow people to set stuff like this in the near future.
>>
>> Thanks,
>>
>> Tycho
>>
>> >
>> >
>> >
>> > On 06/11/2015, 08:40, "lxc-users on behalf of Jamie Brown" <lxc-users-bounces at lists.linuxcontainers.org on behalf of Jamie.Brown at mpec.co.uk> wrote:
>> >
>> > >Tycho,
>> > >
>> > >Thanks for your help.
>> > >
>> > >The kernels were in fact different versions, though I’m not sure how I got into that state! So they’re now both running 3.19.0.
>> > >
>> > >Now, I at least receive the same error when migrating in both directions;
>> > ># lxc move host2:test host1:test2
>> > >error: Error transferring container data: restore failed:
>> > >(00.008103)      1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: No such file or directory
>> > >
>> > ># lxc move host1:test1 host2:test1
>> > >error: Error transferring container data: restore failed:
>> > >(00.008103) 1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: No such file or directory
>> > >
>> > >
>> > >
>> > >
>> > >The backing store is the default (directory based). However, on host2 the /var/lib/lxd/containers directory is a symlink to an ext3 mount. On host1 they’re on ext4, is that likely to cause any issues?
>> > >
>> > >The strange thing is, [randomly] the live move DOES succeed. I’ve definitely migrated a clean [running] container about 3 times from host2 to host1, but then when I try again with a new container it fails. This even worked before I updated the kernel. However, I can’t seem to find specific steps to replicate the successful move. I’ve never succeeded in migrating the same container back from host1 to host2 without stopping it. This is what is concerning me the most, I would expect either permanent failure or permanent success. I keep gaining false hope because the first time I migrated a container after updating the kernel it worked, so I thought, problem solved! But then I couldn’t migrate another :(
>> > >
>> > >-- Jamie
>> > >
>> > >
>> > >
>> > >05/11/2015, 16:58, "lxc-users on behalf of Tycho Andersen" <lxc-users-bounces at lists.linuxcontainers.org on behalf of tycho.andersen at canonical.com> wrote:
>> > >
>> > >>Hi Jamie,
>> > >>
>> > >>Thanks for trying it out.
>> > >>
>> > >>On Thu, Nov 05, 2015 at 11:39:43AM +0000, Jamie Brown wrote:
>> > >>> Hello again,
>> > >>>
>> > >>> Oddly, I've now re-installed the old server and configured it identically to before (except now using RAID) and tried migrating a container back and I am getting a different failure;
>> > >>>
>> > >>> # lxc move host2:test host1:test
>> > >>>
>> > >>> error: Error transferring container data: restore failed:
>> > >>> (00.007414)      1: Error (mount.c:2030): Can't mount at ./dev/.lxd-mounts: No such file or directory
>> > >>> (00.026443) Error (cr-restore.c:1939): Restoring FAILED.
>> > >>>
>> > >>> The container appears in the remote container list whilst moving, but then after failure it is deleted and it is in the STOPPED state on the source host.
>> > >>
>> > >>Right, the restore failed, so the container had already been stopped
>> > >>from the dump, so it was stopped on the target. What we should really
>> > >>do is leave it in a frozen state after the dump, and once the restore
>> > >>succeeds then we can kill it. Hopefully that's something I can
>> > >>implement this cycle.
>> > >>
>> > >>As for the actual error, sounds like the target LXD didn't have
>> > >>shmounts but the source one did. Are they using different backing
>> > >>stores? What version of LXD are they?
>> > >>
>> > >>>
>> > >>> Here's the output from the log, not sure how much is relevant to the migration attempt.
>> > >>>
>> > >>> # lxc info --show-log test
>> > >>> ...
>> > >>> lxc 1446723150.396 DEBUG    lxc_start - start.c:__lxc_start:1210 - unknown exit status for init: 9
>> > >>> lxc 1446723150.396 DEBUG    lxc_start - start.c:__lxc_start:1215 - Pushing physical nics back to host namespace
>> > >>> lxc 1446723150.396 DEBUG    lxc_start - start.c:__lxc_start:1218 - Tearing down virtual network devices used by container
>> > >>> lxc 1446723150.396 WARN     lxc_conf - conf.c:lxc_delete_network:2939 - failed to remove interface '(null)'
>> > >>> lxc 1446723150.396 INFO     lxc_error - error.c:lxc_error_set_and_log:55 - child <10499> ended on signal (9)
>> > >>> lxc 1446723150.396 WARN     lxc_conf - conf.c:lxc_delete_network:2939 - failed to remove interface '(null)'
>> > >>> lxc 1446723295.520 WARN     lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error
>> > >>> lxc 1446723295.522 WARN     lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error
>> > >>>
>> > >>>
>> > >>> If I try to migrate a container in the reverse direction, I get a similar error;
>> > >>>
>> > >>> # lxc move host1:test1 host2:test1
>> > >>> error: Error transferring container data: restore failed:
>> > >>> (00.001093) Error (cgroup.c:1204): cg:    Can't mount controller dir .criu.cgyard.aOuQtF/net_cls: No such file or directory
>> > >>
>> > >>This is probably because the kernel on host1 is newer than the
>> > >>kernel on host2 and has net_cls cgroup support, whereas host2's
>> > >>doesn't.
>> > >>
>> > >>Tycho
>> > >>
>> > >>>
>> > >>>
>> > >>>
>> > >>> Any ideas?
>> > >>>
>> > >>> -- Jamie
>> > >>>
>> > >>>
>> > >>>
>> > >>> On 05/11/2015, 08:05, "lxc-users on behalf of Jamie Brown" <lxc-users-bounces at lists.linuxcontainers.org on behalf of Jamie.Brown at mpec.co.uk> wrote:
>> > >>>
>> > >>> >Thanks Tycho, installing CRIU solved the problem;
>> > >>> >
>> > >>> ># apt-get install criu
>> > >>> >
>> > >>> >Should this package not be included as a dependency for LXD, or at least provide a meaningful warning if the package isn’t available? It seems odd to advertise out-the-box live migration in LXD, but then have to install another package to provide it.
>> > >>> >
>> > >>> >Is this in the documentation anywhere?
>> > >>> >
>> > >>> >Thanks again.
>> > >>> >
>> > >>> >-- Jamie
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> >On 04/11/2015, 16:47, "lxc-users on behalf of Tycho Andersen" <lxc-users-bounces at lists.linuxcontainers.org on behalf of tycho.andersen at canonical.com> wrote:
>> > >>> >
>> > >>> >>On Wed, Nov 04, 2015 at 01:48:44PM +0000, Jamie Brown wrote:
>> > >>> >>> Greetings all.
>> > >>> >>>
>> > >>> >>> I’ve been using LXD in a development environment for a few weeks and so far very impressed,
>> > >>> >>> I can see a really bright future for this technology!
>> > >>> >>>
>> > >>> >>> However, today I thought I’d try out the live migration, based on the following guide;
>> > >>> >>> https://insights.ubuntu.com/2015/05/06/live-migration-in-lxd/
>> > >>> >>>
>> > >>> >>> I believe I have followed the steps correctly, however when I run the move command, I
>> > >>> >>> receive the following output;
>> > >>> >>>
>> > >>> >>> # lxc move host1:test host2:test
>> > >>> >>> error: Error transferring container data: checkpoint failed:
>> > >>> >>> Problem accessing CRIU log: open /tmp/lxd_migration_899480871/dump.log: no such file or directory
>> > >>> >>>
>> > >>> >>> The file it is referring to above doesn't exist. However, there are other lxd_migration_*
>> > >>> >>> directories with different numbers appended. Each time I attempt the migration a new directory
>> > >>> >>> is created (e.g. lxd_migration_192965652), but there is no dump.log in there.
>> > >>> >>>
>> > >>> >>> The migration doesn't create a log file as per the guide above in;
>> > >>> >>> /var/log/lxd/test/migration_{dump|restore}_.log
>> > >>> >>>
>> > >>> >>> Steps I've taken;
>> > >>> >>>
>> > >>> >>> - Copied all profiles from host1 to host2
>> > >>> >>> - Added the migratable profile to the container
>> > >>> >>> - Removed lxcfs package (on both hosts)
>> > >>> >>> - Added the remote HTTPS hosts for both the local and remote hosts
>> > >>> >>>
>> > >>> >>> Both hosts are running Ubuntu 14.04.3 LTS (x64) with LXD version 0.21.
>> > >>> >>>
>> > >>> >>> The only difference I can tell between my hosts and the guide is that the 'migratable'
>> > >>> >>> profile (which came out-the-box with my LXD installation) doesn't contain the autostart
>> > >>> >>> entries as in the guide above;
>> > >>> >>>
>> > >>> >>> # lxc profile show migratable
>> > >>> >>> name: migratable
>> > >>> >>> config:
>> > >>> >>>   raw.lxc: |-
>> > >>> >>>     lxc.console = none
>> > >>> >>>     lxc.cgroup.devices.deny = c 5:1 rwm
>> > >>> >>>     lxc.seccomp =
>> > >>> >>>   security.privileged: "true"
>> > >>> >>> devices: {}
>> > >>> >>>
>> > >>> >>>
>> > >>> >>> Any help would be much appreciated!
>> > >>> >>
>> > >>> >>Have you installed CRIU? lxc info --show-log test probably has more
>> > >>> >>info about what failed, but my guess is that it can't find CRIU if you
>> > >>> >>haven't installed it.
>> > >>> >>
>> > >>> >>Tycho
>> > >>> >>
>> > >>> >>> Thank you,
>> > >>> >>>
>> > >>> >>> Jamie
>> > >>> >>>
>> > >>> >>> _______________________________________________
>> > >>> >>> lxc-users mailing list
>> > >>> >>> lxc-users at lists.linuxcontainers.org
>> > >>> >>> http://lists.linuxcontainers.org/listinfo/lxc-users
>
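
The three causes identified in the quoted thread (CRIU not installed,
different kernel versions on the two hosts, and a ghost file larger than
CRIU's limit) can be checked before attempting a move. The following is only
a rough sketch: the function names are hypothetical and are not part of LXD,
LXC or CRIU, and the 1048576-byte default for CRIU's ghost-file limit is an
assumption that should be verified against the installed CRIU version.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers for a live migration; illustrative only.

# Succeeds if a criu binary is on PATH (in the thread: apt-get install criu).
have_criu() {
    command -v criu >/dev/null 2>&1
}

# Compares two kernel release strings, e.g. `uname -r` from each host.
report_kernel() {
    if [ "$1" = "$2" ]; then
        echo "kernels match: $1"
    else
        echo "kernel mismatch: $1 vs $2"
    fi
}

# Succeeds if a ghost file of SIZE bytes exceeds LIMIT bytes.
# LIMIT defaults to 1048576 (assumed CRIU default, not confirmed in the thread).
# Raising the limit is done with criu's --ghost-limit option, which per the
# thread LXC did not expose at the time.
ghost_over_limit() {
    size=$1
    limit=${2:-1048576}
    [ "$size" -gt "$limit" ]
}

# Example use (run manually; host2 is the remote from the thread):
#   have_criu || echo "criu is not installed"
#   report_kernel "$(uname -r)" "$(ssh host2 uname -r)"
#   ghost_over_limit 1177738 && echo "ghost file exceeds the limit"
```

With the 1177738-byte file from the nexus container log, ghost_over_limit
reports that the assumed default is exceeded, which is consistent with the
"increase limit" message from CRIU.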