[lxc-devel] [PATCH] Add support for checkpoint and restore via CRIU

Zmudzinski, Krystof C krystof.c.zmudzinski at intel.com
Thu Sep 18 16:01:05 UTC 2014


Let's start from the beginning.  I'm using LXC 1.1.0.alpha1 and criu 1.3.1.  I've modified lxc to dump the criu command line at the beginning of dump/restore logs.

I can lxc-start my container just fine (process tree attached)

First problem: I can't lxc-checkpoint my container without adding DECLARE_ARG("--evasive-devices"); to lxccontainer.c/exec_criu(...)

Without --evasive-devices, I get these errors (full log attached) when executing lxc-checkpoint -n test-lxc -v -s -D /tmp/checkpoint:

(00.063049) Error (files-reg.c:601): Unaccessible path opened 37:70490, need 2049:2244303
(00.063054) ----------------------------------------
(00.063056) Error (cr-dump.c:1603): Dump files (pid: 10706) failed with -1

(BTW, Unaccessible should probably be changed to Inaccessible.)

After adding --evasive-devices, lxc-checkpoint succeeds.  I do see several messages like this one (log attached):

(00.237676) Error (parasite-syscall.c:387): si_code=4 si_pid=11097 si_status=5

But at the end the log says (00.434907) Dumping finished successfully.
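
For anyone triaging logs like the ones quoted above, here is a minimal sketch of pulling the "Error (file:line)" entries out of a CRIU log and checking for the final success message. The regular expression is inferred only from the lines quoted in this thread, so treat the format as an assumption. The names criu_errors and dump_succeeded are hypothetical helpers, not part of CRIU or LXC.

```python
import re

# Pattern inferred from the log lines quoted in this thread; the real CRIU
# log format may have variants this does not cover (assumption).
LOG_LINE = re.compile(
    r"^\((?P<ts>\d+\.\d+)\)\s+"            # timestamp, e.g. (00.063049)
    r"(?:(?P<pid>\d+):\s+)?"                # optional pid prefix, e.g. "365: "
    r"Error \((?P<file>[^:()]+):(?P<line>\d+)\):\s+"
    r"(?P<msg>.*)$"
)


def criu_errors(lines):
    """Return a list of dicts, one per 'Error (...)' line in the log."""
    found = []
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if m:
            found.append({
                "ts": float(m.group("ts")),
                "pid": int(m.group("pid")) if m.group("pid") else None,
                "source": m.group("file"),
                "line": int(m.group("line")),
                "msg": m.group("msg"),
            })
    return found


def dump_succeeded(lines):
    """True if the log contains the success message quoted in this thread."""
    return any("Dumping finished successfully" in line for line in lines)
```

As the log with --evasive-devices shows, a dump can report Error lines and still finish successfully, so the two checks are kept separate.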

Then I can restore the container as I explained in my previous post, but, as we have already discussed, the restored container can't be stopped, attached to, etc.

Krystof


-----Original Message-----
From: lxc-devel [mailto:lxc-devel-bounces at lists.linuxcontainers.org] On Behalf Of Tycho Andersen
Sent: Wednesday, September 17, 2014 7:23 PM
To: LXC development mailing-list
Subject: Re: [lxc-devel] [PATCH] Add support for checkpoint and restore via CRIU

Hi Krystof,

On Wed, Sep 17, 2014 at 10:00:17PM +0000, Zmudzinski, Krystof C wrote:
> I call it centos because I got it from http://criu.org/LXC:
> 
> curl http://download.openvz.org/template/precreated/centos-6-x86_64.tar.gz | tar -xz -C test-lxc
> 
> but I renamed test-lxc to centos.
> 
> I'm attaching 4 restore logs with process trees for 4 different ways I 
> executed lxc-checkpoint -r

You've got a line like:

(00.085002)    365: Error (files-reg.c:820): File var/log/messages has bad size 24138 (expect 23943)

in your restore logs. This means that the restore image doesn't match the restore filesystem. (i.e., you probably didn't pass the -s option to lxc-checkpoint when you dumped the container; for now I think this option is required, since there is really no easy way to use lxc-checkpoint without it.)

There is another possibility, if you are trying to migrate containers across hosts. If not, though, passing -s to lxc-checkpoint when you dump the container will hopefully solve your problem.
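
A rough sketch of how one might check for this kind of mismatch before attempting a restore: parse the "bad size" message and compare the expected size against the file in the container's rootfs. The message format is taken from the single log line quoted above. Reading the first number as the size found on disk and the second as the size recorded in the image is an assumption based on the wording. The helper names are hypothetical.

```python
import os
import re

# Format inferred from the one log line quoted above (assumption).
BAD_SIZE = re.compile(
    r"File (?P<path>\S+) has bad size (?P<found>\d+) \(expect (?P<expected>\d+)\)"
)


def parse_bad_size(line):
    """Return (path, found_size, expected_size), or None if no match."""
    m = BAD_SIZE.search(line)
    if not m:
        return None
    return m.group("path"), int(m.group("found")), int(m.group("expected"))


def size_matches(rootfs, rel_path, expected):
    """True if rel_path under rootfs currently has the expected size."""
    try:
        return os.stat(os.path.join(rootfs, rel_path)).st_size == expected
    except OSError:
        # A missing or unreadable file cannot match the image.
        return False
```

For the line above, the file was 195 bytes larger than expected, which is consistent with the container continuing to write to var/log/messages after the dump.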

Tycho

> Krystof
> 
> -----Original Message-----
> From: lxc-devel [mailto:lxc-devel-bounces at lists.linuxcontainers.org] 
> On Behalf Of Tycho Andersen
> Sent: Wednesday, September 17, 2014 1:19 PM
> To: LXC development mailing-list
> Subject: Re: [lxc-devel] [PATCH] Add support for checkpoint and 
> restore via CRIU
> 
> Hi Krystof,
> 
> On Wed, Sep 17, 2014 at 03:08:44PM +0000, Zmudzinski, Krystof C wrote:
> > test-lxc.conf:
> > 
> > lxc.utsname = test-lxc
> > lxc.mount = /root/centos/etc/fstab
> > lxc.rootfs = /root/centos/
> 
> Ah, I'm not sure if anyone has tried to dump a centos container yet :)
> 
> > lxc.console = none
> > lxc.tty = 0
> > lxc.network.type = veth
> > lxc.network.flags = up
> > lxc.network.link = lxcbr0
> > lxc.network.name = eth0
> > 
> > # hax for criu
> > lxc.console = none
> > lxc.tty = 0
> > lxc.cgroup.devices.deny = c 5:1 rwm
> > lxc.aa_profile = lxc-container-default-with-mounting
> > 
> > 
> > I meant -F not -V. Sorry.
> > 
> > ps axf:
> > 24602 pts/3    S      0:00              \_ ./lxc-checkpoint -n test-lxc -v -D /tmp/checkpoint -r -F -d
> > 24603 pts/3    S      0:00              |   \_ /usr/local/sbin/criu restore --tcp-established --evasive-devices --file-locks --link-remap --manage-cgroups 
> > 24605 ?        Ss     0:00              |       \_ /sbin/init
> > 24635 ?        Zs     0:00              |           \_ [criu] <defunct>
> > 24636 ?        Ss     0:00              |           \_ /usr/sbin/httpd
> > 24646 ?        S      0:00              |           |   \_ /usr/sbin/httpd
> > 24637 ?        Z      0:00              |           \_ [criu] <defunct>
> > 24638 ?        Zs     0:00              |           \_ [criu] <defunct>
> > 24639 ?        Zs     0:00              |           \_ [criu] <defunct>
> 
> Looks like criu didn't wait() on some of its processes, which means 
> CRIU probably hung, and lxc is waiting for the criu process to die, 
> which is why lxc-info and such don't respond. What does the 
> restore.log look like?
> 
> Tycho
> 
> > 24640 ?        Ss     0:00              |           \_ /usr/sbin/saslauthd -m /var/run/saslauthd -a pam -n 2
> > 24645 ?        S      0:00              |           |   \_ /usr/sbin/saslauthd -m /var/run/saslauthd -a pam -n 2
> > 24641 ?        Ss     0:00              |           \_ xinetd -stayalive -pidfile /var/run/xinetd.pid
> > 24642 ?        Ss     0:00              |           \_ /usr/sbin/sshd
> > 24643 ?        S<s    0:00              |           \_ /sbin/udevd -d
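
The defunct [criu] entries in the tree above can be picked out mechanically. Here is a minimal sketch, assuming the column layout of the ps axf output pasted in this thread (PID, TTY, STAT, TIME, COMMAND). defunct_processes is a hypothetical helper name.

```python
def defunct_processes(ps_lines):
    """Return (pid, command) for lines whose STAT column starts with 'Z'."""
    zombies = []
    for raw in ps_lines:
        # Drop mail quoting ("> > ") in case the output was pasted from a reply.
        line = raw.lstrip("> ").strip()
        fields = line.split(None, 4)  # PID TTY STAT TIME COMMAND
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        pid, _tty, stat, _time, cmd = fields
        if stat.startswith("Z"):
            # Remove the tree decoration ("|", "\_", spaces) drawn by ps axf.
            zombies.append((int(pid), cmd.lstrip("|\\_ ")))
    return zombies
```

Any zombie whose parent is still alive indicates that the parent has not yet called wait() on it, which matches the diagnosis given above.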
> > 
> > The full criu command line:
> > /usr/local/sbin/criu restore --tcp-established --evasive-devices 
> > --file-locks --link-remap --manage-cgroups --action-script 
> > /usr/local/libexec/lxc/lxc-restore-net -D /tmp/checkpoint -o 
> > /tmp/checkpoint/restore.log -vvvvvv --root /usr/local/lib/lxc/rootfs 
> > --restore-detached --pidfile /tmp/fileGCxb5C --veth-pair eth0 
> > vethSID6CM
> > 
> > -----Original Message-----
> > From: lxc-devel [mailto:lxc-devel-bounces at lists.linuxcontainers.org] 
> > On Behalf Of Tycho Andersen
> > Sent: Tuesday, September 16, 2014 4:55 PM
> > To: LXC development mailing-list
> > Cc: criu at openvz.org
> > Subject: Re: [lxc-devel] [PATCH] Add support for checkpoint and 
> > restore via CRIU
> > 
> > On Tue, Sep 16, 2014 at 11:11:04PM +0000, Zmudzinski, Krystof C wrote:
> > > I've added DECLARE_ARG("--evasive-devices"); in lxccontainer.c/exec_criu and I was finally able to dump the container.
> > 
> > Ok, I've been trying to produce a situation where this is necessary but I couldn't. Can you paste your lxc configuration file?
> > 
> > > It also restored but only when both -V and -d were passed to lxc-checkpoint.
> > 
> > -V isn't an argument to lxc-checkpoint. (Perhaps you mean -v? That /shouldn't/ affect things, it is just logging.) What happens when you don't pass -d?
> > 
> > > But lxc-stop, lxc-attach, etc. hang after the container is restored.  But that is expected at this point, isn't it?
> > 
> > No, those should work. Can you show the output of ps auxf?
> > 
> > > The interesting part is that something like this is not needed but 
> > > it is used in run.sh
> > > 
> > > DECLARE_ARG("-n net -n mnt -n ipc -n pid");
> > 
> > That's not needed, it is an old criu option (CRIU's wiki is outdated).
> > 
> > > Lastly, could criu dump the entire command line to the logs when it is executed?  So the beginning of the log starts with something like:
> > > 
> > > (00.000047) ========================================
> > > (00.000057) /usr/local/sbin/criu dump --tcp-established --evasive-devices --file-locks --link-remap --manage-cgroups.......
> > > (00.000087) Dumping processes (pid: 22614)
> > > (00.000093) ========================================
> > 
> > The log itself is generated by criu, so that is probably a question 
> > for the criu list, not the lxc list :)
> > 
> > Tycho
> > 
> > > Krystof
> > > 
> > 
> > > _______________________________________________
> > > lxc-devel mailing list
> > > lxc-devel at lists.linuxcontainers.org
> > > http://lists.linuxcontainers.org/listinfo/lxc-devel
> > 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ptree_after_lxc-start.txt
URL: <http://lists.linuxcontainers.org/pipermail/lxc-devel/attachments/20140918/11a5894e/attachment-0003.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lxc-checkpoint_fails.txt
URL: <http://lists.linuxcontainers.org/pipermail/lxc-devel/attachments/20140918/11a5894e/attachment-0004.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lxc-checkpoint_w_evasive-devices.txt
URL: <http://lists.linuxcontainers.org/pipermail/lxc-devel/attachments/20140918/11a5894e/attachment-0005.txt>

