[lxc-users] Possible race condition in kernel, capset() fails randomly

Serge Hallyn serge.hallyn at ubuntu.com
Tue Jun 3 23:46:29 UTC 2014


Sorry, I meant wrapping .dockerinit itself.  But perhaps the best place
to start is to just create and run a regular container, to make sure
you're not having some arm/kernel/other bug:

	sudo lxc-create -t download -n u1 -- -d ubuntu -r trusty -a armhf
	sudo lxc-start -n u1
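
If the plain container does start, it may help to measure how often it
fails, so the result can be compared with the 41/100 and 23/100 figures
you reported. The following is only a minimal sketch that I have not run
on your setup: count_failures is a made-up helper name, it assumes a
POSIX shell, and the lxc-start line in the trailing comment is an
assumed way to invoke it, not a tested recipe.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Hypothetical helper: runs CMD N times and prints "<failed>/<N> failed".
count_failures() {
	runs=$1
	shift
	failed=0
	i=0
	while [ "$i" -lt "$runs" ]; do
		# Discard the command's output; only the exit status matters here.
		"$@" >/dev/null 2>&1 || failed=$((failed + 1))
		i=$((i + 1))
	done
	echo "$failed/$runs failed"
}

# Assumed usage with the u1 container created above, running /bin/true as
# the container's command instead of its normal init:
#	count_failures 100 sudo lxc-start -n u1 -- /bin/true
```

If the failure count stays at zero here but not under docker, that would
point away from a plain lxc or kernel problem.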

-serge

Quoting Vladimir Pouzanov (farcaller at gmail.com):
> Docker starts lxc in the following way:
> 
> lxc-start -n 97a0813ce28954250aaa807567c9053e3e443a8651791e9c591572b0850095af \
>     /.dockerinit -driver lxc -g 172.17.42.1 -i 172.17.0.2/16 -mtu 1500 -- /bin/true
> 
> strace of lxc-start: https://gist.github.com/farcaller/6fd5b23952675aed894d
> 
> It doesn't seem to run /.dockerinit in the failure case.
> 
> 
> On Tue, Jun 3, 2014 at 8:14 PM, Serge Hallyn <serge.hallyn at ubuntu.com>
> wrote:
> 
> > Quoting Vladimir Pouzanov (farcaller at gmail.com):
> > > This bug happens with docker, but I don't see any traction on my issue
> > > over there, so I'm trying to escalate further. The original bug report
> > > is at https://github.com/dotcloud/docker/issues/4556; here are all the
> > > interesting details.
> > >
> > > I'm running an armv7 box (wandboard) with a 3.14.4-1-ARCH kernel. I
> > > cannot reliably use docker (with the lxc driver or with the native
> > > driver) as it crashes often: on the latest docker/lxc/kernel combo I
> > > get 41 failures out of 100 with native docker and 23 out of 100 with
> > > lxc.
> > >
> > > The lxc version is 1.0.3, docker is 0.11.1.
> > >
> > > From the docker side, the error looks like:
> > > finalize namespace drop capabilities operation not permitted
> > >
> > > (generated by the docker capabilities module,
> > > https://github.com/dotcloud/docker/blob/master/pkg/libcontainer/security/capabilities/capabilities.go#L32)
> > >
> > > lxc-start just silently returns 1 and I didn't manage to get any
> > > reasonable log output from it.
> >
> > How did you use lxc-start exactly?
> >
> > > I managed to look a bit deeper into the kernel side to see what exactly
> > > is failing, and the offending check in the capset() syscall seems to be:
> > >
> > > https://github.com/torvalds/linux/blob/master/kernel/capability.c#L240
> > >
> > > where pid is always 1 and task_pid_vnr(current) is 7, sometimes 6,
> > > rarely 1 (the good case).
> >
> > You'll probably want to get init to run under strace so you can figure out
> > why current is pid 7 instead of 1.  What binary is it actually that's doing
> > the capset?
> >
> > > Any ideas on what could be going wrong? What other info can I provide to
> > > track this bug down?
> > >
> > > --
> > > Sincerely,
> > > Vladimir "Farcaller" Pouzanov
> > > http://farcaller.net/
> >
> > > _______________________________________________
> > > lxc-users mailing list
> > > lxc-users at lists.linuxcontainers.org
> > > http://lists.linuxcontainers.org/listinfo/lxc-users
> >
> 
> 
> 
> 
> -- 
> Sincerely,
> Vladimir "Farcaller" Pouzanov
> http://farcaller.net/



