[lxc-users] debugging a failing clone() call

Christian Brauner christian.brauner at getenv.org
Mon Apr 9 08:28:27 UTC 2018


On Fri, Mar 23, 2018 at 06:13:15AM -0400, Andrew Cann wrote:
> Hello,
> 
> The following syscall is failing when called on a Travis-CI build machine.
> 
>     clone(..,
>         CLONE_FILES |
>         CLONE_IO |
>         CLONE_SIGHAND |
>         CLONE_VM |
>         CLONE_SYSVSEM |
>         CLONE_NEWNET |
>         CLONE_NEWUTS |
>         CLONE_NEWUSER,
>         ..
>     );
> 
> This works when I run it on my machine, but inside the Docker container that
> Travis creates it fails with EPERM. Can anyone suggest why this might be
> happening? The clone(2) manpage lists possible reasons:
> 
> 
> EPERM   CLONE_NEWCGROUP, CLONE_NEWIPC, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWPID,
>         or CLONE_NEWUTS was specified by an unprivileged process (process
>         without CAP_SYS_ADMIN).
> 
> This shouldn't apply since I'm using CLONE_NEWUSER.
> 
> 
> EPERM   CLONE_PID was specified by a process other than process 0. (This error
>         occurs only on Linux 2.5.15 and earlier.)
> 
> Doesn't apply.
> 
> 
> EPERM   CLONE_NEWUSER was specified in flags, but either the effective user ID
>         or the effective group ID of the caller does not have a mapping in the
>         parent namespace (see user_namespaces(7)).
> 
> Again, this shouldn't apply. The process creating the namespace has a valid
> (not-nobody) uid and gid.
> 
> 
> EPERM   (since Linux 3.9) CLONE_NEWUSER was specified in flags and the caller
>         is in a chroot environment (i.e., the caller's root directory does not
>         match the root directory of the mount namespace in which it resides).
> 
> Possibly this one? The docker container shouldn't be aware that it's running in
> a chroot though. Calling mount inside the container lists:
> 
>     overlay on / type overlay (rw,relatime,...)
> 
> Which indicates that it's living inside its own mount namespace with its own
> root directory.
> 
> So I'm confused. Does anyone have any suggestions for why else this might be
> failing or things I could try to debug it? Is there a way to get more than just
> an error code out of Linux? Are there reasons for giving that error code that
> aren't listed in the man page?

Well, the Docker container might have dropped CAP_SYS_ADMIN, at which
point you're not allowed to use any of the CLONE_NEW* namespace flags.
Docker does this as a security measure since they still default to
running privileged containers, which are inherently unsafe.
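
To check whether the container really has dropped CAP_SYS_ADMIN, one
option is to read the effective capability mask from /proc/self/status
inside the container. Below is a minimal Python sketch, assuming a Linux
/proc is mounted. The helper names parse_cap_eff and has_cap are made up
for illustration; bit 21 is CAP_SYS_ADMIN in <linux/capability.h>.

```python
# Sketch only: report whether CAP_SYS_ADMIN is in the effective set of
# the current process. Helper names are hypothetical, not from any library.

CAP_SYS_ADMIN = 21  # bit number from <linux/capability.h>


def parse_cap_eff(status_text):
    """Return the CapEff mask from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The field is a hexadecimal bitmask, e.g. "CapEff:\t0000003fffffffff"
            return int(line.split()[1], 16)
    return None


def has_cap(mask, cap):
    """True if capability bit `cap` is set in `mask` (None means unknown)."""
    return mask is not None and bool((mask >> cap) & 1)


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            mask = parse_cap_eff(f.read())
    except OSError:
        mask = None  # no /proc, or not running on Linux
    print("CapEff:", "unknown" if mask is None else hex(mask))
    print("CAP_SYS_ADMIN effective:", has_cap(mask, CAP_SYS_ADMIN))
```

If this prints False inside the Travis container but True (or the clone()
succeeds) on your own machine, the dropped capability is a likely
candidate. It does not prove it, since other restrictions may also apply.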

Another possibility is that the host is running a distro that does not
enable user namespaces (CLONE_NEWUSER) by default. This can be the case
with, e.g., CentOS-based distros.
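
Whether the host kernel allows user namespaces can often be read from two
sysctl files, which are normally visible inside the container as well.
The sketch below assumes the usual paths: user.max_user_namespaces
(mainline since Linux 4.9) and kernel.unprivileged_userns_clone (a
distro-specific patch carried by Debian-derived kernels). Either file may
be absent on a given kernel, and the function names read_knob and
describe are made up for illustration.

```python
# Sketch only: print the sysctl knobs that commonly gate CLONE_NEWUSER.
# A missing file does not by itself mean user namespaces are available.
from pathlib import Path

KNOBS = (
    "/proc/sys/user/max_user_namespaces",          # 0 disables user namespaces
    "/proc/sys/kernel/unprivileged_userns_clone",  # 0 disables them for non-root
)


def read_knob(path):
    """Return the integer stored in a sysctl file, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def describe(value):
    """Turn a knob value into a short human-readable verdict."""
    if value is None:
        return "not present or unreadable"
    if value == 0:
        return "0 (user namespaces disabled by this knob)"
    return f"{value} (not blocked by this knob)"


if __name__ == "__main__":
    for knob in KNOBS:
        print(f"{knob}: {describe(read_knob(knob))}")
```

A value of 0 in either file would explain a failing CLONE_NEWUSER
regardless of how the container itself is configured.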

Christian

