[lxc-devel] Working with glibc (PID/TID caches).

Mon Aug 4 15:47:59 UTC 2014

Quoting Eric W. Biederman (ebiederm at xmission.com):
> 
> Serge Hallyn <serge.hallyn at ubuntu.com> writes:
> > Quoting Carlos O'Donell (carlos at redhat.com):
> >> There was a complaint a while back from someone working
> >> on containers about glibc PID caching. I recently received
> >> another request to provide userspace with a way to reset
> >> any PID or TID caches to make clone-based sandboxing easier
> >> (CLONE_NEWPID).
> >> 
> >> How did lxc work around the PID cache in glibc? What APIs
> >> could glibc provide to help the implementation of containers?
> 
> The primary work-around was that neither unshare(CLONE_NEWPID)
> nor clone(CLONE_NEWPID) changes the current process's pid,
> so in practice cached pids are not much of a problem.
> 
> That said clone(3) is a cumbersome API to use when you don't want to
> share the same address space (it sucks to have to allocate an extra
> stack just to call fork(2)), and that probably gets people resorting to
> calling syscall(SYS_clone,...) and then having pid problems.
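
For illustration, a minimal sketch of the raw-syscall route described above,
assuming x86-64 Linux, where flags is the first clone argument. The helper
names raw_clone() and compare_pids_in_raw_clone_child() are made up for this
example. The child reports both what getpid() returns and what the kernel
returns for syscall(SYS_getpid); with a pid-caching glibc the two may differ,
because the raw syscall bypasses the library's fork path.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork-like clone without a separate stack.
 * Assumes the x86-64 argument order (flags, stack, ptid, ctid, tls).
 * Returns the child's pid in the parent, 0 in the child, -1 on error.
 */
static long raw_clone(unsigned long flags)
{
    return syscall(SYS_clone, flags | SIGCHLD, NULL, NULL, NULL, NULL);
}

/*
 * Hypothetical helper: run a raw-clone child and collect the pid it sees
 * through the libc wrapper and through the bare syscall.
 * Returns 0 on success, -1 on failure.
 */
int compare_pids_in_raw_clone_child(long *libc_pid, long *kernel_pid)
{
    int fds[2];
    long buf[2];

    if (pipe(fds) < 0)
        return -1;

    long child = raw_clone(0);
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        buf[0] = (long)getpid();        /* may come from a libc cache */
        buf[1] = syscall(SYS_getpid);   /* always asks the kernel */
        if (write(fds[1], buf, sizeof(buf)) != (ssize_t)sizeof(buf))
            _exit(1);
        _exit(0);
    }

    close(fds[1]);
    ssize_t n = read(fds[0], buf, sizeof(buf));
    close(fds[0]);
    waitpid((pid_t)child, NULL, 0);

    if (n != (ssize_t)sizeof(buf))
        return -1;

    *libc_pid = buf[0];
    *kernel_pid = buf[1];
    return 0;
}
```

Passing CLONE_NEWPID to raw_clone() would additionally need CAP_SYS_ADMIN;
the sketch leaves it out so it can run unprivileged.
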
> 
> > so this is (iiuc) only a problem with unshare(CLONE_NEWPID).  Two
> > programs pasted below.  The one using unshare will show the old
> > pre-unshare pid.  The one using clone correctly shows pid 1.  Now
> > it's possible that clone() wrapper is doing something already to
> > clean the pid, I didn't try syscall(__NR_clone) with CLONE_NEWPID
> > today.  But I've seen quite a bit of code using clone(CLONE_NEWPID),
> > and no one really uses the syscall directly any more.
> 
> Serge, your unsharepid did not call fork(2) so there was no process placed
> into the pid namespace.

Yeah :)

> So there was not a pid 1 yet.  I admit the
> semantics of unshare(CLONE_NEWPID) are bizarre that way and seem to trip
> everyone up.
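
To make those semantics concrete, here is a small sketch (not the unsharepid
program from the earlier mail, which is not reproduced here). The function
name first_child_pid_after_unshare() is hypothetical. It calls
unshare(CLONE_NEWPID), forks once, and returns the pid the child sees for
itself. It assumes the caller has CAP_SYS_ADMIN; without it unshare() fails
and the function returns -1.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: unshare a pid namespace, fork once, and return the
 * pid the child observes for itself. Returns -1 if pipe(), unshare(),
 * fork() or the read from the child fails.
 */
long first_child_pid_after_unshare(void)
{
    int fds[2];
    long child_view = -1;

    if (pipe(fds) < 0)
        return -1;

    /* The caller keeps its own pid; only later children are affected. */
    if (unshare(CLONE_NEWPID) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        /* First process in the new namespace. */
        long self = (long)getpid();
        if (write(fds[1], &self, sizeof(self)) != (ssize_t)sizeof(self))
            _exit(1);
        _exit(0);
    }

    close(fds[1]);
    if (read(fds[0], &child_view, sizeof(child_view))
            != (ssize_t)sizeof(child_view))
        child_view = -1;
    close(fds[0]);
    waitpid(child, NULL, 0);

    return child_view;
}
```

Run as root, the expected return value is 1. Note that once that first child
exits, the namespace has lost its init and further fork() calls from the
caller fail, so this is a one-shot demonstration.
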

And I do remember when this was originally discussed.  Just being dumb.

> The goal when designing them was to have something that could be used in
> libpam during login so unshare could be called and the first process
> created after that would be in the new pid namespace.  But I don't think
> anyone has implemented that case yet. 

Hm, I remember CLONE_NEWNS through pam was the sole reason Janak implemented
unshare in the first place, but I can't see using CLONE_NEWPID being sane
in that case.  Maybe on a kiosk system, I guess.

> > AFAICT most ppl use clone, not unshare.  But so, perhaps glibc should
> > have an unshare() wrapper which clears the pid cache?  Or glibc could
> > export a function which callers would have to explicitly call.
> 
> Eric

So, the long and short of it is, we're all happy with what we've got?

Or did I as usual misread your main point?