[Lxc-users] Kernel 2.6.33-rc6, 3 bugs container specific.

Serge E. Hallyn serue at us.ibm.com
Wed Feb 3 15:03:50 UTC 2010


Quoting Daniel Lezcano (daniel.lezcano at free.fr):
> Serge E. Hallyn wrote:
> >Quoting Jean-Marc Pigeon (jmp at safe.ca):
> >>Hello,
> >>
> >>
> >>>I was wondering out loud about the best design to solve his problem.
> >>>
> >>>If we try to redirect kernel-generated messages to containers, we have
> >>>several problems, including whether we need to duplicate the messages
> >>>to the host container.  So in one sense it seems more flexible to
> >>>	1. send everything to host syslog
> >>		No, if we do that all CONTs message will reach
> >>		the same bucket and it will be difficult to sort
> >>		them out..
> >>		CONT sys_admin and HOST sys_admin could be different
> >>		"entity", so you debug CONT config and critical
> >>		needed information reach HOST (which you do not 		have access
> >>to).
> >
> >Yes, so a privileged task on HOST must pass that information back to
> >you on CONT.  That is not a valid complaint imo.  But how to sort the
> >msgs out is a valid question.
> >
> >We need some sort of identifier, unique system-wide, attached to.. something.
> >Is ifindex unique system-wide right now?  Oh, IIRC it is, but we want it to
> >be containerized, so that would be a bad choice :)
> >
> >>>	2. clamp down on syslog use by processes not in the init_user_ns
> >>		Could give me more detail??...
> >
> >Simplest choices would be to just refuse sys_syslog() and open(/proc/kmsg)
> >altogether from a container, or to only allow reading/writing messages
> >to own syslog.  (I had hoped to find time to try out the second option but
> >simply haven't had the time, and it doesn't look like I will very soon.
> >So if anyone else wants to, pls jump at it...)
> >
> >Then /proc/kmsg can provide what I described above through a FUSE file,
> >and if, as you mentioned, the container unmounts the FUSE fs and gets
> >to real procfs, they just get nothing.
> >
> >>>	3. let the userspace on the host copy messages into a socket or
> >>>	   file so child container can pretend it has real syslog.
> >>		So you trap printk message from CONT on the HOST and
> >>		redirect them on CONT but on a standard syslog channel.
> >>		Seem OK to me, as long /proc/kmsg is not existing
> >>		(/dev/null) in the CONT file tree.
> 
> 
> We have:
>        * Commands to sys_syslog:
>        *
>        *      0 -- Close the log.  Currently a NOP.
>        *      1 -- Open the log. Currently a NOP.
>        *      2 -- Read from the log.
>        *      3 -- Read all messages remaining in the ring buffer.
>        *      4 -- Read and clear all messages remaining in the ring buffer
>        *      5 -- Clear ring buffer.
>        *      6 -- Disable printk to console
>        *      7 -- Enable printk to console
>        *      8 -- Set level of messages printed to console
>        *      9 -- Return number of unread characters in the log buffer
>        *     10 -- Return size of the log buffer
> 
> And add:
>       *     11 -- Create a new ring buffer for the current process and its children
> 
> 
> We have, let's say, a global ring buffer kept untouched, used by
> syslog(2) and printk. When we create a new ring buffer, we allocate
> it and assign it to the nsproxy (the global ring buffer is the default
> in the nsproxy).
> 
> printk keeps writing to the global ring buffer and syslog(2)
> writes to the "namespaced" ring buffer.
>
> Does it make sense?
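A simplified user-space model of this proposal, under stated assumptions: `struct nsproxy_model` stands in for the kernel's nsproxy, the ring buffer is a toy wrap-around buffer, and `syslog_new_ring()` plays the role of the hypothetical command 11, which is not a real syslog(2) action. printk always writes to the global ring, while the syslog(2)-side helpers use whatever ring the nsproxy points at.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define RING_SIZE 128

/* Toy ring buffer: writes wrap around and overwrite the oldest bytes. */
struct log_ring {
    char data[RING_SIZE];
    size_t head;                    /* total bytes ever written */
};

/* Stand-in for the kernel's nsproxy: only the field this design adds. */
struct nsproxy_model {
    struct log_ring *syslog_ring;   /* defaults to &global_ring */
};

/* The global ring: kept untouched, always fed by printk. */
static struct log_ring global_ring;

static void ring_write(struct log_ring *r, const char *s)
{
    for (; *s != '\0'; s++)
        r->data[r->head++ % RING_SIZE] = *s;
}

/* Like command 3: copy the most recent bytes (at most len - 1) into out
 * and NUL-terminate. Returns the number of bytes copied. */
static size_t ring_read_all(const struct log_ring *r, char *out, size_t len)
{
    assert(len > 0);
    size_t avail = r->head < RING_SIZE ? r->head : RING_SIZE;
    if (avail > len - 1)
        avail = len - 1;
    size_t start = r->head - avail;
    for (size_t i = 0; i < avail; i++)
        out[i] = r->data[(start + i) % RING_SIZE];
    out[avail] = '\0';
    return avail;
}

/* printk keeps writing to the global ring, whatever the caller's nsproxy. */
static void printk_model(const char *msg)
{
    ring_write(&global_ring, msg);
}

/* Hypothetical command 11: allocate a private ring and hang it off the
 * nsproxy, so the caller and children sharing that nsproxy see it. */
static int syslog_new_ring(struct nsproxy_model *ns)
{
    if (ns->syslog_ring != &global_ring)
        return -EBUSY;              /* sketch choice: one private ring */
    struct log_ring *r = calloc(1, sizeof *r);
    if (r == NULL)
        return -ENOMEM;
    ns->syslog_ring = r;
    return 0;
}

/* syslog(2)-side writes and reads go to the namespaced ring. */
static void ns_syslog_write(const struct nsproxy_model *ns, const char *msg)
{
    ring_write(ns->syslog_ring, msg);
}

static size_t ns_syslog_read_all(const struct nsproxy_model *ns,
                                 char *out, size_t len)
{
    return ring_read_all(ns->syslog_ring, out, len);
}
```

Returning -EBUSY on a second command 11 is a sketch choice that sidesteps refcounting. A kernel version would need to reference-count rings shared with child processes.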

Yeah, it's a nice alternative.  Though (1) there is something to be said for
forcing a new ring buffer upon clone(CLONE_NEWUSER), and (2) assuming the
new ring buffer is pointed to from nsproxy, it might be frowned upon to do
an unshare/clone action in yet another way.

I still think our first concern should be safety, and that we should consider
just adding 'struct syslog_struct' to nsproxy, and making that NULL on a
clone(CLONE_NEWUSER).  Any sys_syslog() or /proc/kmsg access returns -EINVAL
after that.  Then we can discuss whether and how to target printks to
namespaces, and whether duplicates should be sent to parent namespaces.
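The "safety first" variant reduces to a single check. In this sketch all names are hypothetical, and `MODEL_CLONE_NEWUSER` mirrors the value of the real `CLONE_NEWUSER` flag (0x10000000). A new user namespace gets a NULL `syslog` pointer, its descendants inherit that NULL, and both sys_syslog() and opening /proc/kmsg would then fail with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mirrors the value of the real CLONE_NEWUSER flag from <sched.h>; a
 * private name is used so the sketch builds without _GNU_SOURCE. */
#define MODEL_CLONE_NEWUSER 0x10000000UL

/* Contents elided: whatever per-namespace syslog state turns out to be needed. */
struct syslog_struct {
    int placeholder;
};

/* Stand-in for nsproxy with the proposed syslog pointer. */
struct nsproxy_model {
    struct syslog_struct *syslog;   /* NULL: no syslog access at all */
};

/* The initial namespace's syslog state. */
static struct syslog_struct init_syslog;

/* On clone, inherit the parent's syslog state unless a new user
 * namespace is being created; NULL is then inherited by descendants. */
static void copy_syslog_ns(unsigned long clone_flags,
                           const struct nsproxy_model *parent,
                           struct nsproxy_model *child)
{
    assert(parent != NULL && child != NULL);
    if (clone_flags & MODEL_CLONE_NEWUSER)
        child->syslog = NULL;
    else
        child->syslog = parent->syslog;
}

/* Gate shared by sys_syslog() and open("/proc/kmsg") in this sketch. */
static int syslog_access_check(const struct nsproxy_model *ns)
{
    return ns->syslog != NULL ? 0 : -EINVAL;
}

/* Model of sys_syslog(): only the access gate is shown; dispatch on
 * commands 0..10 is omitted. */
static int sys_syslog_model(const struct nsproxy_model *ns, int type)
{
    int err = syslog_access_check(ns);
    if (err != 0)
        return err;
    (void)type;
    return 0;
}
```

Targeting printks to namespaces, or duplicating them to parents, could later be layered on by making `struct syslog_struct` non-empty, without changing this gate.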

After we start getting flexible with syslog, the next request will be for
audit flexibility.  I don't even know whether our netlink support suffices for
that right now.

(So, this all does turn into a big deal...)

-serge



