[lxc-devel] [systemd-devel] Suspending access to opened/active /dev/nodes during application runtime

Fri Mar 7 20:51:04 UTC 2014

On 7 Mar 2014, at 20:24, Lennart Poettering <mzerqung at 0pointer.de> wrote:

> On Fri, 07.03.14 19:45, Lukasz Pawelczyk (havner at gmail.com) wrote:
> 
>> Problem:
>> Has anyone thought about a mechanism to limit/remove access to a
>> device during an application's runtime? Meaning we have an application
>> that has an open file descriptor to some /dev/node and depending on
>> *something* it gains or loses access to it gracefully (with or
>> without a notification, but without any fatal consequences).
> 
> logind can mute input devices as sessions are switched, to enable
> unprivileged X11 and wayland compositors.

Would you please elaborate on this? Where is this mechanism? How does it work without kernel space support? Is there some kernel space support I’m not aware of?

>> Example:
>> LXC. Imagine we have 2 separate containers. Both running full operating
>> systems. Specifically with 2 X servers. Both running concurrently of
> 
> Well, devices are not namespaced on Linux (with the single exception of
> network devices). An X server needs device access, hence this doesn't
> fly at all.
> 
> When you enumerate devices with libudev in a container they will never
> be marked as "initialized" and you do not get any udev hotplug events in
> containers, and you don't have the host's udev db around, nor would it
> make any sense to you if you had. X11 and friends rely on udev
> however...
> 
> Before you think about doing something like this, you need to fix the
> kernel to provide namespaced devices (good luck!)

Precisely! That’s the general idea. I’m not proposing to implement it at this moment, though. I just wanted to know whether anybody has actually thought about it, or whether someone is interested in starting such work.

>> course. Both need the same input devices (e.g. we have just one mouse).
>> This creates a security problem when we want to have completely separate
>> environments. One container is active (being displayed on a monitor and
>> controlled with a mouse) while the other container runs evtest
>> /dev/input/something and grabs the secret password user typed in the
>> other.
> 
> logind can do this for you between sessions. But such a container setup
> will never work without proper device namespacing.

So how can it do it when there is no kernel support? You mean it could be doing this if the support were there?

>> Solutions:
>> The complete solution would consist of 2 parts:
>> - a mechanism that would make it possible to temporarily "hide" a
>> device from an open file descriptor.
>> - a mechanism for deciding whether an application/process/namespace
>> should have access to a specific device at a specific moment
> 
> Well, there's no point in inventing any "mechanisms" like this, as long
> as devices are not namespaced in the kernel, so that userspace in
> containers can enumerate/probe/identify/... things correctly…

True. My point is about a kernel space implementation. As I wrote, I haven’t seen anything like this in the kernel source, and I’m well aware it should be done there.
I would just like to know if anybody is interested in this, if anybody started or would like to start such a thing.

I do understand that systemd/logind would only provide a mechanism for determining who should have access and who shouldn’t (or, to be more specific, it would utilize some kernel space configuration like cgroups). But the work itself has to be done in kernel space.
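
To make the two-part split above more concrete (a way to "hide" a device from an already open descriptor, plus a policy that decides who is active), here is a toy userspace model in Python. It is only a sketch under loud assumptions: RevocableHandle and Arbiter are hypothetical names, a pipe stands in for the /dev/input node, and nothing here reflects how the kernel or logind actually works. A real implementation would have to enforce this in kernel space, where the process holding the descriptor cannot bypass the check.

```python
import os


class RevocableHandle:
    """Hypothetical stand-in for an open device fd that can be paused."""

    def __init__(self, fd, owner):
        self.fd = fd
        self.owner = owner
        self.active = False  # handles start out paused

    def read(self, n):
        # A paused handle reports "no data" instead of raising an error,
        # so the application keeps running without fatal consequences.
        if not self.active:
            return b""
        return os.read(self.fd, n)


class Arbiter:
    """Hypothetical policy part: at most one owner is active at a time."""

    def __init__(self):
        self.handles = []

    def register(self, handle):
        self.handles.append(handle)

    def activate(self, owner):
        # Resume the chosen owner's handles and pause everyone else's.
        for handle in self.handles:
            handle.active = handle.owner == owner


def demo():
    # The pipe plays the role of the single shared input device.
    read_end, write_end = os.pipe()
    a = RevocableHandle(os.dup(read_end), "container-a")
    b = RevocableHandle(os.dup(read_end), "container-b")

    arbiter = Arbiter()
    arbiter.register(a)
    arbiter.register(b)

    results = []

    arbiter.activate("container-a")
    os.write(write_end, b"secret")
    results.append(b.read(16))  # paused: sees nothing
    results.append(a.read(16))  # active: sees the input

    arbiter.activate("container-b")
    os.write(write_end, b"hello")
    results.append(a.read(16))  # now paused
    results.append(b.read(16))  # now active

    for fd in (read_end, write_end, a.fd, b.fd):
        os.close(fd)
    return results


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is only the shape of the interface: the descriptor stays open across a switch, and the decision about who is active lives outside the application.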

-- 
Regards,
Havner