[lxc-devel] seccomp maxnr option?

Serge Hallyn serge.hallyn at ubuntu.com
Tue Jun 24 17:00:30 UTC 2014


Quoting Stéphane Graber (stgraber at ubuntu.com):
> On Tue, Jun 24, 2014 at 03:16:37PM +0000, Serge Hallyn wrote:
> > Quoting Stéphane Graber (stgraber at ubuntu.com):
> > > On Tue, Jun 24, 2014 at 03:03:18PM +0000, Serge Hallyn wrote:
> > > > Quoting Stéphane Graber (stgraber at ubuntu.com):
> > > > > On Tue, Jun 24, 2014 at 02:23:33PM +0000, Serge Hallyn wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > Not too long ago we introduced the v2 seccomp policy format, which allows
> > > > > > for blacklists.  One problem with blacklists is that on a newer kernel there
> > > > > > may be new syscalls which shouldn't be trusted.
> > > > > > 
> > > > > > So I'd like to introduce a max-syscall-number option, so that any higher
> > > > > > syscall number will also be blacklisted.  This is actually efficient to do
> > > > > > with a SCMP_CMP_GT comparison added to a rule.
> > > > > > 
> > > > > > I'm wondering how this is best specified.  There are a few options:
> > > > > > 
> > > > > > 1. if we think this is the only comparison rule we'll frequently want, we
> > > > > > could extend the policy language so that
> > > > > > 
> > > > > > 2
> > > > > > blacklist maxno 500
> > > > > > finit_module errno 1
> > > > > > 
> > > > > > Would mean that anything higher than 500 would be blacklisted.
> > > > > > 
> > > > > > 2.  We could define seccomp policy format version 3, which allows more
> > > > > > general rules, like
> > > > > > 
> > > > > > 3
> > > > > > blacklist
> > > > > > finit_module errno 1
> > > > > > GT 500 errno 1
> > > > > > LT 3 kill
> > > > > > 
> > > > > > Preferences?  Other ideas?
> > > > > 
> > > > > I'd prefer option 2 as it also allows you to set the default action.
> > > > > However, can we easily make this even more flexible by allowing ranges?
> > > > > 
> > > > > Basically supporting:
> > > > >  - GT 500 <action> (for > 500)
> > > > >  - LT 3 <action> (for < 3)
> > > > >  - RANGE 100 200 <action> (for >= 100 and <= 200)
> > > > > 
> > > > > If it's easy, it'd also be nice being able to do that using the syscall
> > > > > name rather than its number, so that you can basically say "I'm happy
> > > > > with the syscall list up until the introduction of X" and not have to
> > > > > care about the particular syscall number on each arch.
> > > > 
> > > > Yeah, that was how I pictured it.
> > > > 
> > > > > To block anything introduced after setns:
> > > > >  - GT setns errno 1
> > > > > 
> > > > > To make all the inotify functions return silently:
> > > > >  - RANGE inotify_init inotify_rm_watch errno 0
> > > > > 
> > > > > 
> > > > > Is that reasonably easy to implement or am I dreaming? :)
> > > > 
> > > > Should be easy - the only reason I didn't add RANGE was that it didn't
> > > > really seem useful, but it should just consist of adding a few more
> > > > elements to the rule array being added.
> > > 
> > > The main use I can think of for it is for cases where a bunch of syscalls
> > > are related, as is the case with those inotify ones I used as an example
> > > (3 syscalls which if you want to block should be blocked together).
> > > 
> > > This should also allow for blacklisting a bunch of newish syscalls but
> > > not the latest addition which the user actually wants to use (you'd then
> > > do a range block for those you don't want + a GT on whatever's higher
> > > than the new syscall you want).
> > 
> > Perhaps all of the LT/GT/RANGE options should only be supported for a single
> > arch at a time (i.e. not in 'all')?
> 
> Shouldn't those 3 be mostly fine when using the names rather than the numbers?

My concern is that just because syscalls 300-305 are grouped in x86-64
doesn't mean that the corresponding syscalls will be grouped in x86.
There may be an extra syscall in there.  I haven't verified that.
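
To make the concern concrete, here is a small sketch with made-up syscall tables. The arch names (arch_a, arch_b), the syscall names and all the numbers are hypothetical, chosen only to show how the same name-based RANGE could cover a different set of syscalls on each arch. It says nothing about the real x86-64 or x86 tables.

```python
# Hypothetical syscall tables: names and numbers are invented for
# illustration and do not reflect any real kernel or architecture.
TABLES = {
    "arch_a": {"foo_init": 300, "foo_add": 301, "foo_rm": 302, "bar": 303},
    "arch_b": {"foo_init": 280, "foo_add": 281, "bar": 282, "foo_rm": 283},
}


def resolve_range(arch, first, last):
    """Return the syscall names whose numbers fall in [first, last] on arch.

    The endpoints are given by name and looked up in that arch's table,
    which is how a rule like "RANGE first last <action>" would have to be
    expanded for each arch.
    """
    table = TABLES[arch]
    lo, hi = table[first], table[last]
    if lo > hi:
        raise ValueError(f"{first} comes after {last} on {arch}")
    names = (name for name, nr in table.items() if lo <= nr <= hi)
    # Order the result by syscall number so the output is stable.
    return sorted(names, key=table.get)


for arch in TABLES:
    print(arch, resolve_range(arch, "foo_init", "foo_rm"))
```

On arch_b the unrelated "bar" lands inside the range, so a rule written with arch_a in mind would silently cover one more syscall there.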

> I guess syscalls could be introduced in a different order on different
> arches, but the general case should still be mostly fine.

It just seems like potentially asking for trouble - but sure it's less
work on my end to assume that it's ok :)
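
For reference, the rule lines proposed earlier in the thread (GT, LT, RANGE, plus plain name rules) could be parsed roughly as below. This is only a sketch of the proposed syntax: Rule, parse_rule and matches are hypothetical names, not LXC code, and only numeric operands are handled. Name operands would need the per-arch lookup discussed above.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    kind: str             # "GT", "LT", "RANGE", or "NAME"
    operands: tuple       # numeric bounds, or the syscall name for "NAME"
    action: str           # "errno" or "kill"
    errno: Optional[int]  # errno value when action == "errno", else None


# Number of numeric operands each comparison keyword takes.
ARITY = {"GT": 1, "LT": 1, "RANGE": 2}


def parse_rule(line):
    """Parse one rule line such as 'GT 500 errno 1' or 'LT 3 kill'."""
    tokens = line.split()
    if not tokens:
        raise ValueError("empty rule")
    if tokens[0] in ARITY:
        kind = tokens[0]
        count = ARITY[kind]
        operands = tuple(int(t) for t in tokens[1:1 + count])
        if len(operands) != count:
            raise ValueError(f"{kind} needs {count} operand(s): {line!r}")
        rest = tokens[1 + count:]
    else:
        # Anything else is treated as a plain syscall-name rule.
        kind, operands, rest = "NAME", (tokens[0],), tokens[1:]
    if rest == ["kill"]:
        return Rule(kind, operands, "kill", None)
    if len(rest) == 2 and rest[0] == "errno":
        return Rule(kind, operands, "errno", int(rest[1]))
    raise ValueError(f"bad action in rule: {line!r}")


def matches(rule, nr):
    """Return True if syscall number nr is covered by a comparison rule."""
    if rule.kind == "GT":
        return nr > rule.operands[0]
    if rule.kind == "LT":
        return nr < rule.operands[0]
    if rule.kind == "RANGE":
        return rule.operands[0] <= nr <= rule.operands[1]
    return False  # NAME rules need a name-to-number table for the arch
```

With this, parse_rule("RANGE 100 200 errno 0") gives a rule that matches 100 and 200 inclusive, in line with the ">= 100 and <= 200" reading suggested above.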

