[lxc-devel] seccomp maxnr option?

Serge Hallyn serge.hallyn at ubuntu.com
Tue Jun 24 15:16:37 UTC 2014


Quoting Stéphane Graber (stgraber at ubuntu.com):
> On Tue, Jun 24, 2014 at 03:03:18PM +0000, Serge Hallyn wrote:
> > Quoting Stéphane Graber (stgraber at ubuntu.com):
> > > On Tue, Jun 24, 2014 at 02:23:33PM +0000, Serge Hallyn wrote:
> > > > Hi,
> > > > 
> > > > Not too long ago we introduced the v2 seccomp policy format, which allows
> > > > for blacklists.  One problem with blacklists is that on a newer kernel there
> > > > may be new syscalls which shouldn't be trusted.
> > > > 
> > > > So I'd like to introduce a max-syscall-number option, so that any higher
> > > > syscall number will also be blacklisted.  This is actually efficient to do
> > > > with a SCMP_CMP_GT comparison added to a rule.
> > > > 
> > > > I'm wondering how this is best specified.  There are a few options:
> > > > 
> > > > 1. if we think this is the only comparison rule we'll frequently want, we
> > > > could extend the policy language so that
> > > > 
> > > > 2
> > > > blacklist maxno 500
> > > > finit_module errno 1
> > > > 
> > > > Would mean that anything higher than 500 would be blacklisted.
> > > > 
> > > > 2.  We could define seccomp policy format version 3, which allows more
> > > > general rules, like
> > > > 
> > > > 3
> > > > blacklist
> > > > finit_module errno 1
> > > > GT 500 errno 1
> > > > LT 3 kill
> > > > 
> > > > Preferences?  Other ideas?
> > > 
> > > I'd prefer option 2 as it also allows you to set the default action.
> > > However, can we easily make this even more flexible by allowing ranges?
> > > 
> > > Basically supporting:
> > >  - GT 500 <action> (for > 500)
> > >  - LT 3 <action> (for < 3)
> > >  - RANGE 100 200 <action> (for >= 100 and <= 200)
> > > 
> > > If it's easy, it'd also be nice being able to do that using the syscall
> > > name rather than its number, so that you can basically say "I'm happy
> > > with the syscall list up until the introduction of X" and not have to
> > > care about the particular syscall number on each given arch.
> > 
> > Yeah, that was how I pictured it.
> > 
> > > To block anything introduced after setns:
> > >  - GT setns errno 1
> > > 
> > > To make all the inotify functions return silently:
> > >  - RANGE inotify_init inotify_rm_watch errno 0
> > > 
> > > 
> > > Is that reasonably easy to implement or am I dreaming? :)
> > 
> > Should be easy - the only reason I didn't add RANGE was that it didn't
> > really seem useful, but it should just consist of adding a few more
> > elements to the rule array being added.
> 
> The main use I can think of for it is for cases where a bunch of syscalls
> are related, as is the case with those inotify ones I used as an example
> (3 syscalls which if you want to block should be blocked together).
> 
> This should also allow for blacklisting a bunch of newish syscalls but
> not the latest addition which the user actually wants to use (you'd then
> do a range block for those you don't want + a GT on whatever's higher
> than the new syscall you want).

Perhaps all of the LT/GT/RANGE options should only be supported for a single
arch at a time (i.e. not in 'all')?
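
To illustrate why the per-arch restriction matters, here is a minimal sketch of
how name-based GT/LT/RANGE rules might be resolved. Everything in it is
hypothetical: the rule syntax is the one proposed in this thread (not something
lxc implements), parse_rule() and matches() are made-up helpers, and the small
syscall table was filled in by hand and should be checked against the unistd
headers before relying on it.

```python
# Hypothetical sketch: resolve name-based GT/LT/RANGE rules for one arch.
# The syscall numbers below are a hand-written excerpt (assumption: verify
# them against the unistd headers of the target architecture).
SYSCALL_NR = {
    "x86_64": {
        "inotify_init": 253, "inotify_add_watch": 254,
        "inotify_rm_watch": 255, "setns": 308, "finit_module": 313,
    },
    "i386": {
        "inotify_init": 291, "inotify_add_watch": 292,
        "inotify_rm_watch": 293, "setns": 346, "finit_module": 350,
    },
}


def resolve(token, arch):
    """Turn a syscall name or a literal number into a number for one arch."""
    if token.isdigit():
        return int(token)
    return SYSCALL_NR[arch][token]


def parse_rule(line, arch):
    """Parse one rule line into an inclusive (lo, hi, action) tuple.

    None for lo or hi means "unbounded" on that side.
    """
    fields = line.split()
    kind = fields[0]
    if kind == "GT":
        return (resolve(fields[1], arch) + 1, None, " ".join(fields[2:]))
    if kind == "LT":
        return (None, resolve(fields[1], arch) - 1, " ".join(fields[2:]))
    if kind == "RANGE":
        lo, hi = resolve(fields[1], arch), resolve(fields[2], arch)
        if lo > hi:
            raise ValueError("empty range in rule: " + line)
        return (lo, hi, " ".join(fields[3:]))
    raise ValueError("unknown rule type: " + kind)


def matches(rule, nr):
    """Return True if syscall number nr falls inside the rule's bounds."""
    lo, hi, _action = rule
    return (lo is None or nr >= lo) and (hi is None or nr <= hi)


if __name__ == "__main__":
    for arch in ("x86_64", "i386"):
        print(arch, parse_rule("GT setns errno 1", arch))
```

The same line, "GT setns errno 1", resolves to a different numeric bound on each
arch, so a rule written for 'all' would have to be expanded separately for every
architecture it is applied to.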

Is there anything we can do to make choosing syscalls and ranges easier for the
user, besides having good defaults and pointing them to e.g.
/usr/include/x86_64-linux-gnu/asm/unistd_64.h ?
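
One possibility, as a rough sketch only: a small helper that reads such a
header and lists which syscalls a given range would cover, so the user can see
what a RANGE or GT rule actually catches. The helper names (parse_unistd,
names_in_range) are hypothetical, and the sketch assumes the header uses plain
"#define __NR_<name> <number>" lines; defines written as expressions are
simply skipped.

```python
import re

# Matches lines such as "#define __NR_setns 308" (plain decimal values only).
_DEFINE = re.compile(r"^\s*#\s*define\s+__NR_(\w+)\s+(\d+)\s*$")


def parse_unistd(text):
    """Build a {syscall_name: number} table from unistd header text."""
    table = {}
    for line in text.splitlines():
        m = _DEFINE.match(line)
        if m:
            table[m.group(1)] = int(m.group(2))
    return table


def names_in_range(table, lo, hi):
    """List syscall names whose number is in [lo, hi], ordered by number."""
    ordered = sorted(table.items(), key=lambda kv: kv[1])
    return [name for name, nr in ordered if lo <= nr <= hi]


# Small inline excerpt so the sketch runs without the real header.
# To use a real file instead: parse_unistd(open(path).read())
SAMPLE = """\
#ifndef _ASM_X86_UNISTD_64_H
#define _ASM_X86_UNISTD_64_H 1
#define __NR_inotify_init 253
#define __NR_inotify_add_watch 254
#define __NR_inotify_rm_watch 255
#define __NR_setns 308
#endif
"""

if __name__ == "__main__":
    table = parse_unistd(SAMPLE)
    lo, hi = table["inotify_init"], table["inotify_rm_watch"]
    for name in names_in_range(table, lo, hi):
        print(table[name], name)
```

Something like this could back a "show me what this rule blocks" command, which
may be friendlier than asking users to read the header themselves.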

-serge

