[lxc-devel] versioning the container monitor api

Tue Aug 27 21:25:03 UTC 2013

I like it, thanks :)

-serge

Quoting Christian Seiler (christian at iwakd.de):
> Hi Serge,
> 
> > 	I start a container running a crucial mail server.  I upgrade
> > 	lxc.  The new lxc has changed the format of messages for the
> > 	commands api.  Now I do 'lxc-list', which queries the running
> > 	monitor to check its init pid with LXC_CMD_GET_INIT_PID.  The
> > 	container monitor crashes on bad input.
> 
> Yes, that's a problem I have also run into frequently.
> 
> > The lxc_af_unix_connect function could start with a handshake with a
> > version number, or we could tack a version # onto the lxc_cmd_req
> > struct.  Best would be if we agreed the client always sends its version
> > to the monitor, then vice versa, and then both sides decide whether
> > they can proceed (so both sides can log an error).  We could just use
> > a monotonically increasing int, hand-inserted.  However that's subject
> > to error - if we make a change without remembering to update the version
> > number, then we could still get a crash.  We could automate this perhaps
> > by having a Makefile do some sort of check, i.e. hashing all the structs
> > which may be communicated over the socket.
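
For illustration, a minimal sketch of the version exchange described
above, assuming each side writes a single native-endian 32-bit version
number and then reads the peer's. The names demo_send_version and
demo_check_peer_version are hypothetical and not part of lxc; a real
implementation would also have to handle short reads and writes.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Send our protocol version to the peer. Returns 0 on success, -1 on error.
 * Hypothetical helper, not an lxc function. */
int demo_send_version(int fd, int32_t version)
{
    ssize_t n = write(fd, &version, sizeof(version));
    return n == (ssize_t)sizeof(version) ? 0 : -1;
}

/* Read the peer's version and compare it with ours.
 * Returns 0 if the versions match, 1 on a mismatch (so the caller can log
 * it and hang up), and -1 on an I/O error or short read. */
int demo_check_peer_version(int fd, int32_t mine, int32_t *peer)
{
    ssize_t n;

    assert(peer != NULL);
    n = read(fd, peer, sizeof(*peer));
    if (n != (ssize_t)sizeof(*peer))
        return -1;
    return *peer == mine ? 0 : 1;
}
```

Both client and monitor would call demo_send_version first and
demo_check_peer_version second, so each side learns the other's number
and can log the mismatch on its own end.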
> 
> I think the real solution is far easier: previously, the command
> interface changed quite a bit because it was quite a bit more limited
> than it is now. But now the basic structure of the current command
> interface seems to be rather complete. Each request is just a tuple
> (cmd, datalen, data_ptr (mostly ignored)) + possibly additional data of
> length datalen on the line afterwards. Each response is (ret, datalen,
> data_ptr (mostly ignored)) + possibly data of length datalen on the line
> afterwards. I don't see how even quite complicated stuff couldn't in
> principle fit in there. The only question is what the semantics of
> cmd/ret, datalen, data_ptr and the data itself are.
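
A rough sketch of the framing described above: a fixed header of
(cmd, datalen, data pointer) followed by datalen bytes of payload.
struct demo_cmd_req and the demo_pack_req / demo_unpack_req helpers are
hypothetical stand-ins; the real struct in commands.h may use different
field types and layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the request header; not the lxc definition. */
struct demo_cmd_req {
    int   cmd;      /* command number */
    int   datalen;  /* number of payload bytes following the header */
    void *data;     /* only meaningful inside the sender; ignored on the wire */
};

/* Write header + payload into buf. Returns bytes written, or 0 if the
 * arguments are invalid or buf is too small. */
size_t demo_pack_req(unsigned char *buf, size_t cap, int cmd,
                     const void *payload, int datalen)
{
    struct demo_cmd_req req;
    size_t total;

    if (datalen < 0)
        return 0;
    total = sizeof(req) + (size_t)datalen;
    if (cap < total)
        return 0;
    memset(&req, 0, sizeof(req));   /* also zeroes any struct padding */
    req.cmd = cmd;
    req.datalen = datalen;
    memcpy(buf, &req, sizeof(req));
    if (datalen > 0)
        memcpy(buf + sizeof(req), payload, (size_t)datalen);
    return total;
}

/* Parse a header and locate its payload. Returns 0 on success, -1 if the
 * buffer is truncated or datalen is negative. */
int demo_unpack_req(const unsigned char *buf, size_t len,
                    struct demo_cmd_req *out, const unsigned char **payload)
{
    assert(buf && out && payload);
    if (len < sizeof(*out))
        return -1;
    memcpy(out, buf, sizeof(*out));
    if (out->datalen < 0 || (size_t)out->datalen > len - sizeof(*out))
        return -1;
    out->data = NULL;               /* never trust a pointer from the peer */
    *payload = buf + sizeof(*out);
    return 0;
}
```

The response side would look the same with ret in place of cmd.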
> 
> So we should just declare that for the current commands, the semantics
> are completely fixed. Meaning that LXC_CMD_CONSOLE will always have the
> same on-the-wire semantics as it currently does.
> 
> But let's suppose at some point in the future, LXC_CMD_CONSOLE is
> supposed to change semantics completely. Then we change the enum to:
> 
> typedef enum {
>   LXC_CMD_DEPRECATED1,  // <- LXC_CMD_CONSOLE was here
>   ...,
>   LXC_CMD_CONSOLE,      // <- newly added, gets a new number
>   LXC_CMD_MAX,
> };
> 
> Then we can change the semantics of datalen / data_ptr and additional
> data and we will still be backwards compatible with all the other
> options. We just have to make sure that the processing routines always
> eat up all of the data, even if the command is not recognized, so that
> the connection will be in a sane state after that and communication may
> proceed.
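
The "always eat up all of the data" requirement could be sketched like
this, assuming the receiver has already read the header and knows
datalen. demo_drain is a hypothetical helper, not an existing lxc
function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read and discard exactly len bytes from fd so that the next header
 * starts at the right offset. Returns 0 on success, -1 on error or if
 * the peer closes the connection mid-payload. */
int demo_drain(int fd, size_t len)
{
    char scratch[256];

    assert(fd >= 0);
    while (len > 0) {
        size_t want = len < sizeof(scratch) ? len : sizeof(scratch);
        ssize_t got = read(fd, scratch, want);

        if (got < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        if (got == 0)
            return -1;          /* EOF before the payload was complete */
        len -= (size_t)got;
    }
    return 0;
}
```

A handler for an unrecognized command would call this with the header's
datalen before sending its reply.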
> 
> If the server now doesn't recognize a command, it will issue the trivial
> response { -ENOSYS, 0, 0 } back to the client. Then the client will know
> that the server is too old / too new to support the command and will
> have to cope with it. In the case of something like LXC_CMD_GET_STATE
> and LXC_CMD_GET_INIT_PID one might want to write a fallback routine for
> the client; in the case of LXC_CMD_CONSOLE perhaps not. It depends on
> why the change is required.
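
A small sketch of the -ENOSYS convention and a client-side fallback,
under the assumption that the client simply tries command numbers in
order of preference. The demo_* enum values, struct and functions are
hypothetical, and a direct function call stands in for the socket round
trip.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbering with one retired slot. */
enum demo_cmd {
    DEMO_CMD_DEPRECATED1,   /* an old command used to live here */
    DEMO_CMD_GET_STATE,
    DEMO_CMD_CONSOLE,
    DEMO_CMD_MAX
};

struct demo_cmd_rsp {
    int   ret;
    int   datalen;
    void *data;
};

/* Server side: anything not handled explicitly gets { -ENOSYS, 0, NULL }. */
struct demo_cmd_rsp demo_server_dispatch(int cmd)
{
    struct demo_cmd_rsp rsp = { -ENOSYS, 0, NULL };

    switch (cmd) {
    case DEMO_CMD_GET_STATE:
    case DEMO_CMD_CONSOLE:
        rsp.ret = 0;        /* real handlers would fill in data here */
        break;
    default:
        break;              /* retired slot, or a number from a newer client */
    }
    return rsp;
}

/* Client side: try each candidate command until the server accepts one.
 * Returns the server's ret for the first supported command, or -ENOSYS
 * if none is supported. *used is set only on success. */
int demo_query(const int *candidates, int n, int *used)
{
    int i;

    assert(candidates && used);
    for (i = 0; i < n; i++) {
        struct demo_cmd_rsp rsp = demo_server_dispatch(candidates[i]);

        if (rsp.ret != -ENOSYS) {
            *used = candidates[i];
            return rsp.ret;
        }
    }
    return -ENOSYS;
}
```

Whether a fallback list makes sense depends on the command, as noted
above; for some commands the client may prefer to fail outright.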
> 
> Add big fat comments in the appropriate parts of commands.h/commands.c
> to make sure that nobody changes this (+ perhaps a few unit tests) and
> there will be compatibility even between versions.
> 
> > But we might want to try and accommodate newer clients talking to
> > older versions, somehow. I suspect that'd be fragile, but it might
> > be worthwhile.
> 
> I think that's generally a good idea (for clients post 1.0; I think for
> 1.0 it's reasonable to say we do a final incompatible break) and at
> least for core functionality it should be policy that there will be
> compatibility.
> 
> Just my 2¢.
> 
> -- Christian