[lxc-devel] versioning the container monitor api

Dwight Engen dwight.engen at oracle.com
Tue Sep 3 15:20:20 UTC 2013


On Tue, 27 Aug 2013 22:04:18 +0200
Christian Seiler <christian at iwakd.de> wrote:

> Hi Serge,
> 
> > I start a container running a crucial mail server.  I upgrade lxc.
> > The new lxc has changed the format of messages for the commands api.
> > Now I do 'lxc-list', which queries the running monitor to check its
> > init pid with LXC_CMD_GET_INIT_PID.  The container monitor crashes
> > on bad input.
> 
> Yes, that's a problem I have frequently had as well.
> 
> > The lxc_af_unix_connect function could start with a handshake with a
> > version number, or we could tack a version # onto the lxc_cmd_req
> > struct.  Best would be if we agreed the client always sends its
> > version to the monitor, then vice versa, and then both sides decide
> > whether they can proceed (so both sides can log error).  We could
> > just use a monotonically increasing int, hand-inserted.  However
> > that's subject to error - if we make a change without remembering
> > to update the version number, then we could still get a crash.  We
> > could automate this perhaps by having a Makefile do some sort of
> > check, i.e. hashing all the structs which may be communicated over
> > the socket.
> 
> I think the real solution is far easier: previously, the command
> interface changed quite a bit because it was quite a bit more limited
> than it is now. But now the basic structure of the current command
> interface seems to be rather complete. Each request is just a tuple
> (cmd, datalen, data_ptr (mostly ignored)) + possibly additional data
> of length datalen on the line afterwards. Each response is (ret,
> datalen, data_ptr (mostly ignored)) + possibly data of length datalen
> on the line afterwards. I don't see how even quite complicated stuff
> couldn't in principle fit in there. The only question is what the
> semantics of cmd/ret, datalen, data_ptr and the data itself are.
> 
> So we should just declare that for the current commands, the semantics
> are completely fixed. Meaning that LXC_CMD_CONSOLE will always have
> the same on-the-wire semantics as it currently does.
> 
> But let's suppose at some point in the future, LXC_CMD_CONSOLE is
> supposed to change semantics completely. Then we change the enum to:
> 
> typedef enum {
>   LXC_CMD_DEPRECATED1,  // <- LXC_CMD_CONSOLE was here
>   ...,
>   LXC_CMD_CONSOLE,      // <- newly added, gets a new number
>   LXC_CMD_MAX,
> };
> 
> Then we can change the semantics of datalen / data_ptr and additional
> data and we will still be backwards compatible with all the other
> options. We just have to make sure that the processing routines always
> eat up all of the data, even if the command is not recognized, so that
> the connection will be in a sane state after that and communication
> may proceed.
> 
> If the server now doesn't recognize a command, it will issue the
> trivial response { -ENOSYS, 0, 0 } back to the client. Then the
> client will know that the server is too old / too new to support the
> command and will have to cope with it. In the case of something like
> LXC_CMD_GET_STATE and LXC_CMD_GET_INIT_PID one might want to write a
> fallback routine for the client, in the case of LXC_CMD_CONSOLE
> perhaps not, depends on why the change is required.
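
To make the scheme above concrete, here is a rough sketch of how the
monitor side might behave. It is only an illustration under stated
assumptions: every sketch_* name is made up and is not what is in
commands.h, it parses an in-memory buffer instead of reading the socket,
and it leaves out the data_ptr field since that is mostly ignored on the
wire. The point is just that the payload is skipped whether or not the
command is known, and that an unknown (or retired) command number gets
-ENOSYS with no data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the real definitions from commands.h. */
enum sketch_cmd {
	SKETCH_CMD_DEPRECATED1,	/* retired slot, its number is never reused */
	SKETCH_CMD_GET_STATE,
	SKETCH_CMD_MAX,
};

struct sketch_req { int cmd; int datalen; };	/* data_ptr left out */
struct sketch_rsp { int ret; int datalen; };	/* data_ptr left out */

/*
 * Handle the request at the start of buf (len bytes available).
 * Returns the number of bytes consumed (header plus payload), or 0 if
 * the buffer does not yet hold a complete request.
 */
size_t sketch_handle(const unsigned char *buf, size_t len,
		     struct sketch_rsp *rsp)
{
	struct sketch_req req;

	assert(buf != NULL && rsp != NULL);

	if (len < sizeof(req))
		return 0;
	memcpy(&req, buf, sizeof(req));
	if (req.datalen < 0 || (size_t)req.datalen > len - sizeof(req))
		return 0;

	rsp->datalen = 0;
	switch (req.cmd) {
	case SKETCH_CMD_GET_STATE:
		rsp->ret = 0;		/* placeholder for the real handler */
		break;
	default:
		rsp->ret = -ENOSYS;	/* unknown or retired command */
		break;
	}

	/* Always eat the payload so the next request starts cleanly. */
	return sizeof(req) + (size_t)req.datalen;
}
```

With something like this, a request carrying an unrecognized cmd and
three bytes of data consumes the header plus those three bytes, and the
next request on the same connection is parsed normally.
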
> 
> Add big fat comments in the appropriate parts of commands.h/commands.c
> to make sure that nobody changes this (+ perhaps a few unit tests) and
> there will be compatibility even between versions.
> 
> > But we might want to try and accommodate newer clients talking to
> > older versions, somehow. I suspect that'd be fragile, but it might
> > be worthwhile.
> 
> I think that's generally a good idea (for clients post 1.0; I think
> for 1.0 it's reasonable to say we do a final incompatible break) and
> at least for core functionality it should be policy that there will be
> compatibility.
> 
> Just my 2¢.

Yep I think all that works, good idea!

For backwards compatibility, I think we should expect that when someone
implements the first *_V2 command, if it gets a -ENOSYS response from
the monitor it should then issue the original *_V1 command and handle
the *_V1 response. This should allow us to be gracefully backwards
compatible, and the lxc-commands shouldn't need to change. I think this
can mostly be done in the lxc_cmd_*() implementations, so that lxc core
doesn't have to worry about it.
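
Roughly what I have in mind on the client side, as a sketch only. The
command numbers, the sketch_* names and the callback standing in for the
socket round trip are all invented for illustration; none of this is
existing lxc code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers, for illustration only. */
enum { SKETCH_GET_STATE_V1 = 3, SKETCH_GET_STATE_V2 = 9 };

/*
 * Stand-in for one request/response round trip with the monitor:
 * sends cmd, stores any result in *out, returns the response's ret.
 */
typedef int (*sketch_xfer_fn)(int cmd, int *out);

/* Try the V2 command first, fall back to V1 if the monitor is too old. */
int sketch_get_state(sketch_xfer_fn xfer, int *state)
{
	int ret;

	assert(xfer != NULL && state != NULL);

	ret = xfer(SKETCH_GET_STATE_V2, state);
	if (ret == -ENOSYS)
		ret = xfer(SKETCH_GET_STATE_V1, state);
	return ret;
}

/* Fake old monitor: only understands the V1 command. */
int sketch_old_monitor(int cmd, int *out)
{
	if (cmd != SKETCH_GET_STATE_V1)
		return -ENOSYS;
	*out = 42;
	return 0;
}

/* Fake new monitor: only understands the V2 command. */
int sketch_new_monitor(int cmd, int *out)
{
	if (cmd != SKETCH_GET_STATE_V2)
		return -ENOSYS;
	*out = 7;
	return 0;
}
```

The caller only ever sees sketch_get_state(), so the retry stays inside
the command wrapper.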

It wasn't specifically addressed above, but I don't think we should try
to support "forward compatibility", i.e. newer monitors should not have
to respond (other than with -ENOSYS) to V1 commands that might come from
an older lxc client (a weird scenario anyway).



