[lxc-devel] [PATCH] Fix race/corruption with multiple lxc-start, lxc-execute

Serge Hallyn serge.hallyn at canonical.com
Thu Dec 13 12:28:03 UTC 2012


Quoting Dwight Engen (dwight.engen at oracle.com):
> If you start more than one lxc-start/lxc-execute with the same name at the
> same time, or just do an lxc-start/lxc-execute with the name of a container
> that is already running, lxc doesn't figure out that the container with this
> name is already running until fairly late in the initialization process, i.e.
> when __lxc_start() -> lxc_poll() -> lxc_command_mainloop_add() attempts to
> create the same abstract socket name.
> 
> By this point a fair amount of initialization has been done that actually
> messes up the running container. For example __lxc_start() -> lxc_spawn() ->
> lxc_cgroup_create() -> lxc_one_cgroup_create() -> try_to_move_cgname() moves
> the running container's cgroup to a name of deadXXXXXX.
> 
> The solution in this patch is to use the atomic existence of the abstract
> socket name as the indicator that the container is already running.  To do
> so, I just refactored lxc_command_mainloop_add() into an lxc_command_init()
> routine that attempts to bind the socket, and ensured that it is called
> early, before much initialization has been done.
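
For readers following along, here is a rough sketch of the idea of using an
abstract socket bind as the "already running" check. It is not the code from
the patch: claim_abstract_name() and the name passed to it are hypothetical,
and the real lxc_command_init() may differ in its details. It only assumes
Linux abstract AF_UNIX sockets, where bind() fails with EADDRINUSE if another
process already holds the name.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the patch's lxc_command_init(): try to claim an
 * abstract socket name. Returns a listening fd on success, or -1 with errno
 * set on failure (EADDRINUSE means another process already holds the name).
 */
static int claim_abstract_name(const char *name)
{
    struct sockaddr_un addr;
    size_t len = strlen(name);
    int fd, saved;

    /* One byte of sun_path is reserved for the leading '\0'. */
    if (len == 0 || len > sizeof(addr.sun_path) - 1) {
        errno = EINVAL;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* sun_path[0] stays '\0', which selects the abstract namespace. */
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(&addr.sun_path[1], name, len);

    /* The kernel lets exactly one socket own a given abstract name. */
    if (bind(fd, (struct sockaddr *)&addr,
             offsetof(struct sockaddr_un, sun_path) + 1 + len) < 0 ||
        listen(fd, 5) < 0) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A caller would do this first thing and bail out on EADDRINUSE before touching
cgroups or anything else. The name is released automatically when the last fd
referring to the socket is closed, so there is no stale file to clean up.
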
> 
> In testing, I verified that maincmd_fd was still open at the time of lxc_fini,
> so the entire lifetime of the container's run should be covered. The only
> explicit close of this fd was in the reboot case of lxcapi_start(), which is
> now moved to lxc_fini(), which I think is more appropriate.
> 
> Even though it is not checked any more, set maincmd_fd to -1 instead of 0 to
> indicate it's not open, since 0 could be a valid fd.
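
A small illustration of the -1 convention, as a sketch only: struct
demo_handler and the two functions are made-up stand-ins, not lxc's handler or
lxc_fini(). The point is that -1 can never be returned by open(), socket(),
etc., whereas 0 is normally stdin, so a "not open" check against 0 could end
up skipping or closing the wrong descriptor.

```c
#include <unistd.h>

/* Hypothetical stand-in for the handler state; only the fd field matters. */
struct demo_handler {
    int maincmd_fd;
};

static void demo_handler_init(struct demo_handler *h)
{
    /* -1 is never a valid descriptor; 0 can be (it is usually stdin). */
    h->maincmd_fd = -1;
}

/* Close the fd if it is open. Returns 1 if something was closed, else 0. */
static int demo_handler_fini(struct demo_handler *h)
{
    if (h->maincmd_fd < 0)
        return 0;
    close(h->maincmd_fd);
    h->maincmd_fd = -1;   /* mark as closed so a second call is a no-op */
    return 1;
}
```
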
> 
> Signed-off-by: Dwight Engen <dwight.engen at oracle.com>

Acked-by: Serge E. Hallyn <serge.hallyn at ubuntu.com>



