[lxc-users] Experience with large number of LXC/LXD containers

Tomasz Chmielewski mangoo at wpkg.org
Thu Apr 6 01:15:15 UTC 2017


On 2017-03-13 06:28, Benoit GEORGELIN - Association Web4all wrote:
> Hi lxc-users,
> 
> I would like to know if you have any experience with a large number of
> LXC/LXD containers,
> in terms of performance, stability and limitations.
> 
> I'm wondering, for example, whether 100 containers behave the same as
> 1,000 or 10,000 with the same configuration, leaving aside what the
> containers are actually used for.

I'm running LXD on several servers and I'm generally satisfied with it: 
performance and stability are fine. Most of them run fewer than 50 
containers, though.

I also have an LXD server that runs 100+ containers. It is used for 
automation and starts, stops and deletes dozens of containers daily. 
Approximately once every 1-2 months, an "lxc stop" or "lxc restart" 
command fails, which is a bit of a stability concern for us.

The cause is unclear. In the LXD log for the container, the only thing 
logged is:


lxc 20170301115514.738 WARN     lxc_commands - commands.c:lxc_cmd_rsp_recv:172 - Command get_cgroup failed to receive response: Connection reset by peer.
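
To watch for this warning automatically, the entry can be matched with a 
small parser. This is only a sketch: the field layout is inferred from the 
single entry quoted above, other log entries may look different, and the 
names parse_entry and is_cmd_reset are made up for illustration.

```python
import re
from datetime import datetime

# Layout inferred from the one warning quoted in this message (an assumption):
#   lxc <timestamp> <LEVEL> <module> - <file>:<function>:<line> - <message>
LOG_LINE = re.compile(
    r"^lxc\s+(?P<ts>\d{14}\.\d{3})\s+(?P<level>[A-Z]+)\s+(?P<module>\S+)\s+-\s+"
    r"(?P<file>[^:\s]+):(?P<func>[^:\s]+):(?P<line>\d+)\s+-\s+(?P<msg>.*)$"
)


def parse_entry(text):
    """Parse one log entry; return a dict, or None if the layout does not match."""
    # Mail clients wrap long lines, so collapse all whitespace runs first.
    flat = " ".join(text.split())
    match = LOG_LINE.match(flat)
    if match is None:
        return None
    entry = match.groupdict()
    entry["line"] = int(entry["line"])
    entry["ts"] = datetime.strptime(entry["ts"], "%Y%m%d%H%M%S.%f")
    return entry


def is_cmd_reset(entry):
    """True for the 'failed to receive response: Connection reset by peer' warning."""
    return (
        entry is not None
        and entry["level"] == "WARN"
        and "failed to receive response" in entry["msg"]
        and "Connection reset by peer" in entry["msg"]
    )


# The entry exactly as it was wrapped in the mail.
SAMPLE = """lxc 20170301115514.738 WARN     lxc_commands - 
commands.c:lxc_cmd_rsp_recv:172 - Command get_cgroup failed to receive 
response: Connection reset by peer."""
```

Calling is_cmd_reset(parse_entry(line)) on each new log entry would flag the 
condition as soon as it is logged, rather than when the next "lxc stop" hangs.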


Once it starts happening, it affects all containers: "lxc stop" and "lxc 
restart" hang for any running container. Interestingly, "lxc stop" does 
stop the container; the command just never returns. "lxc restart" also 
only stops the container: the command does not return and the container 
is not started again.
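
Because the command never returns, automation that calls it blocks forever 
unless it enforces its own deadline. A minimal sketch of such a guard, 
assuming Python's subprocess module drives the client; the helper names 
and the 120 second default are arbitrary choices, not anything LXD provides.

```python
import subprocess


def run_with_deadline(argv, deadline_s):
    """Run argv and return 'ok', 'failed' or 'hung' instead of blocking forever."""
    try:
        done = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=deadline_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires,
        # so no stuck client process is left behind.
        return "hung"
    except FileNotFoundError:
        # The binary is not installed on this machine.
        return "failed"
    return "ok" if done.returncode == 0 else "failed"


def stop_container(name, deadline_s=120):
    # "lxc stop <name>" is the command from the report; the deadline is a guess.
    return run_with_deadline(["lxc", "stop", name], deadline_s)
```

A "hung" result only means the client did not return in time. As described 
above, the container itself may well have been stopped.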

The only thing that fixes it is a server restart.

There is also no reliable way to reproduce it (other than running the 
server for a long time and starting/stopping a large number of containers 
over that time...).
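
The start/stop churn described above could be scripted roughly as follows. 
This is a sketch only, and nothing here is known to trigger the problem. 
The image alias "ubuntu:16.04", the "churn" name prefix and the function 
names are assumptions for illustration.

```python
import subprocess


def churn(cycles, run, image="ubuntu:16.04", prefix="churn"):
    """Launch, stop and delete `cycles` throwaway containers, one at a time.

    `run` takes an argv list and returns True on success. It is injected so
    the loop can be exercised without LXD installed. Returns the number of
    cycles completed before the first failing step.
    """
    for i in range(cycles):
        name = f"{prefix}-{i}"
        steps = (
            ["lxc", "launch", image, name],
            ["lxc", "stop", name],
            ["lxc", "delete", name],
        )
        for argv in steps:
            if not run(argv):
                return i
    return cycles


def lxc_runner(argv, deadline_s=300):
    """Real runner: a timeout or a non-zero exit status counts as a failure."""
    try:
        return subprocess.run(argv, timeout=deadline_s).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False
```

Running churn(1000, lxc_runner) on a test server and noting the returned 
count would at least record how many cycles passed before a command failed 
or timed out.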

I think it's a kernel issue, but unfortunately I was not able to debug 
it any further.



Tomasz Chmielewski
https://lxadm.com

