[lxc-users] Experience with large number of LXC/LXD containers
Tomasz Chmielewski
mangoo at wpkg.org
Thu Apr 6 01:15:15 UTC 2017
On 2017-03-13 06:28, Benoit GEORGELIN - Association Web4all wrote:
> Hi lxc-users,
>
> I would like to know if you have any experience with a large number of
> LXC/LXD containers,
> in terms of performance, stability and limitations.
>
> I'm wondering, for example, whether 100 containers behave the same as
> 1,000 or 10,000 with the same configuration, leaving aside what the
> containers are actually used for.
I'm running LXD on several servers and I'm generally satisfied with it:
performance and stability are fine. Most of them run fewer than 50
containers, though.
I also have an LXD server which runs 100+ containers,
starts/stops/deletes dozens of containers daily and is used for
automation. Approximately once every 1-2 months, an "lxc stop" or "lxc
restart" command will fail, which is a bit of a stability concern for us.
The cause is unclear. In the LXD log for the container, the only thing
logged is:
lxc 20170301115514.738 WARN lxc_commands -
commands.c:lxc_cmd_rsp_recv:172 - Command get_cgroup failed to receive
response: Connection reset by peer.
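To find out when this started on a given host, the container logs can be scanned for that warning. The snippet below is only a rough sketch: it assumes the message sits on a single line in the real log file (it is wrapped above by the mail client) and that the timestamp has the YYYYMMDDhhmmss.mmm layout seen in the quoted line. The function name find_get_cgroup_failures is made up for illustration.

```python
import re
from datetime import datetime

# Matches the liblxc warning quoted above, once it is on a single line.
# The field layout is inferred from that one quoted line only.
WARN_RE = re.compile(
    r"^lxc\s+(?P<ts>\d{14}\.\d{3})\s+WARN\s+lxc_commands\s+-\s+.*"
    r"Command get_cgroup failed to receive response"
)


def find_get_cgroup_failures(log_text):
    """Return the timestamp of every get_cgroup failure found in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = WARN_RE.match(line.strip())
        if match:
            hits.append(datetime.strptime(match.group("ts"), "%Y%m%d%H%M%S.%f"))
    return hits


# The warning from this message, joined back into one line.
sample = (
    "lxc 20170301115514.738 WARN lxc_commands - "
    "commands.c:lxc_cmd_rsp_recv:172 - Command get_cgroup failed to receive "
    "response: Connection reset by peer."
)
print(find_get_cgroup_failures(sample))
```

In practice the text would come from reading the container's log file; its location depends on how LXD is installed, so it is left out here.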
Once it starts happening, it affects all containers: "lxc stop" and "lxc
restart" will hang for any of the running containers. Interestingly,
"lxc stop" does stop the container; the command just never returns.
"lxc restart" likewise only stops the container: the command does not
return and the container is not started again.
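Since the symptom is a client command that never returns, automation can at least detect it by putting a timeout around the call. This is a minimal sketch, not what we actually run: run_with_timeout and stop_container are hypothetical helper names, and the 120-second default is an arbitrary assumption.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return 'ok', 'failed', or 'hung' if it exceeds timeout_s."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child process before raising.
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


def stop_container(name, timeout_s=120):
    """Hypothetical helper: stop a container and flag a client that hangs."""
    return run_with_timeout(["lxc", "stop", name], timeout_s)


# Stand-in for a hung "lxc stop": a child that sleeps past the timeout.
print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 0.5))
```

A "hung" result does not mean the container is still running. As described above, it is usually stopped already, so the state should be checked separately before retrying.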
The only thing that fixes it is a server restart.
There is also no clear way to reproduce it reliably (other than running
the server for a long time and starting/stopping a large number of
containers over that period...).
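For anyone who wants to try, the churn can be scripted. The sketch below is not a known reproducer, only an approximation of the workload under some assumptions: the image alias ubuntu:16.04, the churn- name prefix, the 120-second timeout and the function names are all made up for illustration.

```python
import subprocess


def churn_commands(cycles, image="ubuntu:16.04", prefix="churn"):
    """Yield the lxc commands for `cycles` rounds of launch/stop/delete."""
    for i in range(cycles):
        name = f"{prefix}-{i}"
        yield ["lxc", "launch", image, name]
        yield ["lxc", "stop", name]
        yield ["lxc", "delete", name]


def run_churn(cycles, timeout_s=120, runner=subprocess.run):
    """Run the cycles; return the first command that times out, else None."""
    for cmd in churn_commands(cycles):
        try:
            runner(cmd, timeout=timeout_s, check=False)
        except subprocess.TimeoutExpired:
            return cmd
    return None
```

Calling run_churn(1000) on a test host would go through a thousand rounds and return the first command that hung, if any. The runner parameter is there only so the loop can be exercised without LXD installed.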
I think it's some kernel issue, but unfortunately I was not able to
debug this any further.
Tomasz Chmielewski
https://lxadm.com