[lxc-users] Lxc and systemd session.scope does not terminate
Oktober Moon
oktobermoon at gmail.com
Fri Feb 28 16:28:03 UTC 2014
I am running quite a few machines with LXC and systemd on Arch Linux. I
noticed one day that one of my more heavily accessed machines was very slow
to respond when I tried to log in through ssh. It sat there for a good
15-30 seconds before letting me in. No errors were presented, but upon
inspection I found that /sbin/init was consuming 100% of a CPU. We hoped
this was a fluke, and that the init was just borked, and monitored it
closely. When the time came to reboot the server, it was fine at first,
but as soon as traffic (ssh logins) started hitting it, the
/usr/lib/systemd/systemd-logind process went crazy, pegging a CPU at 100%,
and after a day I found that /sbin/init had taken over as the culprit.
After investigating systemd and finding that I can list all the units by
just typing systemctl at the prompt, I found that there were upwards of 15000
session scopes. (It took only a day or two to create that many sessions.)
systemctl | grep session-c | wc -l
The scope sessions are identified by the following lines in the systemd
unit list:
session-c43459.scope loaded active running Session c43459 of user root
session-c43460.scope loaded active running Session c43460 of user root
After digging further and finding out how to investigate these sessions, it
was as I suspected: these are zombie sessions that have persisted.
[root@mybox ~]# systemctl status session-c43460.scope
session-c43460.scope - Session c43460 of user root
   Loaded: loaded (/run/systemd/system/session-c43460.scope; static)
  Drop-In: /run/systemd/system/session-c43460.scope.d
           `-90-After-systemd-user-sessions\x2eservice.conf, 90-Description.conf, 90-KillMode.conf, 90-SendSIGHUP.conf, 90-Slice.conf, 90-TimeoutStopUSec.conf
   Active: active (running) since Wed 2014-02-19 13:18:44 MST; 29min ago

Feb 19 13:18:44 mybox systemd[1]: Started Session c43460 of user root.
Feb 19 13:18:45 mybox sshd[27105]: Received disconnect from 10.10.10.10: disconnecte...user
Feb 19 13:18:45 mybox sshd[27105]: pam_unix(sshd:session): session closed for user root
Hint: Some lines were ellipsized, use -l to show in full.
This clearly shows me that the session was started for an ssh connection,
but then persisted, even after the user disconnected.
After stopping the 15000+ zombie sessions, voila! No more CPU grab by
init. The box is running significantly better, and everything is as it
should be.
I know I have a unique environment, with an application that connects
to this server several times a minute just to get data, but this is a
problem for everyone. All my other LXC instances have zombie sessions on
them too, and while they see far fewer connections, within a year I
expect to see degraded performance there as well.
I also had one box on which systemd crashed, and I believe it was due to
this same problem. We have a backup script that runs at night and makes an
ssh connection to dump each table in a database, and this user had more than
4000 tables, so that happened in just a few days. We had to restart the
box to get it fixed.
I've written a script to find and kill these zombie sessions. A single
session can be stopped by issuing
systemctl stop session-c43460.scope
but it would be nice if these session scopes exited normally, as
they should when an ssh disconnect occurs.
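A minimal sketch of what such a cleanup could look like is below. It is not
the actual script mentioned above: the function names list_session_scopes and
stop_session_scopes and the DRY_RUN variable are hypothetical and chosen only
for illustration. It also assumes that every session-c*.scope unit in the
list is stale; it makes no attempt to check whether a session is still in
use, so it would stop live sessions too. For that reason it only prints the
commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative assumptions.

# Read `systemctl`-style unit listings on stdin and print the names of
# units that look like session-c<number>.scope (first column only).
list_session_scopes() {
    awk '$1 ~ /^session-c[0-9]+\.scope$/ { print $1 }'
}

# Read unit names on stdin, one per line. By default only print the
# command that would be run; set DRY_RUN=0 to really stop the units.
# Assumption: no check is made for whether a session is still alive.
stop_session_scopes() {
    while read -r unit; do
        if [ "${DRY_RUN:-1}" != "0" ]; then
            echo "systemctl stop $unit"
        else
            systemctl stop "$unit"
        fi
    done
}

# Possible usage, as root inside the container:
#   systemctl list-units --type=scope --no-legend --no-pager \
#       | list_session_scopes | stop_session_scopes
```

Reviewing the dry-run output before setting DRY_RUN=0 makes it possible to
remove one's own current session from the list first.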
So, I believe this is a problem between LXC and systemd, as the hosts are
perfectly fine and no zombie sessions are present.
Is anyone else out there having problems with LXC and systemd? Any
suggestions or solutions are welcome.