<div dir="ltr"><p>I am running quite a few machines with LXC and systemd on Arch Linux. I noticed
one day that one of my more heavily accessed machines was very slow to
respond when I tried to log in through ssh. It sat there for a good
15-30 seconds before letting me in. No errors were presented, but upon
inspection I found that /sbin/init was consuming 100% of a CPU. We
hoped this was a fluke and that init had simply gotten stuck, so we monitored
it closely. When the time came to reboot the server, it came back up
fine, but as soon as traffic (ssh logins) started hitting it, the
/usr/lib/systemd/systemd-logind process pegged a CPU at 100%.
After a day I found that /sbin/init had taken
over as the culprit.</p><p>After investigating systemd and finding that I
can list all the units by just typing systemctl at the prompt, I found
that there were upwards of 15000 scope sessions. (It took only a day or two to create this many.)</p><div class=""><pre><code>systemctl | grep session-c | wc -l</code></pre></div><p>The scope sessions are identified by the following lines in the systemd unit list:</p>
<div class=""><pre><code>session-c43459.scope loaded active running Session c43459 of user root
session-c43460.scope loaded active running Session c43460 of user root</code></pre></div><p>After
digging further and finding out how to inspect these sessions, I confirmed
what I had suspected: these are zombie sessions that have persisted.</p><div class=""><pre><code>[root@mybox ~]# systemctl status session-c43460.scope
session-c43460.scope - Session c43460 of user root
Loaded: loaded (/run/systemd/system/session-c43460.scope; static)
Drop-In: /run/systemd/system/session-c43460.scope.d
`-90-After-systemd-user-sessions\x2eservice.conf, 90-Description.conf, 90-KillMode.conf, 90-SendSIGHUP.conf, 90-Slice.conf, 90-TimeoutStopUSec.conf
Active: active (running) since Wed 2014-02-19 13:18:44 MST; 29min ago
Feb 19 13:18:44 mybox systemd[1]: Started Session c43460 of user root.
Feb 19 13:18:45 mybox sshd[27105]: Received disconnect from 10.10.10.10: disconnecte...user
Feb 19 13:18:45 mybox sshd[27105]: pam_unix(sshd:session): session closed for user root
Hint: Some lines were ellipsized, use -l to show in full.</code></pre></div><p>This clearly shows that the session was started for an ssh connection but then persisted, even after the user disconnected.<br>After
stopping the 15000+ zombie sessions, voilà! No more CPU grab by init.
The box is running significantly better, and everything is as it should
be.</p><p>I know I have an unusual environment, with an application that
connects to this server several times a minute just to get data,
but this problem could affect anyone. All my other LXC instances have
zombie sessions on them too, and while they see far fewer connections,
within a year I expect to see degraded performance.</p><p>I also had one box on which systemd crashed, and I believe it was due to this same problem. We have a backup script that runs at night and makes an ssh connection to dump each table in a database. One user had more than 4000 tables, so the sessions piled up in just a few days. We had to restart the box to fix it.</p>
<p>I've written a script to find and kill these zombie sessions. A single session can be stopped by issuing:</p><div class=""><pre><code>systemctl stop session-c43460.scope</code></pre></div><p>Still, it would be nice if these scope sessions exited normally, as they should, when an ssh disconnect occurs.</p>
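<p>For anyone who wants to try the same cleanup, here is a minimal sketch of what such a script could look like. It is an illustration only, not the exact script mentioned above. The function names and the DRY_RUN variable are made up for this example. It also assumes that every session-c*.scope unit is stale: it makes no attempt to tell zombie sessions from live ones, so it could stop your own ssh session too. For that reason it only prints the commands unless DRY_RUN=0 is set.</p>

```shell
#!/usr/bin/env bash
# Sketch: find session scope units and stop them.
# ASSUMPTION: every unit named session-c<digits>.scope is a stale session.
# Live sessions are NOT filtered out; add your own check before real use.

# Read `systemctl`-style unit listing on stdin and print only the
# session scope unit names, one per line.
list_session_scopes() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^session-c[0-9]+\.scope$/) { print $i; break }
        }
    }'
}

# Read unit names on stdin and stop each one.
# DRY_RUN (hypothetical knob, defaults to 1) only prints the commands.
stop_session_scopes() {
    local unit
    while read -r unit; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "systemctl stop $unit"
        else
            systemctl stop "$unit"
        fi
    done
}

# Example pipeline (not run automatically):
#   systemctl --no-legend --no-pager list-units --type=scope \
#       | list_session_scopes | stop_session_scopes
```

<p>Run the example pipeline from the final comment once as is to review what would be stopped, then again with DRY_RUN=0 exported to actually stop the units. Stopping them one at a time is slow with 15000+ sessions, but it keeps the sketch simple.</p>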
<p>So I believe this is a problem in the interaction between LXC and systemd, as the hosts are perfectly fine and have no zombie sessions.</p><p>Is anyone else out there having problems with LXC and systemd? Any suggestions or solutions are welcome.</p>
</div>