[Lxc-users] cannot start any more any container?!
Brian K. White
brian at aljex.com
Wed Oct 19 17:49:34 UTC 2011
On 10/19/2011 1:24 PM, Ulli Horlacher wrote:
> Besides my problem with "cannot stop/kill lxc-start" (see my other mail), I
> now have an even more severe problem: I cannot start ANY container anymore!
>
> I am sure I have overlooked something, but I cannot see what. I am really
> desperate now, because this is happening in my production environment!
>
> Server host is:
>
> root at vms1:/lxc# lsb_release -a; uname -a; lxc-version
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description: Ubuntu 10.04.3 LTS
> Release: 10.04
> Codename: lucid
> Linux vms1 2.6.35-30-server #60~lucid1-Ubuntu SMP Tue Sep 20 22:28:40 UTC 2011 x86_64 GNU/Linux
> lxc version: 0.7.4.1
>
> (linux-image-server-lts-backport-maverick)
>
> All my lxc files reside in /lxc:
>
> root at vms1:/lxc# l vmtest1*
> dRWX - 2011-05-17 19:47 vmtest1
> -RWT 1,127 2011-10-19 18:54 vmtest1.cfg
> -RW- 476 2011-10-19 18:54 vmtest1.fstab
>
> I boot the container with:
>
> root at vms1:/lxc# lxc-start -f /data/lxc/vmtest1.cfg -n vmtest1 -d -o /data/lxc/vmtest1.log
>
>
> But nothing happens; there is only an lxc-start process hanging around:
>
> root at vms1:/lxc# psg vmtest1
> USER PID PPID %CPU VSZ COMMAND
> root 31571 1 0.0 20872 lxc-start -f /data/lxc/vmtest1.cfg -n vmtest1 -d -o /data/lxc/vmtest1.log
>
> The logfile is empty:
>
> root at vms1:/lxc# l vmtest1.log
> -RW- 0 2011-10-19 19:09 vmtest1.log
>
>
> And no corresponding /cgroup/vmtest1 entry:
>
> root at vms1:/lxc# l /cgroup/
> dRWX - 2011-10-10 17:50 /cgroup/2004
> dRWX - 2011-10-10 17:50 /cgroup/2017
> dRWX - 2011-10-10 17:50 /cgroup/libvirt
> -RW- 0 2011-10-10 17:50 /cgroup/cgroup.event_control
> -RW- 0 2011-10-10 17:50 /cgroup/cgroup.procs
> -RW- 0 2011-10-10 17:50 /cgroup/cpu.rt_period_us
> -RW- 0 2011-10-10 17:50 /cgroup/cpu.rt_runtime_us
> -RW- 0 2011-10-10 17:50 /cgroup/cpu.shares
> -RW- 0 2011-10-10 17:50 /cgroup/cpuacct.stat
> -RW- 0 2011-10-10 17:50 /cgroup/cpuacct.usage
> -RW- 0 2011-10-10 17:50 /cgroup/cpuacct.usage_percpu
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.cpu_exclusive
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.cpus
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.mem_exclusive
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.mem_hardwall
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.memory_migrate
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.memory_pressure
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.memory_pressure_enabled
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.memory_spread_page
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.memory_spread_slab
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.mems
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.sched_load_balance
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.sched_relax_domain_level
> -RW- 0 2011-10-10 17:50 /cgroup/devices.allow
> -RW- 0 2011-10-10 17:50 /cgroup/devices.deny
> -RW- 0 2011-10-10 17:50 /cgroup/devices.list
> -RW- 0 2011-10-10 17:50 /cgroup/memory.failcnt
> -RW- 0 2011-10-10 17:50 /cgroup/memory.force_empty
> -RW- 0 2011-10-10 17:50 /cgroup/memory.limit_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.max_usage_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.memsw.failcnt
> -RW- 0 2011-10-10 17:50 /cgroup/memory.memsw.limit_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.memsw.max_usage_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.memsw.usage_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.move_charge_at_immigrate
> -RW- 0 2011-10-10 17:50 /cgroup/memory.oom_control
> -RW- 0 2011-10-10 17:50 /cgroup/memory.soft_limit_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.stat
> -RW- 0 2011-10-10 17:50 /cgroup/memory.swappiness
> -RW- 0 2011-10-10 17:50 /cgroup/memory.usage_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.use_hierarchy
> -RW- 0 2011-10-10 17:50 /cgroup/net_cls.classid
> -RW- 0 2011-10-10 17:50 /cgroup/notify_on_release
> -RW- 0 2011-10-10 17:50 /cgroup/release_agent
> -RW- 0 2011-10-10 17:50 /cgroup/tasks
>
> Finally, the container config file:
>
> lxc.utsname = vmtest1
> lxc.tty = 4
> lxc.pts = 1024
> lxc.network.type = veth
> lxc.network.link = br0
> lxc.network.name = eth0
> lxc.network.flags = up
> lxc.network.mtu = 1500
> lxc.network.ipv4 = 129.69.1.42/24
> lxc.rootfs = /lxc/vmtest1
> lxc.mount = /lxc/vmtest1.fstab
> # which CPUs
> lxc.cgroup.cpuset.cpus = 1,2,3
> lxc.cgroup.cpu.shares = 1024
> # http://www.mjmwired.net/kernel/Documentation/cgroups/memory.txt
> lxc.cgroup.memory.limit_in_bytes = 512M
> lxc.cgroup.memory.memsw.limit_in_bytes = 512M
> lxc.cgroup.devices.deny = a
> # /dev/null and zero
> lxc.cgroup.devices.allow = c 1:3 rwm
> lxc.cgroup.devices.allow = c 1:5 rwm
> # consoles
> lxc.cgroup.devices.allow = c 5:1 rwm
> lxc.cgroup.devices.allow = c 5:0 rwm
> lxc.cgroup.devices.allow = c 4:0 rwm
> lxc.cgroup.devices.allow = c 4:1 rwm
> # /dev/{,u}random
> lxc.cgroup.devices.allow = c 1:9 rwm
> lxc.cgroup.devices.allow = c 1:8 rwm
> lxc.cgroup.devices.allow = c 136:* rwm
> lxc.cgroup.devices.allow = c 5:2 rwm
> # rtc
> lxc.cgroup.devices.allow = c 254:0 rwm
> # restrict capabilities, see: man capabilities
> lxc.cap.drop = mac_override
> lxc.cap.drop = sys_module
> lxc.cap.drop = sys_admin
> lxc.cap.drop = sys_time
>
>
> Any hints for debugging this problem?
I haven't scrutinized your info in detail, but one quick question: did
you have vsftpd running in the containers, and if so, did you have this
in vsftpd.conf inside the container?
# LXC compatibility
# http://www.mail-archive.com/lxc-users@lists.sourceforge.net/msg01110.html
isolate=NO
isolate_network=NO
In my case, I don't think the lack of those options caused stuck
lxc-start processes like yours, but it did make containers startable
only once per host boot. Once vsftpd has been started inside a container
without those options, you can shut that container down OK, but you can
never remove its cgroup, and so you can never restart the same container
using the same cgroup directory. (As I understand it, vsftpd's isolation
features make it create its own namespaces for privilege separation, and
that seems to be what keeps the cgroup permanently busy.)
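
If you want to check whether you're in that state, look for a leftover
cgroup for the container and see whether it still has tasks in it. Your
listing shows no /cgroup/vmtest1 at all, so this may not be your case,
but for the record (assuming the hierarchy is mounted at /cgroup, as in
your listing):

  ls -d /cgroup/vmtest1
  cat /cgroup/vmtest1/tasks    # should be empty once the container is down
  rmdir /cgroup/vmtest1        # in the bad state this fails with EBUSY

In the broken state that rmdir never succeeds, no matter how empty the
cgroup looks from userspace.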
If your situation is dire enough, there is an ugly kludge to get past an
emergency: if you change the name of the container, lxc-start will
create a new cgroup based on the new name, which lets you start the
container again without rebooting the host. Not exactly elegant.
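
Concretely, something like this (same config and rootfs, just a
different -n so lxc-start makes a fresh cgroup; the name and log path
here are only examples):

  lxc-start -f /data/lxc/vmtest1.cfg -n vmtest1b -d -o /data/lxc/vmtest1.log

The hostname inside still comes out as vmtest1, since lxc.utsname is set
in the config.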
(This all assumes cgroup contention is even your problem, which I can't
say for sure.)
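
One more debugging idea, since your logfile is empty: run lxc-start in
the foreground with the log priority turned up and see where it stops.
If I remember the 0.7.x options correctly, something like:

  lxc-start -n vmtest1 -f /data/lxc/vmtest1.cfg -l DEBUG -o /data/lxc/vmtest1.log

That usually pins down whether it dies in cgroup setup, in the mounts
from your fstab, or in the network setup.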
--
bkw