[Lxc-users] cannot start any more any container?!
Brian K. White
brian at aljex.com
Wed Oct 19 17:49:34 UTC 2011
On 10/19/2011 1:24 PM, Ulli Horlacher wrote:
> Besides my problem with "cannot stop/kill lxc-start" (see my other mail), I
> now have an even more severe problem: I cannot start ANY container anymore!
>
> I am sure I have overlooked something, but I cannot see what. I am really
> desperate now, because this is happening in my production environment!
>
> Server host is:
>
> root at vms1:/lxc# lsb_release -a; uname -a; lxc-version
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description: Ubuntu 10.04.3 LTS
> Release: 10.04
> Codename: lucid
> Linux vms1 2.6.35-30-server #60~lucid1-Ubuntu SMP Tue Sep 20 22:28:40 UTC 2011 x86_64 GNU/Linux
> lxc version: 0.7.4.1
>
> (linux-image-server-lts-backport-maverick)
>
> All my lxc files reside in /lxc:
>
> root at vms1:/lxc# l vmtest1*
> dRWX - 2011-05-17 19:47 vmtest1
> -RWT 1,127 2011-10-19 18:54 vmtest1.cfg
> -RW- 476 2011-10-19 18:54 vmtest1.fstab
>
> I boot the container with:
>
> root at vms1:/lxc# lxc-start -f /data/lxc/vmtest1.cfg -n vmtest1 -d -o /data/lxc/vmtest1.log
>
>
> But nothing happens; there is only an lxc-start process hanging around:
>
> root at vms1:/lxc# psg vmtest1
> USER PID PPID %CPU VSZ COMMAND
> root 31571 1 0.0 20872 lxc-start -f /data/lxc/vmtest1.cfg -n vmtest1 -d -o /data/lxc/vmtest1.log
>
> The logfile is empty:
>
> root at vms1:/lxc# l vmtest1.log
> -RW- 0 2011-10-19 19:09 vmtest1.log
>
>
> And no corresponding /cgroup/vmtest1 entry:
>
> root at vms1:/lxc# l /cgroup/
> dRWX - 2011-10-10 17:50 /cgroup/2004
> dRWX - 2011-10-10 17:50 /cgroup/2017
> dRWX - 2011-10-10 17:50 /cgroup/libvirt
> -RW- 0 2011-10-10 17:50 /cgroup/cgroup.event_control
> -RW- 0 2011-10-10 17:50 /cgroup/cgroup.procs
> -RW- 0 2011-10-10 17:50 /cgroup/cpu.rt_period_us
> -RW- 0 2011-10-10 17:50 /cgroup/cpu.rt_runtime_us
> -RW- 0 2011-10-10 17:50 /cgroup/cpu.shares
> -RW- 0 2011-10-10 17:50 /cgroup/cpuacct.stat
> -RW- 0 2011-10-10 17:50 /cgroup/cpuacct.usage
> -RW- 0 2011-10-10 17:50 /cgroup/cpuacct.usage_percpu
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.cpu_exclusive
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.cpus
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.mem_exclusive
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.mem_hardwall
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.memory_migrate
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.memory_pressure
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.memory_pressure_enabled
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.memory_spread_page
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.memory_spread_slab
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.mems
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.sched_load_balance
> -RW- 0 2011-10-10 17:50 /cgroup/cpuset.sched_relax_domain_level
> -RW- 0 2011-10-10 17:50 /cgroup/devices.allow
> -RW- 0 2011-10-10 17:50 /cgroup/devices.deny
> -RW- 0 2011-10-10 17:50 /cgroup/devices.list
> -RW- 0 2011-10-10 17:50 /cgroup/memory.failcnt
> -RW- 0 2011-10-10 17:50 /cgroup/memory.force_empty
> -RW- 0 2011-10-10 17:50 /cgroup/memory.limit_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.max_usage_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.memsw.failcnt
> -RW- 0 2011-10-10 17:50 /cgroup/memory.memsw.limit_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.memsw.max_usage_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.memsw.usage_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.move_charge_at_immigrate
> -RW- 0 2011-10-10 17:50 /cgroup/memory.oom_control
> -RW- 0 2011-10-10 17:50 /cgroup/memory.soft_limit_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.stat
> -RW- 0 2011-10-10 17:50 /cgroup/memory.swappiness
> -RW- 0 2011-10-10 17:50 /cgroup/memory.usage_in_bytes
> -RW- 0 2011-10-10 17:50 /cgroup/memory.use_hierarchy
> -RW- 0 2011-10-10 17:50 /cgroup/net_cls.classid
> -RW- 0 2011-10-10 17:50 /cgroup/notify_on_release
> -RW- 0 2011-10-10 17:50 /cgroup/release_agent
> -RW- 0 2011-10-10 17:50 /cgroup/tasks
>
> Finally, the container config file:
>
> lxc.utsname = vmtest1
> lxc.tty = 4
> lxc.pts = 1024
> lxc.network.type = veth
> lxc.network.link = br0
> lxc.network.name = eth0
> lxc.network.flags = up
> lxc.network.mtu = 1500
> lxc.network.ipv4 = 129.69.1.42/24
> lxc.rootfs = /lxc/vmtest1
> lxc.mount = /lxc/vmtest1.fstab
> # which CPUs
> lxc.cgroup.cpuset.cpus = 1,2,3
> lxc.cgroup.cpu.shares = 1024
> # http://www.mjmwired.net/kernel/Documentation/cgroups/memory.txt
> lxc.cgroup.memory.limit_in_bytes = 512M
> lxc.cgroup.memory.memsw.limit_in_bytes = 512M
> lxc.cgroup.devices.deny = a
> # /dev/null and zero
> lxc.cgroup.devices.allow = c 1:3 rwm
> lxc.cgroup.devices.allow = c 1:5 rwm
> # consoles
> lxc.cgroup.devices.allow = c 5:1 rwm
> lxc.cgroup.devices.allow = c 5:0 rwm
> lxc.cgroup.devices.allow = c 4:0 rwm
> lxc.cgroup.devices.allow = c 4:1 rwm
> # /dev/{,u}random
> lxc.cgroup.devices.allow = c 1:9 rwm
> lxc.cgroup.devices.allow = c 1:8 rwm
> lxc.cgroup.devices.allow = c 136:* rwm
> lxc.cgroup.devices.allow = c 5:2 rwm
> # rtc
> lxc.cgroup.devices.allow = c 254:0 rwm
> # restrict capabilities, see: man capabilities
> lxc.cap.drop = mac_override
> lxc.cap.drop = sys_module
> lxc.cap.drop = sys_admin
> lxc.cap.drop = sys_time
>
>
> Any hints for debugging this problem?
I haven't scrutinized your info in detail, but one quick question: did
you have vsftpd running in the containers, and if so, did you have this
in vsftpd.conf inside the container?
# LXC compatibility
# http://www.mail-archive.com/lxc-users@lists.sourceforge.net/msg01110.html
isolate=NO
isolate_network=NO
In my case, I don't think the lack of those options caused stuck
lxc-start processes like yours, but it did make containers startable
only once per host boot. Once vsftpd has been started inside a container
without those options, you can shut that container down OK, but you can
never remove its cgroup, and so you can never restart the same container
using the same cgroup directory. (As I understand it, vsftpd's isolation
features make it create its own namespaces for privilege separation, and
that seems to be what keeps the cgroup permanently busy.)
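
If you want to check whether you're in that state, look for a leftover
cgroup for the container and see whether it still has tasks in it. Your
listing shows no /cgroup/vmtest1 at all, so this may not be your case,
but for the record (assuming the hierarchy is mounted at /cgroup, as in
your listing):

  ls -d /cgroup/vmtest1
  cat /cgroup/vmtest1/tasks    # should be empty once the container is down
  rmdir /cgroup/vmtest1        # in the bad state this fails with EBUSY

In the broken state that rmdir never succeeds, no matter how empty the
cgroup looks from userspace.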
If your situation is dire enough, there is an ugly kludge to get past an
emergency: if you change the name of the container, lxc-start will
create a new cgroup based on the new name, which lets you start the
container again without rebooting the host. Not exactly elegant.
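
Concretely, something like this (same config and rootfs, just a
different -n so lxc-start makes a fresh cgroup; the name and log path
here are only examples):

  lxc-start -f /data/lxc/vmtest1.cfg -n vmtest1b -d -o /data/lxc/vmtest1.log

The hostname inside still comes out as vmtest1, since lxc.utsname is set
in the config.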
(This all assumes cgroup contention is even your problem, which I can't
say for sure.)
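
One more debugging idea, since your logfile is empty: run lxc-start in
the foreground with the log priority turned up and see where it stops.
If I remember the 0.7.x options correctly, something like:

  lxc-start -n vmtest1 -f /data/lxc/vmtest1.cfg -l DEBUG -o /data/lxc/vmtest1.log

That usually pins down whether it dies in cgroup setup, in the mounts
from your fstab, or in the network setup.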
--
bkw