[lxc-devel] How does the console work in most recent release?

Daniel Lezcano daniel.lezcano at free.fr
Wed Jan 5 14:13:29 UTC 2011


On 01/05/2011 12:34 PM, Rob Landley wrote:
> On 01/05/2011 03:37 AM, Daniel Lezcano wrote:
>> On 01/05/2011 08:53 AM, Rob Landley wrote:
>>> On 01/04/2011 06:52 AM, Daniel Lezcano wrote:
>>>> On 01/04/2011 09:36 AM, Rob Landley wrote:
>>>>> I'm attempting to write a simple HOWTO for setting up a container 
>>>>> with
>>>>> LXC. Unfortunately, console handling is really really brittle and the
>>>>> only way I've gotten it to work is kind of unpleasant to document.
>>>>>
>>>>> Using lxc 0.7.3 (both in debian sid and built from source myself), I
>>>>> can lxc-create a container, and when I run lxc-start it launches init
>>>>> in the container. But the console is screwy.
>>>>>
>>>>> If my init program is just a command shell, the first key I type will
>>>>> crash lxc-start with an I/O error. (Wrapping said shell with a script
>>>>> to redirect stdin/stdout/stderr to various /dev character devices
>>>>> doesn't seem to improve matters.)
>>>>>
>>>>> Using the busybox template and the busybox-i686 binary off of
>>>>> busybox.net, it runs init and connects to the various tty devices, 
>>>>> and
>>>>> this somehow prevents lxc-start from crashing. But if I "press enter
>>>>> to activate this console" like it says, the resulting shell prompt is
>>>>> completely unusable. If I'm running from an actual TTY device, then
>>>>> some of the keys I type go to the container and some don't. If my
>>>>> console is connected to a PTY when I run lxc-start (such as if I ssh
>>>>> in and run lxc-start from the ssh session), _none_ of the 
>>>>> characters I
>>>>> type go to the shell prompt.
>>>>>
>>>>> To get a usable shell prompt in the container, what I have to do is
>>>>> lxc-start in one window, ssh into the server to get a fresh terminal,
>>>>> and then run lxc-console in that second terminal. That's the only
>>>>> magic sequence I've found so far that works.
>>>>
>>>> Hmm, right. I was able to reproduce the problem.
>>>
>>> I've got two more. (Here's another half-finished documentation file,
>>> attached, which may help with the reproduction sequence.)
>>>
>>> I'm running a KVM instance to host the containers, and I've fed it an
>>> e1000 interface as eth0 with the normal -net user, and a tun/tap
>>> device on eth1 with 192.168.254.1 associated at the other end.
>>>
>>> Inside KVM, I'm using this config to set up a container:
>>>
>>> lxc.utsname = busybox
>>> lxc.network.type = phys
>>> lxc.network.flags = up
>>> lxc.network.link = eth1
>>> #lxc.network.name = eth0
>>>
>>> And going:
>>>
>>> lxc-start -n busybox -f busybox.conf -t busybox
>>>
>>> Using that (last line of the config intentionally commented out for
>>> the moment) I get an eth1 in the container that is indeed the eth1 on
>>> the host system (which is a tun/tap device I fed to kvm as a second
>>> e1000 device). That's the non-bug behavior.
>>>
>>> Bug #1: If I exit that container, eth1 vanishes from the world. The
>>> container's gone, but it doesn't reappear on the host. (This may be
>>> related to the fact that the only way I've found to kill a container
>>> is to do "killall -9 lxc-start". For some reason a normal kill of
>>> lxc-start is ignored. However, this still shouldn't leak kernel
>>> resources like that.)
>>
>> It is related to kernel behavior: a netdev with rtnl_link_ops will be
>> automatically deleted when its network namespace is destroyed. The full
>> answer is in net/core/dev.c:
>
> Um, default_device_exit_batch() maybe?  (6000 line file there...)

It's the default_device_exit function, where a physical device without 
rtnl_link_ops is moved back to the init network namespace.

...

                 /* Leave virtual devices for the generic cleanup */
                 if (dev->rtnl_link_ops)
                         continue;

                 /* Push remaing network devices to init_net */
                 snprintf(fb_name, IFNAMSIZ, "dev%d", dev->ifindex);
                 err = dev_change_net_namespace(dev, &init_net, fb_name);
                 if (err) {
                         printk(KERN_EMERG "%s: failed to move %s to init_net: %d\n",
                                 __func__, dev->name, err);
                         BUG();
                 }

...

So in the case of tun/tap, the netdev has rtnl_link_ops, so it is not 
moved to init_net but removed via dellink or unregister_netdevice when 
the network namespace is destroyed.
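So after the container exits, a truly physical NIC should reappear in 
init_net under a fallback name rather than vanish. A rough sketch for 
spotting and recovering it (the dev3/eth1 names here are hypothetical, 
and the renaming commands need root):

```shell
# List every interface the kernel knows about, including ones that are
# down -- plain "ifconfig" (without -a) hides those.
ls /sys/class/net

# A physical netdev pushed back to init_net is renamed "dev<ifindex>"
# by the snprintf() quoted above, so look for names of that form.
for i in /sys/class/net/dev[0-9]*; do
        if [ -e "$i" ]; then
                echo "fallback-named device: ${i##*/}"
        fi
done

# If one shows up, it can be renamed back by hand (hypothetical names):
#   ip link set dev3 down
#   ip link set dev3 name eth1
#   ip link set eth1 up
```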

>
> Unfortunately I can't rmmod a statically linked driver, and if you've 
> got two e1000 devices in the system and are still _using_ one in the 
> host that's the wrong granularity level to re-probe at anyway.
>
> If lxc-start could be killed by something other than -9 it could move 
> the device back to the host context on the way out.

Actually, it is the child of lxc-start which is killed; lxc-start just 
waitpid's this child. As the destruction of the netdev is done by the 
kernel, it is impossible to catch the network namespace destruction and 
move the netdev back.

> (Although really, the kernel should either retain the interface or 
> provide a way to re-probe it.  As it is, in my setup I can't figure 
> out how to relaunch a container using a physical network device 
> without rebooting the host.)

Hmm, I doubt the e1000 driver is unregistered; I did some tests and the 
device returns to init_net.
Maybe a dumb question, but did you check with "ifconfig -a" or "ip link" ?

> Is there a todo list for LXC?  The lxc-development page doesn't link 
> to a bugzilla...
>
>>> Bug #2: When I uncomment that last line of the above busybox.conf,
>>> telling it to move eth1 into the container but call it "eth0" in
>>> there, suddenly the eth0 in the container gets entangled with the eth0
>>> on the host, to the point where dhcp gives it an address. (Which is
>>> 10.0.2.16. So it's talking to the VPN that only the host's eth0 should
>>> have access to, but it's using a different mac address. Oddly, the
>>> host eth0 still seems to work fine, and the two IP addresses can ping
>>> each other across the container interface.)
>>>
>>> This is still using the most recent release version.
>>
>> What is the kernel version ?
>
> 2.6.37-rc8, vanilla Linus tree.  (I applied some NFS test patches, but 
> haven't mounted NFS this boot, so they shouldn't matter.)
>
> Attached is an updated version of my first documentation file that 
> includes the kernel configuration info in step 2.
>
> Rob

More information about the lxc-devel mailing list