[lxc-devel] NFS mounts inside a container uses/requires the host IPstack

Fri Mar 18 23:32:34 UTC 2011

On 03/18/2011 01:43 PM, Tim Spriggs wrote:
> On Fri, Mar 18, 2011 at 6:34 AM, Rob Landley <rlandley at parallels.com> wrote:
>> On 03/16/2011 01:51 PM, Tim Spriggs wrote:
>>> Thanks for the offer but I think I have networking under control. What
>>> is not working properly is that NFS happens from a host IP instead of
>>> a context IP... even though it is started from the context IP.
>>
>> I'm working on that.
>>
>> Here's the kernel patch needed to get the basic minimal NFSv3
>> functionality to work.  (Note the big long mount invocation switching
>> off tons of stuff.  This patch makes it work ONCE YOU'VE SWITCHED ALL
>> THAT OFF.  No portmap, no lockd, no dns resolution...)
>>
>> Also, note that if the host and container ever try to use the same IP,
>> the NFS cacheing stuff mixes stuff together and it all goes pear shaped.
>>
>> As I said: working on it...
>>
>> Rob
>>
> 
> Cool!

To clarify: was that an "it worked for me" that might translate into
some variant of acked-by?

I'm hoping to get at least this patch (or the new one I'm working on
that copies the net context into struct nfs_client instead of repeatedly
dereferencing current) upstream in 2.6.39.
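
To illustrate the difference between the two approaches, here is a
user-space sketch. It is not the actual patch: struct net, struct task,
current_task, struct nfs_client_sketch and every function name below are
hypothetical stand-ins, not the kernel's real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct net { int id; };
struct task { struct net *net_ns; };

/* Stand-in for the kernel's "current" task pointer. */
static struct task *current_task;

struct nfs_client_sketch {
    struct net *net;   /* namespace captured once, at creation time */
};

/* Approach A: ask whichever task happens to be running right now. */
static struct net *client_net_from_current(void)
{
    assert(current_task != NULL);
    return current_task->net_ns;
}

/* Approach B: record the namespace when the client is created... */
static void client_init(struct nfs_client_sketch *clp)
{
    assert(current_task != NULL);
    clp->net = current_task->net_ns;
}

/* ...and use the stored pointer afterwards, whoever is running. */
static struct net *client_net_stored(const struct nfs_client_sketch *clp)
{
    return clp->net;
}
```

If later work on the mount runs in a different task than the one that
did the mount (a worker thread living in the host's namespace, say),
approach A returns the host's namespace while approach B still returns
the container's.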

> I guess the next step for me would be getting caching straight
> as I will have the same mounts on several containers. What can I do to
> help?

I have been wrestling with this issue for about 3 months, with very slow
progress.  I've blogged about it at http://landley.livejournal.com.
Step 1 is probably to read my trail of tears.  (The entries tagged with
"dullboy".  All work and no play...)

Unfortunately, it's not trivial.  NFS is a giant pile of premature
optimization left over from the 1980s, and now you have to cut through
those "optimizations" (which really aren't optimizations on modern
hardware) to teach it that networks have multiple contexts, so things
like merging superblocks because you THINK they live on the same server
are not a good idea.  We've got to fix portmapd, mountd, nfsd, lockd,
the sunrpc code (idempotent caching, fun for the whole family!), the
DNS lookup stuff, nontrivial authentication mechanisms, and other stuff
I haven't even _found_ yet.
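
The superblock-merging problem can be sketched as a cache-key
comparison. This is an illustration only, assuming an IPv4 address is
all that identifies a server; struct server_key and both comparison
functions are hypothetical names, not anything in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a network namespace. */
struct net { int id; };

/* Hypothetical cache key for "which server is this mount talking to". */
struct server_key {
    const struct net *net;  /* namespace the mount was made from */
    uint32_t addr;          /* server IPv4 address, host byte order */
};

/* Namespace-unaware match: the same address is assumed to be the same
 * server, so two mounts with equal addresses would get merged. */
static bool same_server_addr_only(const struct server_key *a,
                                  const struct server_key *b)
{
    return a->addr == b->addr;
}

/* Namespace-aware match: the address only means something inside the
 * namespace it was resolved in, so both fields have to agree. */
static bool same_server_ns_aware(const struct server_key *a,
                                 const struct server_key *b)
{
    return a->net == b->net && a->addr == b->addr;
}
```

With the first comparison, a host mount and a container mount that both
name 10.0.0.1 are treated as one server even when that address routes
to different machines in each namespace; the second keeps them apart.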

I note that all of the above is from my attempts to get NFSv2 and v3 to
work.  I haven't even touched v4 yet, which is mostly a separate
implementation and should have been in its own directory the way ext3
was.  If you wanted to bang on that, be my guest.  We really won't
conflict.  Google for Kirill Shutemov's rpc_pipefs patches; that's yet
another random piece of architecture NFSv4 uses that v3 doesn't.  As far
as I can tell, the NFSv4 developers decided that the real problem with
NFSv3 was that it wasn't complicated enough, so rather than rip out the
layers of conflicting premature optimizations from the 1980s that made
cache coherency a dirty word, they quadrupled the size of the spec.  Bravo.

As for something concrete and maybe self-contained: the Kerberos
authentication infrastructure is shared between Samba and NFS.  That's
nicely factored out, so maybe you could start by tackling that?  This
guy may be able to help; he's got a test case for it over on the Samba
side of things:

  http://comments.gmane.org/gmane.linux.kernel.containers/19784

(To be honest, I'd much rather work on factoring out the virtfs stuff
from KVM/QEMU into a standalone userspace server and try to convince
people to switch to p9fs, but NFS is the COBOL of filesystems, so the
people who think Red Hat Enterprise is a good idea are never going to
abandon their sunk costs in building up a good Stockholm syndrome with
it...  *shrug*  You'd think the tux/khttpd web servers would have taught
people that servers in kernel space are not a good idea, but the NFS
guys don't seem to care...)

Rob