[lxc-devel] [PATCH 2/2] templates: use hardlink detection in rsync

Mon Dec 3 18:09:51 UTC 2012

On Mon, 2012-12-03 at 12:40 -0500, Dwight Engen wrote:
> On Mon, 3 Dec 2012 10:47:26 -0600
> Serge Hallyn <serge.hallyn at canonical.com> wrote:
> 
> > Quoting Dwight Engen (dwight.engen at oracle.com):
> > > On Mon, 3 Dec 2012 10:04:13 -0600
> > > Serge Hallyn <serge.hallyn at canonical.com> wrote:
> > > 
> > > > However, one question is: is -H ubiquitous?
> > > 
> > > I'm wondering why we don't just use cp -a? It seems like cp is
> > > far more likely to be installed than rsync? rootfs_path probably
> > > doesn't already exist, so it's not like rsync is going to be faster?
> > 
> > The one advantage to me was that 'rsync -va /x/ /y' does the right
> > thing whether or not /y already exists.  cp -a does not.  This
> > just left the code tidier.
> > 
> > Is there a nicer clean one-line idiom to do that with cp?

> I think cp -aT does what we want. You might want to add -u also.

Are you sure?  Very sure?

AFAIK, "cp -a" will not recreate hard links, and that was the objective
of the -H option to rsync.  I used to use (I'm very old school) the
command "cpio -p" for some things where I wanted to be really sure things
were copied and linked properly.  Maybe "cp -a" will do that, but I
would test it carefully to be sure there are no unexpected surprises.
That "-a" option has not always been there either (I go way back to SysV
and BSD days).  Reading the man page, it's really kind of confusing what
it does with linked targets from linked sources, and the result may be
undesirable in some cases.  Even -l is somewhat ambiguous about what it
does at the target with co-hard-linked files.
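
The advice to test carefully can be turned into a quick experiment. A minimal Python sketch, assuming a POSIX system: build a tiny tree with two hard-linked names (the names `busybox` and `ls` are hypothetical stand-ins), copy it with a given command, and check whether the two copies still share one inode. `links_preserved`, `cp_a` and `copytree_plain` are hypothetical helpers, not part of LXC.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

Copier = Callable[[Path, Path], object]


def links_preserved(copy: Copier) -> bool:
    """Copy a two-name hard-linked tree with `copy` and report whether
    the two names in the destination still share one inode."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        dst = Path(tmp) / "dst"  # deliberately not created beforehand
        src.mkdir()
        # Hypothetical stand-ins for a busybox binary and one applet link.
        (src / "busybox").write_bytes(b"\0" * 4096)
        os.link(src / "busybox", src / "ls")
        copy(src, dst)
        a = os.stat(dst / "busybox")
        b = os.stat(dst / "ls")
        return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def cp_a(src: Path, dst: Path) -> None:
    # Since dst does not exist, "cp -a src dst" creates dst as a copy of src.
    subprocess.run(["cp", "-a", str(src), str(dst)], check=True)


def copytree_plain(src: Path, dst: Path) -> None:
    # shutil.copytree copies each name separately, so hard links
    # become independent files in the destination.
    shutil.copytree(src, dst)
```

`links_preserved(cp_a)` reports what the locally installed cp does; the answer may differ between cp implementations, which is exactly why it is worth checking. An rsync variant can be passed the same way, e.g. `links_preserved(lambda s, d: subprocess.run(["rsync", "-aH", f"{s}/", str(d)], check=True))`.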

The objective here was to avoid having hard-linked files in the source
copied as individual files in the target (such as the links to
BusyBox), which would blow up disk usage during the copy.
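
To put a number on that blow-up, a minimal Python sketch (an illustration, not LXC code) can walk a tree and compare two totals: the bytes needed if every name is copied as its own file, and the bytes needed if each inode is copied once and its other names are re-linked. `tree_sizes` and `busybox_like_demo` are hypothetical names, and the demo tree is synthetic.

```python
from __future__ import annotations

import os
import stat
import tempfile
from pathlib import Path


def tree_sizes(root: Path) -> tuple[int, int]:
    """Return (bytes if every name is copied as its own file,
    bytes if each inode is copied once and re-linked elsewhere)."""
    seen: set[tuple[int, int]] = set()
    naive = unique = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # only regular files carry data worth counting here
            naive += st.st_size
            key = (st.st_dev, st.st_ino)  # identifies the underlying inode
            if key not in seen:
                seen.add(key)
                unique += st.st_size
    return naive, unique


def busybox_like_demo(applets: int, size: int) -> tuple[int, int]:
    """Hypothetical tree: one `size`-byte binary plus `applets` hard links."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "busybox").write_bytes(b"\0" * size)
        for i in range(applets):
            os.link(root / "busybox", root / f"applet{i}")
        return tree_sizes(root)
```

With illustrative (not measured) numbers, 300 applet links to a 1,000,000-byte binary give `busybox_like_demo(300, 1_000_000) == (301_000_000, 1_000_000)`: roughly 301 MB for a link-unaware copy versus 1 MB when the links are recreated.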

To answer the earlier question: yes, -H should be ubiquitous.  It has
been well documented since some of the earliest days of rsync.  I can
double-check with Tridge (who is a friend; rsync was his baby as part of
his doctoral thesis, but he's not online at the moment), but I'm pretty
sure it was in there from very early on.  I typically use -avAHX and,
if the -X (extended attributes) run fails, fall back to -avAH for
confirmation.
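
That -avAHX then -avAH habit can be captured in a small wrapper. A minimal Python sketch, assuming that any nonzero exit status from rsync counts as failure (the specific exit codes rsync uses for extended-attribute problems are not assumed here). `rsync_with_fallback` and `FLAG_SETS` are hypothetical names; the `run` parameter exists only so the logic can be exercised without rsync installed.

```python
import subprocess
from typing import Callable, List

# Tried in order: with -X (extended attributes) first, then without it.
FLAG_SETS = ("-avAHX", "-avAH")

Runner = Callable[[List[str]], subprocess.CompletedProcess]


def rsync_with_fallback(src: str, dst: str,
                        run: Runner = subprocess.run) -> str:
    """Run rsync with each flag set in turn and return the first one that
    exits with status 0. Any nonzero status counts as failure here."""
    for flags in FLAG_SETS:
        result = run(["rsync", flags, src, dst])
        if result.returncode == 0:
            return flags
    raise RuntimeError(f"rsync failed with every flag set in {FLAG_SETS}")
```

Usage would look like `rsync_with_fallback("/path/to/source/rootfs/", "/path/to/target/rootfs")` (placeholder paths). The trailing slash on the source copies its contents into the target whether or not the target already exists, and a second rsync pass after a failed -X run simply completes and confirms the transfer.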

The reason -H is a separate option, as stated in the rsync documentation,
is that tracking device:inode information between source and target is
very costly for rsync and can significantly increase the
transfer/transaction time, especially for large file systems with lots
of small files and very few hard links.  Therefore, hard-link handling is
turned off by default: most files have no extra hard links, yet all of
them have to be tracked in the bookkeeping if it's enabled.
How does "cp -a" avoid this?  Does it handle that case more efficiently
than rsync?
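
The device:inode bookkeeping can be sketched in Python. This is a minimal illustration, not a description of how rsync or cp implements it: a table keyed on (st_dev, st_ino) maps each already-copied source file to its copy, and later names for the same inode are recreated with os.link instead of being copied again. As an assumption of this sketch, only files whose link count (st_nlink) is above one are recorded, which keeps the table small when hard links are rare. Only regular files, directories and symlinks are handled; device nodes, FIFOs and directory ownership or timestamps are left out. `copy_tree_preserving_links` and `demo` are hypothetical names.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def copy_tree_preserving_links(src: Path, dst: Path) -> None:
    """Copy directories, regular files and symlinks from src into dst,
    recreating hard links instead of duplicating their data."""
    # (st_dev, st_ino) of an already-copied source file -> its copy in dst.
    copied: dict[tuple[int, int], Path] = {}
    for dirpath, dirnames, filenames in os.walk(src):
        target_dir = dst / Path(dirpath).relative_to(src)
        target_dir.mkdir(parents=True, exist_ok=True)
        # os.walk lists symlinks to directories under dirnames without
        # descending into them, so pick those up here as well.
        links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
        for name in filenames + links:
            s = os.path.join(dirpath, name)
            d = target_dir / name
            st = os.lstat(s)
            if stat.S_ISLNK(st.st_mode):
                os.symlink(os.readlink(s), d)  # copy the link, not its target
            elif stat.S_ISREG(st.st_mode):
                key = (st.st_dev, st.st_ino)
                if st.st_nlink > 1 and key in copied:
                    os.link(copied[key], d)  # another name for a copied inode
                else:
                    shutil.copy2(s, d)
                    if st.st_nlink > 1:
                        copied[key] = d  # remember only multiply-linked files
            # Device nodes, FIFOs and sockets are skipped in this sketch.


def demo() -> bool:
    """Hypothetical check: do hard links and symlinks survive the copy?"""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "src", Path(tmp) / "dst"
        (src / "bin").mkdir(parents=True)
        (src / "bin" / "busybox").write_bytes(b"\0" * 1024)
        os.link(src / "bin" / "busybox", src / "bin" / "ls")
        os.symlink("busybox", src / "bin" / "sh")
        copy_tree_preserving_links(src, dst)
        a = os.stat(dst / "bin" / "busybox")
        b = os.stat(dst / "bin" / "ls")
        return a.st_ino == b.st_ino and os.readlink(dst / "bin" / "sh") == "busybox"
```

The lookup costs one dictionary entry per multiply-linked inode, so a tree with few hard links pays almost nothing for the tracking in this sketch.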

Regards,
Mike
-- 
Michael H. Warfield (AI4NB) | (770) 985-6132 |  mhw at WittsEnd.com
   /\/\|=mhw=|\/\/          | (678) 463-0932 |  http://www.wittsend.com/mhw/
   NIC whois: MHW9          | An optimist believes we live in the best of all
 PGP Key: 0x674627FF        | possible worlds.  A pessimist is sure of it!