[lxc-users] zfs disk usage for published lxd images
Brian Candler
b.candler at pobox.com
Mon May 16 08:38:34 UTC 2016
I'm using LXD under Ubuntu 16.04 with ZFS.
I want to use an existing container snapshot as a cloning base to create
other containers. As far as I can see, this is done via "lxc publish",
although I can't find much documentation on it apart from this blog post:
https://insights.ubuntu.com/2015/06/30/publishing-lxd-images/
My question is about ZFS disk space usage. I was hoping that the
publish operation would simply take a snapshot of the existing container
and therefore use no additional local disk space, but in fact it seems
to consume the full amount of disk space all over again. Let me
demonstrate. First, the clean system:
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  95.5K  77.0G         -     0%     0%  1.00x  ONLINE  -
Now I create a container, then a couple more, from the same image:
root@vtp:~# lxc launch ubuntu:16.04 base1
Creating base1
Retrieving image: 100%
Starting base1
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G   644M  76.4G         -     0%     0%  1.00x  ONLINE  -
root@vtp:~# lxc launch ubuntu:16.04 base2
Creating base2
Starting base2
root@vtp:~# lxc launch ubuntu:16.04 base3
Creating base3
Starting base3
root@vtp:~# lxc exec base1 /bin/sh -- -c 'echo hello >/usr/test.txt'
root@vtp:~# lxc stop base1
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G   655M  76.4G         -     0%     0%  1.00x  ONLINE  -
So disk space usage is about 645MB for the image, plus small change for
the instances launched from it. Now I want to clone further containers
from base1, so I publish it:
root@vtp:~# time lxc publish base1 --alias clonemaster
Container published with fingerprint:
80ec0105da9d1f8f173e45233921bc772319e39364c322786a5b4cfec895cb68

real    0m45.155s
user    0m0.000s
sys     0m0.012s
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  1.27G  75.7G         -     0%     1%  1.00x  ONLINE  -
root@vtp:~# zfs list -t snapshot
NAME                                                                    USED  AVAIL  REFER  MOUNTPOINT
lxd/images/80ec0105da9d1f8f173e45233921bc772319e39364c322786a5b4cfec895cb68@readonly
      0      -   638M  -
lxd/images/f4c4c60a6b752a381288ae72a1689a9da00f8e03b732c8d1b8a8fcd1a8890800@readonly
      0      -   638M  -
I notice that (a) publishing is a slow process, and (b) disk usage has
doubled. Finally, I launch a container from the new image:
root@vtp:~# lxc launch clonemaster myclone
Creating myclone
Starting myclone
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  1.27G  75.7G         -     0%     1%  1.00x  ONLINE  -
That's fine - it's sharing with the image as expected.
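If I've understood LXD's zfs layout correctly (images under
lxd/images/<fingerprint>, containers under lxd/containers/<name> - the
container dataset path is my assumption), the sharing should show up as
the clone's origin property, something like:

    # hypothetical check: the clone's origin should be the image's
    # @readonly snapshot, which is why ALLOC barely moves
    zfs list -o name,used,refer,origin lxd/containers/myclone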
Now, what I was hoping for was that the named image (clonemaster) would
be a snapshot derived directly from the parent, so that it would also
share disk space. What I'm actually trying to achieve is a workflow like
this:
- launch (say) 10 initial master containers
- customise those 10 containers in different ways (e.g. install
different software packages in each one)
- launch multiple instances from each of those master containers
This is for a training lab. The whole lot will then be packaged up and
distributed as a single VM, so it would be hugely helpful if the initial
ZFS usage came to around 650MB rather than 6.5GB.
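In lxc terms the workflow would look roughly like this (container,
image and package names below are just placeholders):

    lxc launch ubuntu:16.04 master1
    lxc exec master1 -- apt-get install -y apache2    # customise each master
    lxc stop master1
    lxc publish master1 --alias master1-image
    lxc launch master1-image student1                 # repeat per attendee
    lxc launch master1-image student2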
The only documentation I can find about images is here:
https://github.com/lxc/lxd/blob/master/doc/image-handling.md
It talks about the tarball image format: is it perhaps the case that
"lxc publish" is creating a tarball, and then untarring it into a fresh
snapshot? Is that tarball actually stored anywhere? If so, I can't find
it. Or is the tarball created dynamically when you do "lxc image copy"
to a remote? If so, why not just use a zfs snapshot for "lxc publish"?
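At the zfs level I had imagined publish doing something along these
lines (a sketch only - the fingerprint and the container dataset path
are placeholders, not necessarily what lxd actually does):

    # snapshot the container and clone it into the images tree,
    # so the image shares blocks with the container it came from
    zfs snapshot lxd/containers/base1@publish
    zfs clone lxd/containers/base1@publish lxd/images/<fingerprint>
    zfs snapshot lxd/images/<fingerprint>@readonly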
<<Digs around>> Maybe the tarball approach is used because "A dataset
cannot be destroyed if snapshots of the dataset exist
<http://docs.oracle.com/cd/E19253-01/819-5461/6n7ht6r4f/>": i.e. basing
the published image on a snapshot would prevent the original container
from being deleted. That makes sense - although I suppose the container
could have its contents rm -rf'd and then be renamed
<http://docs.oracle.com/cd/E19253-01/819-5461/gamnn/index.html> to a
graveyard name.
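Something like this, perhaps (again only a sketch, with made-up dataset
names):

    # park the "deleted" container instead of destroying it, so the
    # snapshot that the published image was cloned from stays valid
    zfs create lxd/deleted                             # one-off graveyard
    zfs rename lxd/containers/base1 lxd/deleted/base1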
The other option I can think of is ZFS dedupe. The finished target
system won't have the resources to run dedupe continuously; however, I
could turn on dedupe just for the cloning step and then turn it back
off again (*).
Have I understood this correctly? Any additional clues gratefully received.
Thanks,
Brian Candler.
(*) P.S. I did a quick test of this. It looks like enabling dedupe
doesn't deduplicate against any pre-existing files:
root@vtp:~# zfs set dedup=on lxd
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  1.27G  75.7G         -     0%     1%  1.00x  ONLINE  -
root@vtp:~# lxc exec base2 /bin/sh -- -c 'echo world >/usr/test.txt'
root@vtp:~# lxc stop base2
root@vtp:~# lxc publish base2 --alias clonemaster2
Container published with fingerprint:
8a288bd1364d82d4d8afb23aee67fa13586699c539fad94e7946f60372767150
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  1.88G  75.1G         -     1%     2%  1.05x  ONLINE  -
But then I rebooted, and published another image:
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  1.87G  75.1G         -     1%     2%  1.05x  ONLINE  -
root@vtp:~# lxc exec base3 /bin/sh -- -c 'echo world2 >/usr/test.txt'
root@vtp:~# lxc stop base3
root@vtp:~# time lxc publish base3 --alias clonemaster3
Container published with fingerprint:
6abbeb5df75989944a533fdbb1d8ab94be4d18cccf20b320c009dd8aef4fb65b

real    0m55.338s
user    0m0.008s
sys     0m0.008s
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  1.88G  75.1G         -     1%     2%  2.11x  ONLINE  -
So I suspect it would have all worked if I'd turned on dedupe before the
very first image was fetched.
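In other words, on a fresh pool something like this might give the
sharing I'm after (untested beyond the experiment above):

    zfs set dedup=on lxd              # before anything is written
    lxc launch ubuntu:16.04 base1     # first image fetch gets deduped blocks
    lxc publish base1 --alias clonemaster
    zpool get dedupratio lxd          # should climb instead of ALLOC doubling
    zfs set dedup=off lxd             # existing blocks stay deduplicated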