[lxc-users] zfs disk usage for published lxd images
Brian Candler
b.candler at pobox.com
Mon May 16 08:38:34 UTC 2016
I'm using LXD under Ubuntu 16.04 with ZFS.
I want to use an existing container snapshot as a cloning base to create
other containers. As far as I can see, this is done via "lxc publish",
although I can't find much documentation on it apart from this blog post:
https://insights.ubuntu.com/2015/06/30/publishing-lxd-images/
My question is about ZFS disk space usage. I was hoping that the
publish operation would simply take a snapshot of the existing container
and therefore use no additional local disk space, but in fact it seems
to consume the full amount of disk space all over again. Let me
demonstrate. First, the clean system:
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  95.5K  77.0G         -     0%     0%  1.00x  ONLINE  -
Now I create a container, then a couple more, from the same image:
root@vtp:~# lxc launch ubuntu:16.04 base1
Creating base1
Retrieving image: 100%
Starting base1
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G   644M  76.4G         -     0%     0%  1.00x  ONLINE  -
root@vtp:~# lxc launch ubuntu:16.04 base2
Creating base2
Starting base2
root@vtp:~# lxc launch ubuntu:16.04 base3
Creating base3
Starting base3
root@vtp:~# lxc exec base1 /bin/sh -- -c 'echo hello >/usr/test.txt'
root@vtp:~# lxc stop base1
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G   655M  76.4G         -     0%     0%  1.00x  ONLINE  -
So disk space usage is about 645MB for the image, plus small change for
the instances launched from it. Now I want to clone further containers
from base1, so I publish it:
root@vtp:~# time lxc publish base1 --alias clonemaster
Container published with fingerprint:
80ec0105da9d1f8f173e45233921bc772319e39364c322786a5b4cfec895cb68

real    0m45.155s
user    0m0.000s
sys     0m0.012s
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  1.27G  75.7G         -     0%     1%  1.00x  ONLINE  -
root@vtp:~# zfs list -t snapshot
NAME                                                                    USED  AVAIL  REFER  MOUNTPOINT
lxd/images/80ec0105da9d1f8f173e45233921bc772319e39364c322786a5b4cfec895cb68@readonly
      0      -   638M  -
lxd/images/f4c4c60a6b752a381288ae72a1689a9da00f8e03b732c8d1b8a8fcd1a8890800@readonly
      0      -   638M  -
I notice that (a) publishing is a slow process, and (b) disk usage has
doubled. Finally, I launch a container from the new image:
root@vtp:~# lxc launch clonemaster myclone
Creating myclone
Starting myclone
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  1.27G  75.7G         -     0%     1%  1.00x  ONLINE  -
That's fine - it's sharing with the image as expected.
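If I've understood LXD's zfs layout correctly (images under
lxd/images/<fingerprint>, containers under lxd/containers/<name> - the
container dataset path is my assumption), the sharing should show up as
the clone's origin property, something like:

    # hypothetical check: the clone's origin should be the image's
    # @readonly snapshot, which is why ALLOC barely moves
    zfs list -o name,used,refer,origin lxd/containers/myclone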
Now, what I was hoping for was that the named image (clonemaster) would
be a snapshot derived directly from the parent, so that it would also
share disk space. What I'm actually trying to achieve is a workflow like
this:
- launch (say) 10 initial master containers
- customise those 10 containers in different ways (e.g. install
different software packages in each one)
- launch multiple instances from each of those master containers
This is for a training lab. The whole lot will then be packaged up and
distributed as a single VM, so it would be hugely helpful if the initial
ZFS usage came to around 650MB rather than 6.5GB.
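In lxc terms the workflow would look roughly like this (container,
image and package names below are just placeholders):

    lxc launch ubuntu:16.04 master1
    lxc exec master1 -- apt-get install -y apache2    # customise each master
    lxc stop master1
    lxc publish master1 --alias master1-image
    lxc launch master1-image student1                 # repeat per attendee
    lxc launch master1-image student2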
The only documentation I can find about images is here:
https://github.com/lxc/lxd/blob/master/doc/image-handling.md
It talks about the tarball image format: is it perhaps the case that
"lxc publish" is creating a tarball, and then untarring it into a fresh
snapshot? Is that tarball actually stored anywhere? If so, I can't find
it. Or is the tarball created dynamically when you do "lxc image copy"
to a remote? If so, why not just use a zfs snapshot for "lxc publish"?
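At the zfs level I had imagined publish doing something along these
lines (a sketch only - the fingerprint and the container dataset path
are placeholders, not necessarily what lxd actually does):

    # snapshot the container and clone it into the images tree,
    # so the image shares blocks with the container it came from
    zfs snapshot lxd/containers/base1@publish
    zfs clone lxd/containers/base1@publish lxd/images/<fingerprint>
    zfs snapshot lxd/images/<fingerprint>@readonly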
<<Digs around>> Maybe the tarball approach is used because "A dataset
cannot be destroyed if snapshots of the dataset exist
<http://docs.oracle.com/cd/E19253-01/819-5461/6n7ht6r4f/>": i.e. basing
the published image on a snapshot would prevent the original container
from being deleted. That makes sense - although I suppose the container
could have its contents rm -rf'd and then be renamed
<http://docs.oracle.com/cd/E19253-01/819-5461/gamnn/index.html> to a
graveyard name.
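Something like this, perhaps (again only a sketch, with made-up dataset
names):

    # park the "deleted" container instead of destroying it, so the
    # snapshot that the published image was cloned from stays valid
    zfs create lxd/deleted                             # one-off graveyard
    zfs rename lxd/containers/base1 lxd/deleted/base1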
The other option I can think of is ZFS dedupe. The finished target
system won't have the resources to run dedupe continuously; however, I
could turn on dedupe just for the cloning step and then turn it back
off again (*).
Have I understood this correctly? Any additional clues gratefully received.
Thanks,
Brian Candler.
(*) P.S. I did a quick test of this. It looks like enabling dedupe
doesn't deduplicate against any pre-existing files:
root@vtp:~# zfs set dedup=on lxd
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  1.27G  75.7G         -     0%     1%  1.00x  ONLINE  -
root@vtp:~# lxc exec base2 /bin/sh -- -c 'echo world >/usr/test.txt'
root@vtp:~# lxc stop base2
root@vtp:~# lxc publish base2 --alias clonemaster2
Container published with fingerprint:
8a288bd1364d82d4d8afb23aee67fa13586699c539fad94e7946f60372767150
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  1.88G  75.1G         -     1%     2%  1.05x  ONLINE  -
But then I rebooted, and published another image:
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  1.87G  75.1G         -     1%     2%  1.05x  ONLINE  -
root@vtp:~# lxc exec base3 /bin/sh -- -c 'echo world2 >/usr/test.txt'
root@vtp:~# lxc stop base3
root@vtp:~# time lxc publish base3 --alias clonemaster3
Container published with fingerprint:
6abbeb5df75989944a533fdbb1d8ab94be4d18cccf20b320c009dd8aef4fb65b

real    0m55.338s
user    0m0.008s
sys     0m0.008s
root@vtp:~# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
lxd     77G  1.88G  75.1G         -     1%     2%  2.11x  ONLINE  -
So I suspect it would have all worked if I'd turned on dedupe before the
very first image was fetched.
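In other words, on a fresh pool something like this might give the
sharing I'm after (untested beyond the experiment above):

    zfs set dedup=on lxd              # before anything is written
    lxc launch ubuntu:16.04 base1     # first image fetch gets deduped blocks
    lxc publish base1 --alias clonemaster
    zpool get dedupratio lxd          # should climb instead of ALLOC doubling
    zfs set dedup=off lxd             # existing blocks stay deduplicated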