[lxc-devel] Report correct filesystem usage / limits on BTRFS subvolumes with quota

Qu Wenruo quwenruo.btrfs at gmx.com
Wed Aug 1 01:23:45 UTC 2018



On 2018-08-01 00:03, Austin S. Hemmelgarn wrote:
> On 2018-07-31 10:32, Qu Wenruo wrote:
>>
>>
>> On 2018-07-31 21:49, Thomas Leister wrote:
>>> Dear David,
>>> hello everyone,
>>>
>>> during a recent project of mine involving LXD and BTRFS I found out that
>>> quotas on BTRFS subvolumes are enforced, but file system usage and
>>> limits set via quotas are not reported correctly in LXC containers.
>>>
>>> I've found this discussion regarding my problem:
>>> https://github.com/lxc/lxd/issues/2180
>>
>> That's not the expected usage of btrfs qgroup/quota.
>>
>> Quota only accounts how many bytes are used exclusively or shared
>> between subvolumes at extent level.
>>
>>>
>>> There was already a proposal to introduce subvolume quota support some
>>> time ago:
>>> https://marc.info/?l=linux-btrfs&m=147576434114415&w=2
>>
>> It's in fact impossible, unless I'm missing something.
>>
>> There are several technical problems in the proposal:
>>
>> 1) Multi-level qgroups
>>     The real limit is constrained by all related qgroups, including
>>     higher-level qgroups.
>>     Such a design makes it pretty hard to calculate the real limit.
>>
>> 2) Different limitations on exclusive/shared bytes
>>     Btrfs can set different limits on exclusive/shared bytes, further
>>     complicating the problem.
>>
>> 3) Btrfs quota only accounts data/metadata used by the subvolume
>>     It lacks all the shared trees (mentioned below), and in fact such
>>     shared trees can be pretty large (especially the extent tree and
>>     csum tree).
>>     Accounting only for the quota limit would easily hit a real ENOSPC
>>     IMHO.
>>
>>>
>>> @David as I've seen your response on that topic on the mailing list,
>>> maybe you can tell me if there are any plans to support correct
>>> subvolume quota reporting e.g. for "df -h" calls from within a
>>> container? Maybe there's already something on your / SUSE's roadmap? :-)
>>>
>>> As more and more container environments spin up these days, there might
>>> be a growing demand on that :-) Personally I'd really appreciate if I
>>> could read the current file system usage and limit from within a
>>> container using BTRFS as storage backend.
>>
>> With the current btrfs design, I'm skeptical that such a design can be
>> implemented.
>> The main problem here is that btrfs doesn't do the full LVM work
>> (unlike ZFS, IIRC).
>> It doesn't really manage multiple volumes; that's why it's called a
>> subvolume in btrfs.
> ZFS quotas work the way they do not because it's trivial to implement
> them that way due to the underlying implementation, but because they
> provide the functionality that people actually want.  Being able to put
> proper hard limits on space usage for a given volume/subvolume/dataset
> is _critical_ for a large number of enterprise deployment scenarios.
> Same goes for being able to put a fixed space reservation for a given
> volume/subvolume/dataset.  If we want to even remotely compete (and it
> sure seems like we do), we need equivalent features that work
> intuitively for _regular_ people (not those who have intimate
> understandings of the internal workings of BTRFS).

Then the design and use case of btrfs quota itself need to be reworked
from the very beginning.
At the very least, get rid of the higher-level qgroups and the
exclusive/referenced limits.

Otherwise there will be no way to report a single value in df using
those two limits.
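
To illustrate the problem, here is a minimal sketch in Python. It is not
btrfs code: the QGroup class, the headroom() helper, and the example
numbers are all hypothetical, and it assumes only that each qgroup tracks
referenced and exclusive bytes, may carry a limit on either, and that a
write must satisfy every qgroup above the subvolume.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

GiB = 1024 ** 3


@dataclass
class QGroup:
    """Hypothetical model of one qgroup (not the kernel structure)."""
    name: str
    rfer: int                        # referenced bytes accounted so far
    excl: int                        # exclusive bytes accounted so far
    max_rfer: Optional[int] = None   # limit on referenced bytes, if any
    max_excl: Optional[int] = None   # limit on exclusive bytes, if any
    parents: List["QGroup"] = field(default_factory=list)


def ancestors(q: QGroup) -> Iterator[QGroup]:
    """Yield q itself and every higher-level qgroup it belongs to."""
    seen = set()
    stack = [q]
    while stack:
        cur = stack.pop()
        if id(cur) in seen:
            continue
        seen.add(id(cur))
        yield cur
        stack.extend(cur.parents)


def headroom(q: QGroup) -> Tuple[Optional[int], Optional[int]]:
    """Return (referenced headroom, exclusive headroom) in bytes.

    Each value is the tightest remaining space over q and all of its
    ancestors, or None when no qgroup in the chain sets that limit.
    """
    rfer_free = []
    excl_free = []
    for g in ancestors(q):
        if g.max_rfer is not None:
            rfer_free.append(max(g.max_rfer - g.rfer, 0))
        if g.max_excl is not None:
            excl_free.append(max(g.max_excl - g.excl, 0))
    return (min(rfer_free) if rfer_free else None,
            min(excl_free) if excl_free else None)


# Made-up example: a subvolume qgroup with a referenced limit, inside a
# higher-level qgroup with an exclusive limit.
parent = QGroup("1/0", rfer=18 * GiB, excl=18 * GiB, max_excl=20 * GiB)
subvol = QGroup("0/257", rfer=6 * GiB, excl=2 * GiB,
                max_rfer=10 * GiB, parents=[parent])

rfer_left, excl_left = headroom(subvol)
print(rfer_left // GiB, excl_left // GiB)  # prints "4 2"
```

The subvolume ends up with two candidate "available" values (4 GiB by the
referenced limit, 2 GiB by the parent's exclusive limit), and df has only
one field to put them in. Neither number accounts for the shared trees.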

Thanks,
Qu

> 
>> A subvolume is not a fully usable fs, it's just a subset of a full fs.
>> It relies on all the other trees (root tree, extent tree, chunk tree,
>> csum tree, and quota tree in this case) to do all the work.
> A ZFS dataset isn't a fully usable FS either.  It's still dependent on
> all the underlying infrastructure from the zpool itself (and so are
> zvols), which, in fact, does a vast majority of the work.  The
> difference here is that a ZFS dataset is far more self-contained than a
> BTRFS subvolume.  If we ever want sane per-subvolume storage profiles or
> mount options, we're going to need to get a lot closer to that anyway.
> 
>> Thus it's pretty hard to implement such a special-purpose df call.
> To implement it perfectly maybe.  Except most applications don't need it
> to be perfect, they want to know how much space they can actually use.
> Even a trivial blatantly imperfect implementation that just shows you
> the total space that can be used and how much is used based on quotas
> will give better behavior than the current case of just hiding the
> quotas behind a root-only call.  Pretty much anything which does its
> own disk usage management is currently broken on BTRFS when quotas are
> being used.  Just reporting the quota for the total space, and the space
> accounted to the subvolume by the quota would fix almost all such
> applications.
>>
>> On the other hand, isn't it easier to implement a special interface for
>> containers to get the real disk usage/limit, rather than using the old
>> vanilla df interface?
> This isn't just an issue for containers.  Anybody who is using quotas
> like they are typically used in ZFS deployments has the same issue, and
> there _ARE_ people doing that (see for example OpenSUSE, where they are
> using quotas (if they are enabled because of snapshot support) to limit
> space consumption of paths like /tmp).
