[lxc-devel] [PATCH v2] safe_mount: Handle mounting proc and refactor
Bogdan Purcareata
bogdan.purcareata at nxp.com
Wed Jan 13 11:27:34 UTC 2016
On 11.01.2016 20:59, Serge Hallyn wrote:
> Quoting Bogdan Purcareata (bogdan.purcareata at nxp.com):
>> The safe_mount primitive will mount the fs in the new container
>> environment by using file descriptors referred in /proc/self/fd.
>> However, when the mounted filesystem is proc itself, it will have
>> been previously unmounted, therefore resulting in an error when
>> searching for these file descriptors. This only happens when there's
>> no container rootfs prefix (commonly with lxc-execute).
>>
>> Implement the support for this use case as well, by doing the mount
>> based on the full path.
>>
>> Refactor the whole function in order to remove duplicated code checks
>> and improve readability.
>>
>> Changes since v1:
>> - In order to address CVE-2015-1335, still check if the destination is
>> not a symlink. Do the mount only if the destination file descriptor
>> exists.
>>
>> Signed-off-by: Bogdan Purcareata <bogdan.purcareata at nxp.com>
>> ---
>> src/lxc/utils.c | 49 ++++++++++++++++++++++++++++---------------------
>> 1 file changed, 28 insertions(+), 21 deletions(-)
>>
>> diff --git a/src/lxc/utils.c b/src/lxc/utils.c
>> index d9e769d..c53711a 100644
>> --- a/src/lxc/utils.c
>> +++ b/src/lxc/utils.c
>> @@ -1644,9 +1644,9 @@ out:
>> int safe_mount(const char *src, const char *dest, const char *fstype,
>> unsigned long flags, const void *data, const char *rootfs)
>> {
>> - int srcfd = -1, destfd, ret, saved_errno;
>> + int srcfd = -1, destfd = -1, ret = 0;
>> char srcbuf[50], destbuf[50]; // only needs enough for /proc/self/fd/<fd>
>> - const char *mntsrc = src;
>> + const char *mntsrc = src, *mntdest = dest;
>>
>> if (!rootfs)
>> rootfs = "";
>> @@ -1655,45 +1655,52 @@ int safe_mount(const char *src, const char *dest, const char *fstype,
>> if (flags & MS_BIND && src && src[0] != '/') {
>> INFO("this is a relative bind mount");
>> srcfd = open_without_symlink(src, NULL);
>> - if (srcfd < 0)
>> - return srcfd;
>> + if (srcfd < 0) {
>> + ret = srcfd;
>> + goto out;
>> + }
>> ret = snprintf(srcbuf, 50, "/proc/self/fd/%d", srcfd);
>> if (ret < 0 || ret > 50) {
>> - close(srcfd);
>> ERROR("Out of memory");
>> - return -EINVAL;
>> + ret = -EINVAL;
>> + goto out_src;
>> }
>> mntsrc = srcbuf;
>> }
>>
>> destfd = open_without_symlink(dest, rootfs);
>> if (destfd < 0) {
>> - if (srcfd != -1)
>> - close(srcfd);
>> - return destfd;
>> + ret = destfd;
>> + goto out_src;
>> }
>>
>> ret = snprintf(destbuf, 50, "/proc/self/fd/%d", destfd);
>> if (ret < 0 || ret > 50) {
>> - if (srcfd != -1)
>> - close(srcfd);
>> - close(destfd);
>> ERROR("Out of memory");
>> - return -EINVAL;
>> + ret = -EINVAL;
>> + goto out_dest;
>> }
>>
>> - ret = mount(mntsrc, destbuf, fstype, flags, data);
>> - saved_errno = errno;
>> - if (srcfd != -1)
>> - close(srcfd);
>> - close(destfd);
>> + /* make sure the destination descriptor exists */
>> + if (access(destbuf, F_OK) == 0)
>> + mntdest = destbuf;
> First, if we're going to shortcut I'd prefer to say "if /proc/self
> does not exist then skip this check" fo rnow.
> But can we think of any way to still do this check?
>
> What exactly are the cases?
>
> 1. lxc-execute, lxc-init tries to mount /proc. We should be able
> to simply have lxc always mount /proc before the pivot_root, so
> we can properly do this check.
>
> what use-cases will break if we demand /proc to exist in the
> container? (We can add an option to umount /proc in lxc-init,
> but the directory would have to exist)
That's my use case - the failing function is tmp_proc_mount, thus happening
before pivot_root, and the scenario is running an application container with
lxc-execute. This calls mount_proc_if_needed which is bound to fail, since it
will first unmount /proc (because after cloning the process in the new
namespaces, its PID will be different than /proc/self), and then try to
safe_mount it, depending on /proc/self/fd. However, tmp_proc_mount will not
fail, and afterwards lxc-init will mount /proc following the behavior you
mentioned. So functionally all is ok, but tmp_proc_mount -> mount_proc_if_needed
-> safe_mount will still spit a nasty warning.
BTW this whole thing is with LXC 1.1.5.
>
> 2. lxc.rootfs unset. In this case we're trusting the *host* admin
> to not have messed with /proc to make it a symlink, if we can't
> trust that we're out of luck. Other paths are not the same (since
> parts of the rootfs could be bind-mounted from container-owned
> dirs) so we should check those. But so for this check we should
> simply explicitly check for "/proc". Doing a more roundabout
> check may leave us open to subtle different attacks. In particular
> I imagine there are other ways to make /proc/self/fd/N not
> access-able, and you are, in that case, re-introducing the TOCTTOU -
> the attacker could try to quickliy insert the symlink after our
> checks and before the real mount().
Leaving lxc.rootfs unset results in lxc-start blocking right after
lxc_mount_auto_mounts. However, setting it to "/" leads to a successful
scenario, since lxc_mount_auto_mounts will mount a temporary /proc in
/usr/var/lib/lxc/rootfs, so mount_proc_if_needed will no longer lead to mounting
/proc again.
So my thought right now is:
- leave the open_without_symlink call in order to check that the destination is
not a symlink
- use the full destination path in mount syscall only if the fstype is proc and
the destination path is "/proc" - otherwise use destfd
Does this make sense?
>> + ret = mount(mntsrc, mntdest, fstype, flags, data);
>> if (ret < 0) {
>> - errno = saved_errno;
>> SYSERROR("Failed to mount %s onto %s", src, dest);
>> - return ret;
>> + goto out_dest;
>> }
>>
>> - return 0;
>> + ret = 0;
>> +
>> +out_dest:
>> + if (destfd > 0)
> These should be >= 0 (here and below) right?
>
True, 0 _is_ a valid file descriptor. Will fix in v3.
>> + close(destfd);
>> +out_src:
>> + if (srcfd > 0)
>> + close(srcfd);
>> +out:
>> + return ret;
>> }
Thanks for the feedback!
Bogdan P.
More information about the lxc-devel
mailing list