[lxc-devel] [PATCH 1/3] cgfs: don't mount /sys/fs/cgroup readonly

Serge Hallyn serge.hallyn at ubuntu.com
Mon May 5 19:14:47 UTC 2014


Quoting Christian Seiler (christian at iwakd.de):
> Ubuntu containers have had trouble with automatic cgroup mounting that
> was not read-write (i.e. lxc.mount.auto = cgroup{,-full}:{ro,mixed}) in
> containers without CAP_SYS_ADMIN. Ubuntu's mountall program reads
> /lib/init/fstab, which contains an entry for /sys/fs/cgroup. Since
> there is no ro option specified for that filesystem, mountall will try
> to remount it readwrite if it is already mounted. Without
> CAP_SYS_ADMIN, that fails and mountall will interrupt boot and wait for
> user input on whether to proceed anyway or to manually fix it,
> effectively hanging container bootup.
> 
> This patch makes sure that /sys/fs/cgroup is always a readwrite tmpfs,
> but that the actual cgroup hierarchy paths (/sys/fs/cgroup/$subsystem)
> are readonly if :ro or :mixed is used. This still has the desired
> effect within the container (no cgroup escalation possible and programs
> get errors if they try to do so anyway), while keeping Ubuntu
> containers happy.
> 
> Signed-off-by: Christian Seiler <christian at iwakd.de>
> Cc: Serge Hallyn <serge.hallyn at ubuntu.com>

Indeed this fixes it here, thanks very much.

Acked-by: Serge E. Hallyn <serge.hallyn at ubuntu.com>

> ---
>  doc/lxc.container.conf.sgml.in |   20 ++++++++++++++++++++
>  src/lxc/cgfs.c                 |   38 ++++++++++++++++++++++++++++++++------
>  2 files changed, 52 insertions(+), 6 deletions(-)
> 
> diff --git a/doc/lxc.container.conf.sgml.in b/doc/lxc.container.conf.sgml.in
> index 7bd2c9e..d3e3ef8 100644
> --- a/doc/lxc.container.conf.sgml.in
> +++ b/doc/lxc.container.conf.sgml.in
> @@ -812,6 +812,26 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>  	      </listitem>
>  	    </itemizedlist>
>  	    <para>
> +	      Note that if automatic mounting of the cgroup filesystem
> +	      is enabled, the tmpfs under
> +	      <filename>/sys/fs/cgroup</filename> will always be
> +	      mounted read-write (but for the <option>:mixed</option>
> +	      and <option>:ro</option> cases, the individual
> +	      hierarchies,
> +	      <filename>/sys/fs/cgroup/$hierarchy</filename>, will be
> +	      read-only). This is in order to work around a quirk in
> +	      Ubuntu's
> +              <citerefentry>
> +		<refentrytitle>mountall</refentrytitle>
> +                <manvolnum>8</manvolnum>
> +              </citerefentry>
> +	      command that will cause containers to wait for user
> +	      input at boot if
> +	      <filename>/sys/fs/cgroup</filename> is mounted read-only
> +	      and the container can't remount it read-write due to a
> +	      lack of CAP_SYS_ADMIN.
> +	    </para>
> +	    <para>
>  	      Examples:
>  	    </para>
>  	    <programlisting>
> diff --git a/src/lxc/cgfs.c b/src/lxc/cgfs.c
> index db2a973..d75037a 100644
> --- a/src/lxc/cgfs.c
> +++ b/src/lxc/cgfs.c
> @@ -1442,6 +1442,24 @@ static bool cgroupfs_mount_cgroup(void *hdata, const char *root, int type)
>  				goto out_error;
>  			}
>  
> +			/* for read-only and mixed cases, we have to bind-mount the tmpfs directory
> +			 * that points to the hierarchy itself (i.e. /sys/fs/cgroup/cpu etc.) onto
> +			 * itself and then bind-mount it read-only, since we keep the tmpfs itself
> +			 * read-write (see comment below)
> +			 */
> +			if (type == LXC_AUTO_CGROUP_MIXED || type == LXC_AUTO_CGROUP_RO) {
> +				r = mount(abs_path, abs_path, NULL, MS_BIND, NULL);
> +				if (r < 0) {
> +					SYSERROR("error bind-mounting %s onto itself", abs_path);
> +					goto out_error;
> +				}
> +				r = mount(NULL, abs_path, NULL, MS_REMOUNT|MS_BIND|MS_RDONLY, NULL);
> +				if (r < 0) {
> +					SYSERROR("error re-mounting %s readonly", abs_path);
> +					goto out_error;
> +				}
> +			}
> +
>  			free(abs_path);
>  			abs_path = NULL;
>  
> @@ -1487,13 +1505,21 @@ static bool cgroupfs_mount_cgroup(void *hdata, const char *root, int type)
>  		parts = NULL;
>  	}
>  
> -	/* try to remount the tmpfs readonly, since the container shouldn't
> -	 * change anything (this will also make sure that trying to create
> -	 * new cgroups outside the allowed area fails with an error instead
> -	 * of simply causing this to create directories in the tmpfs itself)
> +	/* We used to remount the entire tmpfs readonly if any :ro or
> +	 * :mixed mode was specified. However, Ubuntu's mountall has the
> +	 * unfortunate behavior to block bootup if /sys/fs/cgroup is
> +	 * mounted read-only and cannot be remounted read-write.
> +	 * (mountall reads /lib/init/fstab and tries to (re-)mount all of
> +	 * these if they are not already mounted with the right options;
> +	 * it contains an entry for /sys/fs/cgroup. In case it can't do
> +	 * that, it prompts for the user to either manually fix it or
> +	 * boot anyway. But without user input, booting of the container
> +	 * hangs.)
> +	 *
> +	 * Instead of remounting the entire tmpfs readonly, we only
> +	 * remount the paths readonly that are part of the cgroup
> +	 * hierarchy.
>  	 */
> -	if (type != LXC_AUTO_CGROUP_RW && type != LXC_AUTO_CGROUP_FULL_RW)
> -		mount(NULL, path, NULL, MS_REMOUNT|MS_RDONLY, NULL);
>  
>  	free(path);
>  
> -- 
> 1.7.10.4
> 
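For readers following along, the core technique in the cgfs.c hunk above (bind-mount a directory onto itself, then remount that bind mount read-only) can be sketched in isolation. This is a minimal illustration, not the code from the patch: the helper name bind_remount_readonly is hypothetical, error reporting uses plain stderr rather than LXC's SYSERROR macro, and the caller is assumed to hold CAP_SYS_ADMIN in the mount namespace where it runs.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper (not part of LXC): make a single directory read-only
 * while leaving the filesystem that contains it read-write.
 * Returns 0 on success, -1 on failure. */
int bind_remount_readonly(const char *path)
{
	/* Step 1: bind-mount the directory onto itself so that it becomes
	 * its own mount point with its own per-mount flags. */
	if (mount(path, path, NULL, MS_BIND, NULL) < 0) {
		fprintf(stderr, "error bind-mounting %s onto itself: %s\n",
			path, strerror(errno));
		return -1;
	}

	/* Step 2: remount only that bind mount read-only. MS_BIND together
	 * with MS_REMOUNT changes the flags of this mount point alone and
	 * leaves the underlying filesystem untouched. */
	if (mount(NULL, path, NULL, MS_REMOUNT | MS_BIND | MS_RDONLY, NULL) < 0) {
		fprintf(stderr, "error re-mounting %s readonly: %s\n",
			path, strerror(errno));
		return -1;
	}

	return 0;
}
```

Calling this on each /sys/fs/cgroup/$subsystem directory, as the patch does for the :ro and :mixed cases, leaves the parent tmpfs at /sys/fs/cgroup writable, so mountall has no read-only mount to remount there.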

