[lxc-devel] [PATCH 1/3] cgfs: don't mount /sys/fs/cgroup readonly

Christian Seiler christian at iwakd.de
Sat May 3 18:57:44 UTC 2014


Ubuntu containers without CAP_SYS_ADMIN have had trouble when the
automatic cgroup mount was not read-write (i.e. lxc.mount.auto =
cgroup{,-full}:{ro,mixed}). Ubuntu's mountall program reads
/lib/init/fstab, which contains an entry for /sys/fs/cgroup. Since
that entry does not specify the ro option, mountall tries to remount
the filesystem read-write if it is already mounted. Without
CAP_SYS_ADMIN the remount fails, and mountall interrupts boot to ask
the user whether to proceed anyway or to fix the problem manually,
effectively hanging container bootup.

This patch makes sure that /sys/fs/cgroup is always a read-write tmpfs,
while the actual cgroup hierarchy paths (/sys/fs/cgroup/$subsystem)
are read-only if :ro or :mixed is used. This still has the desired
effect within the container (no cgroup escalation is possible, and
programs that attempt it anyway get errors), while keeping Ubuntu
containers happy.

Signed-off-by: Christian Seiler <christian at iwakd.de>
Cc: Serge Hallyn <serge.hallyn at ubuntu.com>
---
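Not part of the commit: the layout described above can be checked from
inside a container with statvfs(3). The helper below is only a minimal
sketch. mount_is_readonly() is a hypothetical name that does not exist
in LXC, and the check assumes a kernel and libc that report per-mount
flags in f_flag.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper, not part of this patch: report whether the
 * mount that contains path is read-only.
 * Returns 1 if read-only, 0 if writable, -1 if statvfs() fails.
 */
int mount_is_readonly(const char *path)
{
	struct statvfs sv;

	assert(path != NULL);

	if (statvfs(path, &sv) < 0)
		return -1;

	return (sv.f_flag & ST_RDONLY) ? 1 : 0;
}
```

With lxc.mount.auto = cgroup:ro or cgroup:mixed, this should return 0
for /sys/fs/cgroup and 1 for a hierarchy path such as
/sys/fs/cgroup/cpu.
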
 doc/lxc.container.conf.sgml.in |   20 ++++++++++++++++++++
 src/lxc/cgfs.c                 |   38 ++++++++++++++++++++++++++++++++------
 2 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/doc/lxc.container.conf.sgml.in b/doc/lxc.container.conf.sgml.in
index 7bd2c9e..d3e3ef8 100644
--- a/doc/lxc.container.conf.sgml.in
+++ b/doc/lxc.container.conf.sgml.in
@@ -812,6 +812,26 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 	      </listitem>
 	    </itemizedlist>
 	    <para>
+	      Note that if automatic mounting of the cgroup filesystem
+	      is enabled, the tmpfs under
+	      <filename>/sys/fs/cgroup</filename> will always be
+	      mounted read-write (but for the <option>:mixed</option>
+	      and <option>:ro</option> cases, the individual
+	      hierarchies,
+	      <filename>/sys/fs/cgroup/$hierarchy</filename>, will be
+	      read-only). This is in order to work around a quirk in
+	      Ubuntu's
+              <citerefentry>
+		<refentrytitle>mountall</refentrytitle>
+                <manvolnum>8</manvolnum>
+              </citerefentry>
+	      command that will cause containers to wait for user
+	      input at boot if
+	      <filename>/sys/fs/cgroup</filename> is mounted read-only
+	      and the container can't remount it read-write due to a
+	      lack of CAP_SYS_ADMIN.
+	    </para>
+	    <para>
 	      Examples:
 	    </para>
 	    <programlisting>
diff --git a/src/lxc/cgfs.c b/src/lxc/cgfs.c
index db2a973..d75037a 100644
--- a/src/lxc/cgfs.c
+++ b/src/lxc/cgfs.c
@@ -1442,6 +1442,24 @@ static bool cgroupfs_mount_cgroup(void *hdata, const char *root, int type)
 				goto out_error;
 			}
 
+			/* for read-only and mixed cases, we have to bind-mount the tmpfs directory
+			 * for the hierarchy itself (e.g. /sys/fs/cgroup/cpu) onto itself and
+			 * then remount that bind mount read-only, since we keep the tmpfs itself
+			 * read-write (see comment below)
+			 */
+			if (type == LXC_AUTO_CGROUP_MIXED || type == LXC_AUTO_CGROUP_RO) {
+				r = mount(abs_path, abs_path, NULL, MS_BIND, NULL);
+				if (r < 0) {
+					SYSERROR("error bind-mounting %s onto itself", abs_path);
+					goto out_error;
+				}
+				r = mount(NULL, abs_path, NULL, MS_REMOUNT|MS_BIND|MS_RDONLY, NULL);
+				if (r < 0) {
+					SYSERROR("error re-mounting %s readonly", abs_path);
+					goto out_error;
+				}
+			}
+
 			free(abs_path);
 			abs_path = NULL;
 
@@ -1487,13 +1505,21 @@ static bool cgroupfs_mount_cgroup(void *hdata, const char *root, int type)
 		parts = NULL;
 	}
 
-	/* try to remount the tmpfs readonly, since the container shouldn't
-	 * change anything (this will also make sure that trying to create
-	 * new cgroups outside the allowed area fails with an error instead
-	 * of simply causing this to create directories in the tmpfs itself)
+	/* We used to remount the entire tmpfs readonly if any :ro or
+	 * :mixed mode was specified. However, Ubuntu's mountall has the
+	 * unfortunate behavior of blocking bootup if /sys/fs/cgroup is
+	 * mounted read-only and cannot be remounted read-write.
+	 * (mountall reads /lib/init/fstab, which contains an entry for
+	 * /sys/fs/cgroup, and tries to (re-)mount every entry that is
+	 * not already mounted with the right options. If it can't do
+	 * that, it prompts the user to either fix it manually or boot
+	 * anyway. Without user input, booting of the container
+	 * hangs.)
+	 *
+	 * Instead of remounting the entire tmpfs readonly, we only
+	 * remount the paths readonly that are part of the cgroup
+	 * hierarchy.
 	 */
-	if (type != LXC_AUTO_CGROUP_RW && type != LXC_AUTO_CGROUP_FULL_RW)
-		mount(NULL, path, NULL, MS_REMOUNT|MS_RDONLY, NULL);
 
 	free(path);
 
-- 
1.7.10.4


