[Lxc-users] version 0.8.0 coming soon

Serge Hallyn serge.hallyn at canonical.com
Tue Feb 28 23:16:12 UTC 2012


Quoting Papp Tamas (tompos at martos.bme.hu):
> On 02/28/2012 04:13 PM, Serge Hallyn wrote:
> >Quoting Papp Tamas (tompos at martos.bme.hu):
> >>On 02/28/2012 01:20 AM, Serge Hallyn wrote:
> >>>Quoting Daniel Lezcano (daniel.lezcano at free.fr):
> >>>>Hi all,
> >>>>
> >>>>I will release a 0.8.0-rc1. I am looking for volunteers to test it :)
> >>>Worked fine for me.  Tested create and clone of ubuntu, ubuntu and
> >>>ubuntu-cloud images, with dir and lvm backing stores.  (And a run
> >>>of lp:~serge-hallyn/+junk/lxc-test)
> >>>
> >>>Note, because upstream kernel didn't much care about the
> >>>'mount -o remount,ro /' problem, I'm going to patch lxc to
> >>>pin open a '${rootfs}.hold' file, as long as the container
> >>>is running.  That will prevent the underlying fs from being
> >>>remounted ro.  (see
> >>>https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/942325 for
> >>>details).  That'll buy us some time to find a better solution
> >>>in the kernel.
> >>>
> >>>
> >>Why can a container change mount options outside of its rootfs?
> >>Sorry for the stupid question:)
> >It's not a stupid question at all.
> >
> >The container isn't changing mount options outside of its rootfs.  There
> >are two places an fs can be marked readonly - in the mount itself, and in
> >the superblock.  When you make a bind mount, you are creating more mounts
> >(vfsmounts) using the same superblock.
> >
> >If you do
> >
> >	mount --bind / / # not needed in container bc it's already been done
> >	mount --bind -o remount,ro /
> >
> >then you are setting the readonly flag on the mount itself.  If you just do
> >
> >	mount -o remount,ro /
> >
> >then you are setting the readonly flag on the superblock, which will
> >force all other mounts of that superblock to also be readonly.
> >
> >Right now there is no way to prevent a container from doing that.  I sent
> >a patch to make the devices cgroup be consulted on that, so that it could
> >return -EPERM.  That was refused.  The two other options I'm considering
> >(and it wouldn't hurt to have both) are 1. to pass the remount flags to the
> >LSM (selinux or apparmor or smack) so that it can deny permission.  Right
> >now it can't do that (except for all-or-nothing check on remount).  And 2.
> >to make it so that after doing
> >
> >	mount --bind / /
> >	mount --bind -o remount,ro /
> >	mount --bind -o remount,rw /
> >
> >any subsequent
> >
> >	mount -o remount,rw /
> >
> >would be refused (or automatically done only at the mount level).  I don't
> >think that should be hard to do at fs/namespace.c:do_remount().
> 
> 
> This may be too much for my brain:)
> 
> Anyway, could you make deb package from it?

I've got it working for an ubuntu package, though we're in freeze right
now.  I intend to push the patch to my github tree tomorrow, and I've
pushed the package to ppa:serge-hallyn/virt (version 0.7.5-3ubuntu31,
should build in a few hours).  Meanwhile here is the actual patch for
now.

Tests fine for me.

Subject: lxc-start: if rootfs is a dir, pin the fs

Otherwise the container can remount the shared underlying fs readonly.

Index: lxc-dnsmasq/src/lxc/conf.c
===================================================================
--- lxc-dnsmasq.orig/src/lxc/conf.c	2012-02-28 19:13:01.400960000 +0000
+++ lxc-dnsmasq/src/lxc/conf.c	2012-02-28 20:05:45.538144907 +0000
@@ -445,6 +445,51 @@
 	return mount_unknow_fs(rootfs, target, 0);
 }
 
+/*
+ * pin_rootfs
+ * if rootfs is a directory, then open ${rootfs}.hold for writing for the
+ * duration of the container run, to prevent the container from marking the
+ * underlying fs readonly on shutdown.
+ * return -1 on error.
+ * return -2 if nothing needed to be pinned.
+ * return an open fd (>=0) if we pinned it.
+ */
+int pin_rootfs(const char *rootfs)
+{
+	char absrootfs[MAXPATHLEN];
+	char absrootfspin[MAXPATHLEN];
+	struct stat s;
+	int ret, fd;
+
+	if (!realpath(rootfs, absrootfs)) {
+		SYSERROR("failed to get real path for '%s'", rootfs);
+		return -1;
+	}
+
+	if (access(absrootfs, F_OK)) {
+		SYSERROR("'%s' is not accessible", absrootfs);
+		return -1;
+	}
+
+	if (stat(absrootfs, &s)) {
+		SYSERROR("failed to stat '%s'", absrootfs);
+		return -1;
+	}
+
+	if (!S_ISDIR(s.st_mode))
+		return -2;
+
+	ret = snprintf(absrootfspin, MAXPATHLEN, "%s%s", absrootfs, ".hold");
+	if (ret >= MAXPATHLEN) {
+		SYSERROR("pathname too long for rootfs hold file");
+		return -1;
+	}
+
+	fd = open(absrootfspin, O_CREAT | O_RDWR, S_IWUSR|S_IRUSR);
+	INFO("opened %s as fd %d\n", absrootfspin, fd);
+	return fd;
+}
+
 static int mount_rootfs(const char *rootfs, const char *target)
 {
 	char absrootfs[MAXPATHLEN];
Index: lxc-dnsmasq/src/lxc/conf.h
===================================================================
--- lxc-dnsmasq.orig/src/lxc/conf.h	2012-02-28 19:13:01.400960000 +0000
+++ lxc-dnsmasq/src/lxc/conf.h	2012-02-28 19:13:01.400960000 +0000
@@ -218,6 +218,8 @@
  */
 extern struct lxc_conf *lxc_conf_init(void);
 
+extern int pin_rootfs(const char *rootfs);
+
 extern int lxc_create_network(struct lxc_handler *handler);
 extern void lxc_delete_network(struct lxc_list *networks);
 extern int lxc_assign_network(struct lxc_list *networks, pid_t pid);
Index: lxc-dnsmasq/src/lxc/start.c
===================================================================
--- lxc-dnsmasq.orig/src/lxc/start.c	2012-02-28 19:13:01.400960000 +0000
+++ lxc-dnsmasq/src/lxc/start.c	2012-02-28 20:07:41.174882442 +0000
@@ -565,6 +565,7 @@
 	int clone_flags;
 	int failed_before_rename = 0;
 	const char *name = handler->name;
+	int pinfd;
 
 	if (lxc_sync_init(handler))
 		return -1;
@@ -585,6 +586,17 @@
 	}
 
 
+	/*
+	 * if the rootfs is not a blockdev, prevent the container from
+	 * marking it readonly.
+	 */
+
+	pinfd = pin_rootfs(handler->conf->rootfs.path);
+	if (pinfd == -1) {
+		ERROR("failed to pin the container's rootfs");
+		goto out_abort;
+	}
+
 	/* Create a process in a new set of namespaces */
 	handler->pid = lxc_clone(do_start, handler, clone_flags);
 	if (handler->pid < 0) {
@@ -627,6 +639,10 @@
 	}
 
 	lxc_sync_fini(handler);
+
+	if (pinfd >= 0)
+		close(pinfd);
+
 	return 0;
 
 out_delete_net:



