[lxc-devel] [PATCH] cgroupfs: cpuset support for kernels without cgroup.clone_children

Robert Vogelgesang vogel at users.sourceforge.net
Wed Jan 29 13:52:35 UTC 2014


Hi Serge,

first, thank you for accepting and cleaning up my patch.

I'd like to submit two small changes on top of your work; please
see my inline notes, and the patch at the end.

	Robert

On Wed, Jan 29, 2014 at 10:01:06AM +0000, Serge Hallyn wrote:
> Quoting Robert Vogelgesang (vogel at users.sourceforge.net):
> > Hi Serge,
> > 
> > On Tue, Jan 28, 2014 at 04:32:30PM +0000, Serge Hallyn wrote:
> > > Quoting Robert Vogelgesang (vogel at users.sourceforge.net):
> > > > Hi,
> > > > 
> > > > as promised last week, here's my patch for cpuset cgroup support for
> > > > kernels without the cgroup.clone_children feature.
> > > 
> > > Thanks.
> > > 
> > > > My initial patch used "#include <linux/version.h>" and the macros defined
> > > > there to decide if cgroup.clone_children should be used or not.  After
> > > > having seen Serge Hallyn's patch which he posted to the list last Wednesday,
> > > > where he used stat() to check if the cgroup.clone_children file is there,
> > > > I rewrote my patch to do the same.
> > > > 
> > > > The patch is against 1.0.0.beta3, and it is tested successfully with
> > > > RHEL-6's kernel version 2.6.32-431.3.1.el6, compiled without cgmanager
> > > 
> > > Shouldn't be a problem as the cgfs_create function won't be called if
> > > cgmanager is running.
> > > 
> > > > (I've so far not tried to use cgmanager in RHEL-6).
> > > > 
> > > > In addition to fixing the cpuset cgroup setup, this patch also fixes a
> > > > wrong argument in a call to handle_cgroup_settings() in the same context.
> > > > 
> > > > 	Robert
> > > > 
> > > > Signed-off-by: Robert Vogelgesang <vogel at users.sourceforge.net>
> > > 
> > > Just a two questions.  I'll go ahead and run tests here while waiting
> > > for an answer.  If it comes down to "add the free and use 1024", I can
> > > just do that inline.
> > 
> > Yes, thank you, "free(childfile);" should be added where you noted it.
> > 
> > Regarding the size of the "value" buffer: 128 is what I used in my
> > tests; it should be enough for servers with up to around 80 CPUs;
> > 1024 would support at least around 540 CPUs.  I don't know where to
> > set the limit.
> > 
> > Maybe we should modify the patch so that init_cpuset_if_needed()
> > checks if cgroup.clone_children is available before calling
> > do_init_cpuset_file().  Under the assumption that the really large
> > machines are running newer kernels, this buffer would then never be used.
> > 
> > Alternatively we could change the initial check for non-empty cpuset.cpus
> > and cpuset.mems files to not treat an insufficient buffer size as
> > an error.  This, too, would make the buffer size irrelevant on kernels
> > that support cgroup.clone_children.
> > 
> > Let me know if I should do one or the other.
> 
> Ok I've gone ahead and done both.  My change ended up more invasive than
> I had planned, but it tests ok here and should do the right thing.  I've
> gone ahead and pushed the below:
> 
> >From 934b1673cda3f4fc17d66cfe3f429d7fafe3f4a6 Mon Sep 17 00:00:00 2001
> From: Serge Hallyn <serge.hallyn at ubuntu.com>
> Date: Wed, 29 Jan 2014 09:40:39 +0000
> Subject: [PATCH 2/4] cgroups: adjust previous commit
> 
> Remove a memory leak on error path.
> 
> Only try to initialize cpuset if cgroup.clonechildren does not exist.
> 
> Bump the max value we read from cpuset.{cpus,mems} to 1024.
> 
> If cpuset.cpus or .mems is already initialized but is too long, don't fail.
> 
> If parent's cpuset.cpus or .mems is too long, record an error and fail.
> If anyone actually runs into this, we can simply allocate the required
> length as needed, but we don't expect anyone to run into this.
> 
> Signed-off-by: Serge Hallyn <serge.hallyn at ubuntu.com>
> ---
>  src/lxc/cgroup.c | 73 +++++++++++++++++++++++++++++++++-----------------------
>  src/lxc/cgroup.h |  1 +
>  2 files changed, 44 insertions(+), 30 deletions(-)
> 
> diff --git a/src/lxc/cgroup.c b/src/lxc/cgroup.c
> index 87d1bb7..fd60155 100644
> --- a/src/lxc/cgroup.c
> +++ b/src/lxc/cgroup.c
> @@ -2026,6 +2026,8 @@ static int handle_cgroup_settings(struct cgroup_mount_point *mp,
>  	int r, saved_errno = 0;
>  	char buf[2];
>  
> +	mp->need_cpuset_init = false;
> +
>  	/* If this is the memory cgroup, we want to enforce hierarchy.
>  	 * But don't fail if for some reason we can't.
>  	 */
> @@ -2058,6 +2060,7 @@ static int handle_cgroup_settings(struct cgroup_mount_point *mp,
>  		 * was created
>  		 */
>  		if (stat(cc_path, &sb) != 0 && errno == ENOENT) {
> +			mp->need_cpuset_init = true;
>  			free(cc_path);
>  			return 0;
>  		}
> @@ -2075,75 +2078,85 @@ static int handle_cgroup_settings(struct cgroup_mount_point *mp,
>  	return 0;
>  }
>  
> -static bool cgroup_read_from_file(const char *fn, char buf[], size_t bufsize)
> +static int cgroup_read_from_file(const char *fn, char buf[], size_t bufsize)
>  {
>  	int ret = lxc_read_from_file(fn, buf, bufsize);
>  	if (ret < 0) {
>  		SYSERROR("failed to read %s", fn);
> -		return false;
> +		return ret;
>  	}
>  	if (ret == bufsize) {
> -		ERROR("too much data in %s", fn);
> -		return false;
> +		if (bufsize > 0) {
> +			/* obviously this wasn't empty */
> +			buf[bufsize-1] = '\0';
> +			return ret;
> +		}
> +		/* Callers don't do this, but regression/sanity check */
> +		ERROR("%s: was not expecting 0 bufsize", __func__);
> +		return -1;
>  	}
>  	buf[ret] = '\0';
> -	return true;
> +	return ret;
>  }
>  
>  static bool do_init_cpuset_file(struct cgroup_mount_point *mp,
>  				const char *path, const char *name)
>  {
> -	char value[128];
> -	char *childfile, *parentfile, *tmp;
> -	bool ok;
> +	char value[1024];
> +	char *childfile, *parentfile = NULL, *tmp;
> +	int ret;
> +	bool ok = false;
> +
> +	if (!mp->need_cpuset_init)
> +		return true;

This "if" should be moved to init_cpuset_if_needed().

I meant do_init_cpuset_file() to be a "low level" function that
assumes all preconditions are met and just does the work that must
be done.  init_cpuset_if_needed() does check if do_init_cpuset_file()
is needed.

>  
>  	childfile = cgroup_to_absolute_path(mp, path, name);
>  	if (!childfile)
>  		return false;
>  
>  	/* don't overwrite a non-empty value in the file */
> -	if (!cgroup_read_from_file(childfile, value, sizeof(value))) {
> -		free(childfile);
> -		return false;
> -	}
> +	ret = cgroup_read_from_file(childfile, value, sizeof(value));
> +	if (ret < 0)
> +		goto out;
>  	if (value[0] != '\0' && value[0] != '\n') {
> -		free(childfile);
> -		return true;
> +		ok = true;
> +		goto out;
>  	}
>  
>  	/* path to the same name in the parent cgroup */
>  	parentfile = strdup(path);
>  	if (!parentfile)
> -		return false;
> +		goto out;
> +
>  	tmp = strrchr(parentfile, '/');
> -	if (!tmp) {
> -		free(childfile);
> -		free(parentfile);
> -		return false;
> -	}
> +	if (!tmp)
> +		goto out;
>  	if (tmp == parentfile)
>  		tmp++; /* keep the '/' at the start */
>  	*tmp = '\0';
>  	tmp = parentfile;
>  	parentfile = cgroup_to_absolute_path(mp, tmp, name);
>  	free(tmp);
> -	if (!parentfile) {
> -		free(childfile);
> -		return false;
> -	}
> +	if (!parentfile)
> +		goto out;
>  
>  	/* copy from parent to child cgroup */
> -	if (!cgroup_read_from_file(parentfile, value, sizeof(value))) {
> -		free(parentfile);
> -		free(childfile);
> -		return false;
> +	ret = cgroup_read_from_file(parentfile, value, sizeof(value));
> +	if (ret < 0)
> +		goto out;
> +	if (ret == sizeof(value)) {
> +		/* If anyone actually sees this error, we can address it */
> +		ERROR("parent cpuset value too long");
> +		goto out;
>  	}
>  	ok = (lxc_write_to_file(childfile, value, strlen(value), false) >= 0);
> +
> +out:
>  	if (!ok)
>  		SYSERROR("failed writing %s", childfile);

Having "out:" before the SYSERROR() might lead to misleading error
messages; the label should be moved just below this SYSERROR() line.

> -	free(parentfile);
> +	if (parentfile)
> +		free(parentfile);
>  	free(childfile);
> -
>  	return ok;
>  }
>  
> diff --git a/src/lxc/cgroup.h b/src/lxc/cgroup.h
> index fbb3b87..a2e7cd3 100644
> --- a/src/lxc/cgroup.h
> +++ b/src/lxc/cgroup.h
> @@ -66,6 +66,7 @@ struct cgroup_mount_point {
>  	char *mount_point;
>  	char *mount_prefix;
>  	bool read_only;
> +	bool need_cpuset_init;
>  };
>  
>  /*
> -- 
> 1.8.5.3
> 
> _______________________________________________
> lxc-devel mailing list
> lxc-devel at lists.linuxcontainers.org
> http://lists.linuxcontainers.org/listinfo/lxc-devel

Move the test of mp->need_cpuset_init to a logically better place.
Avoid misleading error messages.

Signed-off-by: Robert Vogelgesang <vogel at users.sourceforge.net>

diff -up lxc-lxc-1.0.0.beta3/src/lxc/cgroup.c.cpuset_fix lxc-lxc-1.0.0.beta3/src/lxc/cgroup.c
--- lxc-lxc-1.0.0.beta3/src/lxc/cgroup.c.cpuset_fix	2014-01-29 12:07:55.713820021 +0100
+++ lxc-lxc-1.0.0.beta3/src/lxc/cgroup.c	2014-01-29 14:10:08.000000000 +0100
@@ -2107,9 +2107,6 @@ static bool do_init_cpuset_file(struct c
 	int ret;
 	bool ok = false;
 
-	if (!mp->need_cpuset_init)
-		return true;
-
 	childfile = cgroup_to_absolute_path(mp, path, name);
 	if (!childfile)
 		return false;
@@ -2150,10 +2147,10 @@ static bool do_init_cpuset_file(struct c
 		goto out;
 	}
 	ok = (lxc_write_to_file(childfile, value, strlen(value), false) >= 0);
-
-out:
 	if (!ok)
 		SYSERROR("failed writing %s", childfile);
+
+out:
 	if (parentfile)
 		free(parentfile);
 	free(childfile);
@@ -2168,6 +2165,9 @@ static bool init_cpuset_if_needed(struct
 				 (const char **)mp->hierarchy->subsystems))
 		return true;
 
+	if (!mp->need_cpuset_init)
+		return true;
+
 	return (do_init_cpuset_file(mp, path, "/cpuset.cpus") &&
 		do_init_cpuset_file(mp, path, "/cpuset.mems") );
 }


More information about the lxc-devel mailing list