[lxc-devel] [PATCH] cgroupfs: cpuset support for kernels without cgroup.clone_children
Robert Vogelgesang
vogel at users.sourceforge.net
Tue Jan 28 17:15:57 UTC 2014
Hi Serge,
On Tue, Jan 28, 2014 at 04:32:30PM +0000, Serge Hallyn wrote:
> Quoting Robert Vogelgesang (vogel at users.sourceforge.net):
> > Hi,
> >
> > as promised last week, here's my patch for cpuset cgroup support for
> > kernels without the cgroup.clone_children feature.
>
> Thanks.
>
> > My initial patch used "#include <linux/version.h>" and the macros defined
> > there to decide if cgroup.clone_children should be used or not. After
> > having seen Serge Hallyn's patch which he posted to the list last Wednesday,
> > where he used stat() to check if the cgroup.clone_children file is there,
> > I rewrote my patch to do the same.
> >
> > The patch is against 1.0.0.beta3, and it is tested successfully with
> > RHEL-6's kernel version 2.6.32-431.3.1.el6, compiled without cgmanager
>
> Shouldn't be a problem as the cgfs_create function won't be called if
> cgmanager is running.
>
> > (I've so far not tried to use cgmanager in RHEL-6).
> >
> > In addition to fixing the cpuset cgroup setup, this patch also fixes a
> > wrong argument in a call to handle_cgroup_settings() in the same context.
> >
> > Robert
> >
> > Signed-off-by: Robert Vogelgesang <vogel at users.sourceforge.net>
>
> Just a two questions. I'll go ahead and run tests here while waiting
> for an answer. If it comes down to "add the free and use 1024", I can
> just do that inline.
Yes, thank you, "free(childfile);" should be added where you noted it.
Regarding the size of the "value" buffer: 128 is what I used in my
tests; it should be enough for servers with up to around 80 CPUs;
1024 would support at least around 540 CPUs. I don't know where to
set the limit.
Maybe we should modify the patch so that init_cpuset_if_needed()
checks if cgroup.clone_children is available before calling
do_init_cpuset_file(). Under the assumption that the really large
machines are running newer kernels, this buffer would then never be used.
Alternatively we could change the initial check for non-empty cpuset.cpus
and cpuset.mems files to not treat an insufficient buffer size as
an error. This, too, would make the buffer size irrelevant on kernels
that support cgroup.clone_children.
Let me know if I should do one or the other.
>
> Oh, in the future could you please add -p to your diff args? It makes
> inline review just a touch easier.
I'll try to keep that in mind...
Robert
>
> > diff -u lxc-lxc-1.0.0.beta3/src/lxc/cgroup.c.cpuset2632 lxc-lxc-1.0.0.beta3/src/lxc/cgroup.c
> > --- lxc-lxc-1.0.0.beta3/src/lxc/cgroup.c.cpuset2632 2014-01-27 14:52:20.000000000 +0100
> > +++ lxc-lxc-1.0.0.beta3/src/lxc/cgroup.c 2014-01-27 16:50:11.000000000 +0100
> > @@ -74,6 +74,7 @@
> > static int cgroup_recursive_task_count(const char *cgroup_path);
> > static int count_lines(const char *fn);
> > static int handle_cgroup_settings(struct cgroup_mount_point *mp, char *cgroup_path);
> > +static bool init_cpuset_if_needed(struct cgroup_mount_point *mp, const char *path);
> >
> > static struct cgroup_ops cgfs_ops;
> > struct cgroup_ops *active_cg_ops = &cgfs_ops;
> > @@ -897,15 +898,23 @@
> > r = lxc_grow_array((void ***)&info_ptr->created_paths, &info_ptr->created_paths_capacity, info_ptr->created_paths_count + 1, 8);
> > if (r < 0)
> > goto cleanup_from_error;
> > + if (!init_cpuset_if_needed(info_ptr->designated_mount_point, current_entire_path)) {
> > + ERROR("Failed to initialize cpuset in new '%s'.", current_entire_path);
> > + goto cleanup_from_error;
> > + }
> > info_ptr->created_paths[info_ptr->created_paths_count++] = current_entire_path;
> > } else {
> > /* if we didn't create the cgroup, then we have to make sure that
> > * further cgroups will be created properly
> > */
> > - if (handle_cgroup_settings(mp, info_ptr->cgroup_path) < 0) {
> > + if (handle_cgroup_settings(info_ptr->designated_mount_point, info_ptr->cgroup_path) < 0) {
> > ERROR("Could not set clone_children to 1 for cpuset hierarchy in pre-existing cgroup.");
> > goto cleanup_from_error;
> > }
> > + if (!init_cpuset_if_needed(info_ptr->designated_mount_point, info_ptr->cgroup_path)) {
> > + ERROR("Failed to initialize cpuset in pre-existing '%s'.", info_ptr->cgroup_path);
> > + goto cleanup_from_error;
> > + }
> >
> > /* already existed but path component of pattern didn't contain '%n',
> > * so this is not an error; but then we don't need current_entire_path
> > @@ -2039,8 +2048,19 @@
> > */
> > if (lxc_string_in_array("cpuset", (const char **)mp->hierarchy->subsystems)) {
> > char *cc_path = cgroup_to_absolute_path(mp, cgroup_path, "/cgroup.clone_children");
> > + struct stat sb;
> > +
> > if (!cc_path)
> > return -1;
> > + /* cgroup.clone_children is not available when running under
> > + * older kernel versions; in this case, we'll initialize
> > + * cpuset.cpus and cpuset.mems later, after the new cgroup
> > + * was created
> > + */
> > + if (stat(cc_path, &sb) != 0 && errno == ENOENT) {
> > + free(cc_path);
> > + return 0;
> > + }
> > r = lxc_read_from_file(cc_path, buf, 1);
> > if (r == 1 && buf[0] == '1') {
> > free(cc_path);
> > @@ -2055,6 +2075,90 @@
> > return 0;
> > }
> >
> > +static bool cgroup_read_from_file(const char *fn, char buf[], size_t bufsize)
> > +{
> > + int ret = lxc_read_from_file(fn, buf, bufsize);
> > + if (ret < 0) {
> > + SYSERROR("failed to read %s", fn);
> > + return false;
> > + }
> > + if (ret == bufsize) {
> > + ERROR("too much data in %s", fn);
> > + return false;
> > + }
> > + buf[ret] = '\0';
> > + return true;
> > +}
> > +
> > +static bool do_init_cpuset_file(struct cgroup_mount_point *mp,
> > + const char *path, const char *name)
> > +{
> > + char value[128];
>
> Is 128 enough? If some silly sod is running with just all the odd
> cpus on a large system the file could get pretty big right?
>
> > + char *childfile, *parentfile, *tmp;
> > + bool ok;
> > +
> > + childfile = cgroup_to_absolute_path(mp, path, name);
> > + if (!childfile)
> > + return false;
> > +
> > + /* don't overwrite a non-empty value in the file */
> > + if (!cgroup_read_from_file(childfile, value, sizeof(value))) {
> > + free(childfile);
> > + return false;
> > + }
> > + if (value[0] != '\0' && value[0] != '\n') {
> > + free(childfile);
> > + return true;
> > + }
> > +
> > + /* path to the same name in the parent cgroup */
> > + parentfile = strdup(path);
> > + if (!parentfile)
>
> childfile needs to be freed here?
>
> > + return false;
> > + tmp = strrchr(parentfile, '/');
> > + if (!tmp) {
> > + free(childfile);
> > + free(parentfile);
> > + return false;
> > + }
> > + if (tmp == parentfile)
> > + tmp++; /* keep the '/' at the start */
> > + *tmp = '\0';
> > + tmp = parentfile;
> > + parentfile = cgroup_to_absolute_path(mp, tmp, name);
> > + free(tmp);
> > + if (!parentfile) {
> > + free(childfile);
> > + return false;
> > + }
> > +
> > + /* copy from parent to child cgroup */
> > + if (!cgroup_read_from_file(parentfile, value, sizeof(value))) {
> > + free(parentfile);
> > + free(childfile);
> > + return false;
> > + }
> > + ok = (lxc_write_to_file(childfile, value, strlen(value), false) >= 0);
> > + if (!ok)
> > + SYSERROR("failed writing %s", childfile);
> > + free(parentfile);
> > + free(childfile);
> > +
> > + return ok;
> > +}
> > +
> > +static bool init_cpuset_if_needed(struct cgroup_mount_point *mp,
> > + const char *path)
> > +{
> > + /* the files we have to handle here are only in cpuset hierarchies */
> > + if (!lxc_string_in_array("cpuset",
> > + (const char **)mp->hierarchy->subsystems))
> > + return true;
> > +
> > + return (do_init_cpuset_file(mp, path, "/cpuset.cpus") &&
> > + do_init_cpuset_file(mp, path, "/cpuset.mems") );
> > +}
> > +
> > extern void lxc_monitor_send_state(const char *name, lxc_state_t state,
> > const char *lxcpath);
> > int do_unfreeze(int freeze, const char *name, const char *lxcpath)
> > _______________________________________________
> > lxc-devel mailing list
> > lxc-devel at lists.linuxcontainers.org
> > http://lists.linuxcontainers.org/listinfo/lxc-devel
> _______________________________________________
> lxc-devel mailing list
> lxc-devel at lists.linuxcontainers.org
> http://lists.linuxcontainers.org/listinfo/lxc-devel
More information about the lxc-devel
mailing list