[lxc-devel] [PATCH v2 1/2] Add option to lxc-attach to select specific namespaces

Eric W. Biederman ebiederm at xmission.com
Tue May 22 15:00:23 UTC 2012


Serge Hallyn <serge.hallyn at canonical.com> writes:

> Quoting Christian Seiler (christian at iwakd.de):
>> This patch adds the -s/--namespaces option to lxc-attach that works
>> analogously to lxc-unshare, allowing the user to select the namespaces the
>> process should be attached to.
>> 
>> User namespaces are supported, under the assumption that the file in
>> /proc/pid/ns will be called 'usr'.
>
> Ok, I really do think it'll simply be called 'user'.  'uid' is not
> impossible, but 
>
> We could simply ask Eric (cc'd) what he is planning on using :)

What I expect to call it is 'user', that is how we refer to the namespace.

I admit everything else is abbreviated but we haven't been abbreviating
user so I don't see why we would start now.  

Eric
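To make the naming concrete, here is a minimal C sketch that maps each CLONE_NEW* flag to its file under /proc/<pid>/ns. It assumes, per Eric's reply, that the user namespace file will be named "user" rather than the "usr" the patch currently assumes. The struct ns_name type and the ns_proc_path() helper are hypothetical illustrations, not part of lxc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>      /* CLONE_NEW* flags */
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* One table instead of two parallel arrays: each entry pairs a
 * CLONE_NEW* flag with its file name under /proc/<pid>/ns.
 * Assumption: the user namespace file is "user", per Eric's reply. */
struct ns_name {
	int flag;
	const char *name;
};

static const struct ns_name ns_names[] = {
	{ CLONE_NEWNS,   "mnt"  },
	{ CLONE_NEWPID,  "pid"  },
	{ CLONE_NEWUTS,  "uts"  },
	{ CLONE_NEWIPC,  "ipc"  },
	{ CLONE_NEWUSER, "user" },
	{ CLONE_NEWNET,  "net"  },
};

/* Write "/proc/<pid>/ns/<name>" for a single flag into buf.
 * Returns 0 on success, -1 for an unknown flag or if buf is too small. */
static int ns_proc_path(pid_t pid, int flag, char *buf, size_t len)
{
	size_t i;
	int n;

	assert(buf != NULL);
	for (i = 0; i < sizeof(ns_names) / sizeof(ns_names[0]); i++) {
		if (ns_names[i].flag != flag)
			continue;
		n = snprintf(buf, len, "/proc/%d/ns/%s", (int)pid,
			     ns_names[i].name);
		return (n < 0 || (size_t)n >= len) ? -1 : 0;
	}
	return -1;
}
```

With this layout, ns_proc_path(42, CLONE_NEWUSER, buf, sizeof(buf)) yields "/proc/42/ns/user", and keeping flag and name in one table means the ns[] and flags[] arrays in lxc_attach_to_ns() cannot drift out of sync.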


>> Currently, user namespaces will be
>> skipped (without having lxc-attach fail, unlike for other namespaces) if the
>> kernel lacks support.
>> 
>> Signed-off-by: Christian Seiler <christian at iwakd.de>
>> Cc: Stéphane Graber <stgraber at ubuntu.com>
>> Cc: Daniel Lezcano <daniel.lezcano at free.fr>
>> Cc: Serge Hallyn <serge.hallyn at canonical.com>
>
> One comment below.  With the change below (or without, if you feel
> strongly about it)
>
> Acked-by: Serge Hallyn <serge.hallyn at canonical.com>
>
> thanks,
> -serge
>
>> ---
>>  doc/lxc-attach.sgml.in |   99 +++++++++++++++++++++++++++++++++++++++++++++--
>>  src/lxc/attach.c       |   72 ++++++++++++++++++++++++++++++++--
>>  src/lxc/attach.h       |    2 +-
>>  src/lxc/lxc_attach.c   |   28 ++++++++++++-
>>  src/lxc/lxc_unshare.c  |   46 ----------------------
>>  src/lxc/namespace.c    |   46 ++++++++++++++++++++++
>>  src/lxc/namespace.h    |    3 +
>>  7 files changed, 236 insertions(+), 60 deletions(-)
>> 
>> diff --git a/doc/lxc-attach.sgml.in b/doc/lxc-attach.sgml.in
>> index 7092f16..d7fb223 100644
>> --- a/doc/lxc-attach.sgml.in
>> +++ b/doc/lxc-attach.sgml.in
>> @@ -49,7 +49,8 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>>    <refsynopsisdiv>
>>      <cmdsynopsis><command>lxc-attach <replaceable>-n
>>      name</replaceable> <optional>-a
>> -    arch</optional> <optional>-e</optional>
>> +    arch</optional> <optional>-e</optional> <optional>-s
>> +    namespaces</optional>
>>      <optional>-- command</optional></command></cmdsynopsis>
>>    </refsynopsisdiv>
>>  
>> @@ -122,6 +123,29 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>>  	</listitem>
>>        </varlistentry>
>>  
>> +      <varlistentry>
>> +	<term>
>> +	  <option>-s, --namespaces <replaceable>namespaces</replaceable></option>
>> +	</term>
>> +	<listitem>
>> +	  <para>
>> +	    Specify the namespaces to attach to, as a pipe-separated list,
>> +	    e.g. <replaceable>NETWORK|IPC</replaceable>. Allowed values are
>> +	    <replaceable>MOUNT</replaceable>, <replaceable>PID</replaceable>,
>> +	    <replaceable>UTSNAME</replaceable>, <replaceable>IPC</replaceable>,
>> +	    <replaceable>USER</replaceable> and
>> +	    <replaceable>NETWORK</replaceable>. This allows one to change
>> +	    the context of the process to e.g. the network namespace of the
>> +	    container while retaining the other namespaces as those of the
>> +	    host.
>> +	  </para>
>> +	  <para>
>> +	    <emphasis>Important:</emphasis> This option implies
>> +	    <option>-e</option>.
>> +	  </para>
>> +	</listitem>
>> +      </varlistentry>
>> +
>>      </variablelist>
>>  
>>    </refsect1>
>> @@ -144,19 +168,84 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>>        </para>
>>        <para>
>>          To deactivate the network link eth1 of a running container that
>> -        does not have the NET_ADMIN capability, use the <option>-e</option>
>> -        option to use increased capabilities:
>> +        does not have the NET_ADMIN capability, use either the
>> +        <option>-e</option> option to use increased capabilities,
>> +        assuming the <command>ip</command> tool is installed:
>>          <programlisting>
>>            lxc-attach -n container -e -- /sbin/ip link delete eth1
>>          </programlisting>
>> +        Or, alternatively, use the <option>-s</option> option to use the
>> +        tools installed on the host outside the container:
>> +        <programlisting>
>> +          lxc-attach -n container -s NETWORK -- /sbin/ip link delete eth1
>> +        </programlisting>
>>        </para>
>>    </refsect1>
>>  
>>    <refsect1>
>> +    <title>Compatibility</title>
>> +    <para>
>> +      Attaching completely (including the pid and mount namespaces) to a
>> +      container requires a patched kernel; please see the lxc website for
>> +      details. <command>lxc-attach</command> will fail in that case if
>> +      used with an unpatched kernel.
>> +    </para>
>> +    <para>
>> +      Nevertheless, it will succeed on an unpatched kernel of version 3.0
>> +      or higher if the <option>-s</option> option is used to restrict the
>> +      namespaces that the process is to be attached to to one or more of 
>> +      <replaceable>NETWORK</replaceable>, <replaceable>IPC</replaceable>
>> +      and <replaceable>UTSNAME</replaceable>.
>> +    </para>
>> +    <para>
>> +      Attaching to user namespaces is currently completely unsupported
>> +      by the kernel. User namespaces will be skipped (but will not cause
>> +      <command>lxc-attach</command> to fail) unless used with a future
>> +      version of the kernel that supports this.
>> +    </para>
>> +  </refsect1>
>> +
>> +  <refsect1>
>> +    <title>Notes</title>
>> +    <para>
>> +      The Linux <replaceable>/proc</replaceable> and
>> +      <replaceable>/sys</replaceable> filesystems contain information
>> +      about some quantities that are affected by namespaces, such as
>> +      the directories named after process ids in
>> +      <replaceable>/proc</replaceable> or the network interface information
>> +      in <replaceable>/sys/class/net</replaceable>. The namespace of the
>> +      process mounting the pseudo-filesystems determines what information
>> +      is shown, <emphasis>not</emphasis> the namespace of the process
>> +      accessing <replaceable>/proc</replaceable> or
>> +      <replaceable>/sys</replaceable>.
>> +    </para>
>> +    <para>
>> +      If one uses the <option>-s</option> option to only attach to
>> +      the pid namespace of a container, but not its mount namespace
>> +      (which will contain the <replaceable>/proc</replaceable> of the
>> +      container and not the host), the contents of <replaceable>/proc</replaceable>
>> +      will reflect that of the host and not the container. Analogously,
>> +      the same issue occurs when reading the contents of
>> +      <replaceable>/sys/class/net</replaceable> and attaching to just
>> +      the network namespace.
>> +    </para>
>> +    <para>
>> +      A workaround is to use <command>lxc-unshare</command> to unshare
>> +      the mount namespace after using <command>lxc-attach</command> with
>> +      <replaceable>-s PID</replaceable> and/or <replaceable>-s
>> +      NETWORK</replaceable>, then unmount and remount both
>> +      pseudo-filesystems within that new mount namespace before
>> +      executing a program or script that relies on this information
>> +      being correct.
>> +    </para>
>> +  </refsect1>
>> +
>> +  <refsect1>
>>      <title>Security</title>
>>      <para>
>> -      The <option>-e</option> should be used with care, as it may break
>> -      the isolation of the containers if used improperly.
>> +      The <option>-e</option> and <option>-s</option> options should
>> +      be used with care, as they may break the isolation of the containers
>> +      if used improperly.
>>      </para>
>>    </refsect1>
>>  
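The workaround from the Notes section can also be sketched directly with system calls instead of lxc-unshare; this is a swapped-in technique, not what the documentation prescribes. The hypothetical remount_proc_and_sys() unshares the mount namespace, marks all mounts private so nothing propagates back to the host, then replaces /proc and /sys. It assumes CAP_SYS_ADMIN and must run in a process that already belongs to the target pid namespace: setns() on a pid namespace only affects children created afterwards, and procfs reflects the pid namespace of the process that mounts it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>      /* errno is set on failure */
#include <sched.h>      /* unshare(), CLONE_NEWNS */
#include <sys/mount.h>  /* mount(), umount2(), MS_*, MNT_DETACH */

/* Give the calling process a private mount namespace and fresh
 * instances of /proc and /sys, so that both reflect the pid and
 * network namespaces this process currently belongs to.
 * Needs CAP_SYS_ADMIN. Returns 0, or -1 with errno set. */
static int remount_proc_and_sys(void)
{
	/* new mount namespace: the host's mount table is left alone */
	if (unshare(CLONE_NEWNS) < 0)
		return -1;

	/* make every mount private so the changes below do not
	 * propagate back through shared subtrees */
	if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
		return -1;

	/* lazily detach the old instances, then mount new ones */
	if (umount2("/proc", MNT_DETACH) < 0 ||
	    mount("proc", "/proc", "proc", 0, NULL) < 0)
		return -1;
	if (umount2("/sys", MNT_DETACH) < 0 ||
	    mount("sysfs", "/sys", "sysfs", 0, NULL) < 0)
		return -1;

	return 0;
}
```

Called from a program started by lxc-attach -s 'PID|NETWORK', it returns -1 with errno set if any step fails, for example EPERM when CAP_SYS_ADMIN is missing.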
>> diff --git a/src/lxc/attach.c b/src/lxc/attach.c
>> index a95b3d3..9d598f0 100644
>> --- a/src/lxc/attach.c
>> +++ b/src/lxc/attach.c
>> @@ -121,13 +121,23 @@ out_error:
>>  	return NULL;
>>  }
>>  
>> -int lxc_attach_to_ns(pid_t pid)
>> +int lxc_attach_to_ns(pid_t pid, int which)
>>  {
>>  	char path[MAXPATHLEN];
>> -	char *ns[] = { "pid", "mnt", "net", "ipc", "uts" };
>> -	const int size = sizeof(ns) / sizeof(char *);
>> +	/* TODO: we assume that the file in /proc for attaching to user
>> +	 * namespaces will be called /proc/$pid/ns/usr, in accordance
>> +	 * with the naming convention of previous namespaces. Once the
>> +	 * kernel really supports setns() on a user namespace, make sure
>> +	 * the array here matches the array in the kernel
>> +	 */
>> +	static char *ns[] = { "mnt", "pid", "uts", "ipc", "usr", "net" };
>> +	static int flags[] = {
>> +		CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
>> +		CLONE_NEWUSER, CLONE_NEWNET
>> +	};
>> +	static const int size = sizeof(ns) / sizeof(char *);
>>  	int fd[size];
>> -	int i;
>> +	int i, j, saved_errno;
>>  
>>  	snprintf(path, MAXPATHLEN, "/proc/%d/ns", pid);
>>  	if (access(path, X_OK)) {
>> @@ -136,17 +146,69 @@ int lxc_attach_to_ns(pid_t pid)
>>  	}
>>  
>>  	for (i = 0; i < size; i++) {
>> +		/* ignore if we are not supposed to attach
>> +		 * to that namespace
>> +		 */
>> +		if (which != -1 && !(which & flags[i])) {
>> +			fd[i] = -1;
>> +			continue;
>> +		}
>>  		snprintf(path, MAXPATHLEN, "/proc/%d/ns/%s", pid, ns[i]);
>>  		fd[i] = open(path, O_RDONLY);
>>  		if (fd[i] < 0) {
>> +			/* there is currently no support in the kernel for
>> +			 * attaching to user namespaces - therefore, we
>> +			 * ignore the error, if the file does not exist
>> +			 */
>> +			if (flags[i] == CLONE_NEWUSER && errno == ENOENT) {
>
> Note that for now the same thing will happen with pid.  I don't think
> CLONE_NEWUSER needs to be special cased.  Likewise, someone may want
> to use this lxc on an older kernel without any setns support at all.
>
> Your choices for behavior are good (print a msg for which == -1,
> and error out if the namespace was specially chosen), but I think
> you should simply do it for all namespaces.
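A minimal sketch of the behaviour Serge suggests, applied to every namespace instead of special-casing CLONE_NEWUSER: during a full attach (which == -1) a missing /proc/<pid>/ns file is skipped with a debug-level message, while a namespace requested explicitly with -s makes the attach fail. The helper name open_ns_or_skip() and its -1/-2 return convention are hypothetical, and fprintf() stands in for lxc's DEBUG()/ERROR() macros.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>   /* close(), for callers of the returned fd */

/* Open one /proc/<pid>/ns/<name> file under the policy suggested in
 * review, identical for every namespace type.
 *
 * which == -1 means no -s option was given (attach to everything).
 * Returns an open fd, -1 to tell the caller to skip this namespace,
 * or -2 to abort the attach with errno preserved. */
static int open_ns_or_skip(const char *path, int which)
{
	int fd, saved_errno;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd >= 0)
		return fd;

	saved_errno = errno;
	if (saved_errno == ENOENT && which == -1) {
		/* full attach on a kernel lacking this file: skip quietly */
		fprintf(stderr, "DEBUG: %s does not exist, skipping\n", path);
		return -1;
	}

	/* explicitly requested with -s, or some other error: give up */
	fprintf(stderr, "ERROR: failed to open '%s'\n", path);
	errno = saved_errno;
	return -2;
}
```

A caller would store -1 in fd[i] and continue on a skip, and on -2 close the descriptors opened so far and return -1, as the patch already does for other open errors.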
>
>> +				if (which != -1) {
>> +					/* we don't want the error
>> +					 * message on every full attach,
>> +					 * so we only show it if the
>> +					 * user really requested it
>> +					 * explicitly
>> +					 */
>> +					ERROR("Kernel does not support "
>> +					      "attaching to user "
>> +					      "namespaces, skipping.");
>> +				} else {
>> +					/* but do show it as a debug
>> +					 * message otherwise, so users
>> +					 * aren't completely left in the
>> +					 * dark
>> +					 */
>> +					DEBUG("Kernel does not support "
>> +					      "attaching to user "
>> +					      "namespaces, skipping.");
>> +				}
>> +				fd[i] = -1;
>> +				continue;
>> +			}
>> +
>> +			saved_errno = errno;
>> +
>> +			/* close all already opened files before we return
>> +			 * an error, so we don't leak file descriptors if
>> +			 * the caller decides to continue nontheless
>> +			 * the caller decides to continue nonetheless
>> +			for (j = 0; j < i; j++)
>> +				close(fd[j]);
>> +
>> +			errno = saved_errno;
>>  			SYSERROR("failed to open '%s'", path);
>>  			return -1;
>>  		}
>>  	}
>>  
>>  	for (i = 0; i < size; i++) {
>> -		if (setns(fd[i], 0)) {
>> +		if (fd[i] >= 0 && setns(fd[i], 0)) {
>> +			saved_errno = errno;
>> +
>> +			for (j = i; j < size; j++)
>> +				close(fd[j]);
>> +
>> +			errno = saved_errno;
>>  			SYSERROR("failed to set namespace '%s'", ns[i]);
>>  			return -1;
>>  		}
>>  
>> diff --git a/src/lxc/attach.h b/src/lxc/attach.h
>> index 2d46c83..d96fdae 100644
>> --- a/src/lxc/attach.h
>> +++ b/src/lxc/attach.h
>> @@ -33,7 +33,7 @@ struct lxc_proc_context_info {
>>  
>>  extern struct lxc_proc_context_info *lxc_proc_get_context_info(pid_t pid);
>>  
>> -extern int lxc_attach_to_ns(pid_t other_pid);
>> +extern int lxc_attach_to_ns(pid_t other_pid, int which);
>>  extern int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx);
>>  
>>  #endif
>> diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
>> index 955e9f4..6b98248 100644
>> --- a/src/lxc/lxc_attach.c
>> +++ b/src/lxc/lxc_attach.c
>> @@ -40,20 +40,24 @@
>>  #include "start.h"
>>  #include "sync.h"
>>  #include "log.h"
>> +#include "namespace.h"
>>  
>>  lxc_log_define(lxc_attach_ui, lxc);
>>  
>>  static const struct option my_longopts[] = {
>>  	{"elevated-privileges", no_argument, 0, 'e'},
>>  	{"arch", required_argument, 0, 'a'},
>> +	{"namespaces", required_argument, 0, 's'},
>>  	LXC_COMMON_OPTIONS
>>  };
>>  
>>  static int elevated_privileges = 0;
>>  static signed long new_personality = -1;
>> +static int namespace_flags = -1;
>>  
>>  static int my_parser(struct lxc_arguments* args, int c, char* arg)
>>  {
>> +	int ret;
>>  	switch (c) {
>>  	case 'e': elevated_privileges = 1; break;
>>  	case 'a':
>> @@ -63,6 +67,12 @@ static int my_parser(struct lxc_arguments* args, int c, char* arg)
>>  			return -1;
>>  		}
>>  		break;
>> +	case 's':
>> +		namespace_flags = 0;
>> +		ret = lxc_fill_namespace_flags(arg, &namespace_flags);
>> +		if (ret)
>> +			return -1;
>> +		break;
>>  	}
>>  
>>  	return 0;
>> @@ -83,7 +93,13 @@ Options :\n\
>>                      WARNING: This may leak privleges into the container.\n\
>>                      Use with care.\n\
>>    -a, --arch=ARCH   Use ARCH for program instead of container's own\n\
>> -                    architecture.\n",
>> +                    architecture.\n\
>> +  -s, --namespaces=FLAGS\n\
>> +                    Don't attach to all the namespaces of the container\n\
>> +                    but just to the following OR'd list of flags:\n\
>> +                    MOUNT, PID, UTSNAME, IPC, USER or NETWORK\n\
>> +                    WARNING: Using -s implies -e, it may therefore\n\
>> +                    leak privileges into the container. Use with care.\n",
>>  	.options  = my_longopts,
>>  	.parser   = my_parser,
>>  	.checker  = NULL,
>> @@ -111,7 +127,13 @@ int main(int argc, char *argv[])
>>  			   my_args.progname, my_args.quiet);
>>  	if (ret)
>>  		return ret;
>> -
>> +	
>> +	/* if we do not attach to all namespaces, we will assume
>> +	 * elevated privileges by default anyway.
>> +	 */
>> +	if (namespace_flags != -1)
>> +		elevated_privileges = 1;
>> +	
>>  	init_pid = get_init_pid(my_args.name);
>>  	if (init_pid < 0) {
>>  		ERROR("failed to get the init pid");
>> @@ -178,7 +200,7 @@ int main(int argc, char *argv[])
>>  
>>  		curdir = get_current_dir_name();
>>  
>> -		ret = lxc_attach_to_ns(init_pid);
>> +		ret = lxc_attach_to_ns(init_pid, namespace_flags);
>>  		if (ret < 0) {
>>  			ERROR("failed to enter the namespace");
>>  			return -1;
>> diff --git a/src/lxc/lxc_unshare.c b/src/lxc/lxc_unshare.c
>> index 0baccb0..9d8c8ca 100644
>> --- a/src/lxc/lxc_unshare.c
>> +++ b/src/lxc/lxc_unshare.c
>> @@ -85,52 +85,6 @@ static uid_t lookup_user(const char *optarg)
>>  	return uid;
>>  }
>>  
>> -static char *namespaces_list[] = {
>> -	"MOUNT", "PID", "UTSNAME", "IPC",
>> -	"USER", "NETWORK"
>> -};
>> -static int cloneflags_list[] = {
>> -	CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
>> -	CLONE_NEWUSER, CLONE_NEWNET
>> -};
>> -
>> -static int lxc_namespace_2_cloneflag(char *namespace)
>> -{
>> -	int i, len;
>> -	len = sizeof(namespaces_list)/sizeof(namespaces_list[0]);
>> -	for (i = 0; i < len; i++)
>> -		if (!strcmp(namespaces_list[i], namespace))
>> -			return cloneflags_list[i];
>> -
>> -	ERROR("invalid namespace name %s", namespace);
>> -	return -1;
>> -}
>> -
>> -static int lxc_fill_namespace_flags(char *flaglist, int *flags)
>> -{
>> -	char *token, *saveptr = NULL;
>> -	int aflag;
>> -
>> -	if (!flaglist) {
>> -		ERROR("need at least one namespace to unshare");
>> -		return -1;
>> -	}
>> -
>> -	token = strtok_r(flaglist, "|", &saveptr);
>> -	while (token) {
>> -
>> -		aflag = lxc_namespace_2_cloneflag(token);
>> -		if (aflag < 0)
>> -			return -1;
>> -
>> -		*flags |= aflag;
>> -
>> -		token = strtok_r(NULL, "|", &saveptr);
>> -	}
>> -	return 0;
>> -}
>> -
>> -
>>  struct start_arg {
>>  	char ***args;
>>  	int *flags;
>> diff --git a/src/lxc/namespace.c b/src/lxc/namespace.c
>> index 3e6fc3a..e3c7a09 100644
>> --- a/src/lxc/namespace.c
>> +++ b/src/lxc/namespace.c
>> @@ -69,3 +69,49 @@ pid_t lxc_clone(int (*fn)(void *), void *arg, int flags)
>>  
>>  	return ret;
>>  }
>> +
>> +static char *namespaces_list[] = {
>> +	"MOUNT", "PID", "UTSNAME", "IPC",
>> +	"USER", "NETWORK"
>> +};
>> +static int cloneflags_list[] = {
>> +	CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
>> +	CLONE_NEWUSER, CLONE_NEWNET
>> +};
>> +
>> +int lxc_namespace_2_cloneflag(char *namespace)
>> +{
>> +	int i, len;
>> +	len = sizeof(namespaces_list)/sizeof(namespaces_list[0]);
>> +	for (i = 0; i < len; i++)
>> +		if (!strcmp(namespaces_list[i], namespace))
>> +			return cloneflags_list[i];
>> +
>> +	ERROR("invalid namespace name %s", namespace);
>> +	return -1;
>> +}
>> +
>> +int lxc_fill_namespace_flags(char *flaglist, int *flags)
>> +{
>> +	char *token, *saveptr = NULL;
>> +	int aflag;
>> +
>> +	if (!flaglist) {
>> +		ERROR("need at least one namespace to unshare/attach");
>> +		return -1;
>> +	}
>> +
>> +	token = strtok_r(flaglist, "|", &saveptr);
>> +	while (token) {
>> +
>> +		aflag = lxc_namespace_2_cloneflag(token);
>> +		if (aflag < 0)
>> +			return -1;
>> +
>> +		*flags |= aflag;
>> +
>> +		token = strtok_r(NULL, "|", &saveptr);
>> +	}
>> +	return 0;
>> +}
>> +
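The parsing this patch moves into namespace.c can be exercised on its own. Below is a self-contained sketch equivalent in spirit to lxc_fill_namespace_flags(), under hypothetical names (parse_ns_list(), ns_list[], ns_flags[]). One deliberate difference, flagged as an assumption: it rejects input such as "" or "|" that yields no namespace, whereas the patched function returns 0 with no flags set, which lxc-attach would then pass on as namespace_flags == 0, attaching to no namespace at all.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>      /* CLONE_NEW* flags */
#include <stdio.h>
#include <string.h>     /* strcmp(), strtok_r() */

static const char *ns_list[] = {
	"MOUNT", "PID", "UTSNAME", "IPC", "USER", "NETWORK"
};
static const int ns_flags[] = {
	CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
	CLONE_NEWUSER, CLONE_NEWNET
};

/* Translate e.g. "NETWORK|IPC" into CLONE_NEWNET | CLONE_NEWIPC.
 * Tokenizes in place with strtok_r(), so flaglist must be writable.
 * *flags is only written on success. Returns 0 or -1. */
static int parse_ns_list(char *flaglist, int *flags)
{
	const size_t count = sizeof(ns_list) / sizeof(ns_list[0]);
	char *token, *saveptr = NULL;
	size_t i;
	int result = 0;

	assert(flags != NULL);
	if (!flaglist)
		return -1;

	for (token = strtok_r(flaglist, "|", &saveptr); token;
	     token = strtok_r(NULL, "|", &saveptr)) {
		for (i = 0; i < count; i++)
			if (strcmp(ns_list[i], token) == 0)
				break;
		if (i == count) {
			fprintf(stderr, "invalid namespace name %s\n", token);
			return -1;
		}
		result |= ns_flags[i];
	}

	/* assumption of this sketch: an empty selection is an error */
	if (result == 0)
		return -1;
	*flags = result;
	return 0;
}
```

On the command line the list has to be quoted, for example -s 'NETWORK|IPC', since an unquoted | is a shell pipe.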
>> diff --git a/src/lxc/namespace.h b/src/lxc/namespace.h
>> index 5442dd3..04e81bb 100644
>> --- a/src/lxc/namespace.h
>> +++ b/src/lxc/namespace.h
>> @@ -50,4 +50,7 @@
>>  
>>  extern pid_t lxc_clone(int (*fn)(void *), void *arg, int flags);
>>  
>> +extern int lxc_namespace_2_cloneflag(char *namespace);
>> +extern int lxc_fill_namespace_flags(char *flaglist, int *flags);
>> +
>>  #endif
>> -- 
>> 1.7.2.5
>> 



