[lxc-devel] [PATCH v2 1/2] Add option to lxc-attach to select specific namespaces
Serge Hallyn
serge.hallyn at canonical.com
Tue May 22 13:10:09 UTC 2012
Quoting Christian Seiler (christian at iwakd.de):
> This patch adds the -s/--namespaces option to lxc-attach that works
> analogously to lxc-unshare, allowing the user to select the namespaces the
> process should be attached to.
>
> User namespaces are supported, under the assumption that the file in
> /proc/pid/ns will be called 'usr'.
OK, I really do think it'll simply be called 'user'. 'uid' is not
impossible, but we could simply ask Eric (cc'd) what he is planning
on using :)
> Currently, user namespaces will be
> skipped (without having lxc-attach fail, unlike for other namespaces) if the
> kernel lacks support.
>
> Signed-off-by: Christian Seiler <christian at iwakd.de>
> Cc: Stéphane Graber <stgraber at ubuntu.com>
> Cc: Daniel Lezcano <daniel.lezcano at free.fr>
> Cc: Serge Hallyn <serge.hallyn at canonical.com>
One comment below. With the change below (or without, if you feel
strongly about it)
Acked-by: Serge Hallyn <serge.hallyn at canonical.com>
thanks,
-serge
> ---
> doc/lxc-attach.sgml.in | 99 +++++++++++++++++++++++++++++++++++++++++++++--
> src/lxc/attach.c | 72 ++++++++++++++++++++++++++++++++--
> src/lxc/attach.h | 2 +-
> src/lxc/lxc_attach.c | 28 ++++++++++++-
> src/lxc/lxc_unshare.c | 46 ----------------------
> src/lxc/namespace.c | 46 ++++++++++++++++++++++
> src/lxc/namespace.h | 3 +
> 7 files changed, 236 insertions(+), 60 deletions(-)
>
> diff --git a/doc/lxc-attach.sgml.in b/doc/lxc-attach.sgml.in
> index 7092f16..d7fb223 100644
> --- a/doc/lxc-attach.sgml.in
> +++ b/doc/lxc-attach.sgml.in
> @@ -49,7 +49,8 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> <refsynopsisdiv>
> <cmdsynopsis><command>lxc-attach <replaceable>-n
> name</replaceable> <optional>-a
> - arch</optional> <optional>-e</optional>
> + arch</optional> <optional>-e</optional> <optional>-s
> + namespaces</optional>
> <optional>-- command</optional></command></cmdsynopsis>
> </refsynopsisdiv>
>
> @@ -122,6 +123,29 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> </listitem>
> </varlistentry>
>
> + <varlistentry>
> + <term>
> + <option>-s, --namespaces <replaceable>namespaces</replaceable></option>
> + </term>
> + <listitem>
> + <para>
> + Specify the namespaces to attach to, as a pipe-separated list,
> + e.g. <replaceable>NETWORK|IPC</replaceable>. Allowed values are
> + <replaceable>MOUNT</replaceable>, <replaceable>PID</replaceable>,
> + <replaceable>UTSNAME</replaceable>, <replaceable>IPC</replaceable>,
> + <replaceable>USER</replaceable> and
> + <replaceable>NETWORK</replaceable>. This allows one to change
> + the context of the process to e.g. the network namespace of the
> + container while retaining the other namespaces as those of the
> + host.
> + </para>
> + <para>
> + <emphasis>Important:</emphasis> This option implies
> + <option>-e</option>.
> + </para>
> + </listitem>
> + </varlistentry>
> +
> </variablelist>
>
> </refsect1>
> @@ -144,19 +168,84 @@ Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> </para>
> <para>
> To deactivate the network link eth1 of a running container that
> - does not have the NET_ADMIN capability, use the <option>-e</option>
> - option to use increased capabilities:
> + does not have the NET_ADMIN capability, use either the
> + <option>-e</option> option to use increased capabilities,
> + assuming the <command>ip</command> tool is installed:
> <programlisting>
> lxc-attach -n container -e -- /sbin/ip link delete eth1
> </programlisting>
> + Or, alternatively, use the <option>-s</option> option to use the
> + tools installed on the host outside the container:
> + <programlisting>
> + lxc-attach -n container -s NETWORK -- /sbin/ip link delete eth1
> + </programlisting>
> </para>
> </refsect1>
>
> <refsect1>
> + <title>Compatibility</title>
> + <para>
> + Attaching completely (including the pid and mount namespaces) to a
> + container requires a patched kernel, please see the lxc website for
> + details. <command>lxc-attach</command> will fail in that case if
> + used with an unpatched kernel.
> + </para>
> + <para>
> + Nevertheless, it will succeed on an unpatched kernel of version 3.0
> + or higher if the <option>-s</option> option is used to restrict the
> + namespaces that the process is to be attached to to one or more of
> + <replaceable>NETWORK</replaceable>, <replaceable>IPC</replaceable>
> + and <replaceable>UTSNAME</replaceable>.
> + </para>
> + <para>
> + Attaching to user namespaces is currently completely unsupported
> + by the kernel. User namespaces will be skipped (but will not cause
> + <command>lxc-attach</command> to fail) unless used with a future
> + version of the kernel that supports this.
> + </para>
> + </refsect1>
> +
> + <refsect1>
> + <title>Notes</title>
> + <para>
> + The Linux <replaceable>/proc</replaceable> and
> + <replaceable>/sys</replaceable> filesystems contain information
> + about some quantities that are affected by namespaces, such as
> + the directories named after process ids in
> + <replaceable>/proc</replaceable> or the network interface information
> + in <replaceable>/sys/class/net</replaceable>. The namespace of the
> + process mounting the pseudo-filesystems determines what information
> + is shown, <emphasis>not</emphasis> the namespace of the process
> + accessing <replaceable>/proc</replaceable> or
> + <replaceable>/sys</replaceable>.
> + </para>
> + <para>
> + If one uses the <option>-s</option> option to only attach to
> + the pid namespace of a container, but not its mount namespace
> + (which will contain the <replaceable>/proc</replaceable> of the
> + container and not the host), the contents of <replaceable>/proc</replaceable>
> + will reflect that of the host and not the container. Analogously,
> + the same issue occurs when reading the contents of
> + <replaceable>/sys/class/net</replaceable> and attaching to just
> + the network namespace.
> + </para>
> + <para>
> + A workaround is to use <command>lxc-unshare</command> to unshare
> + the mount namespace after using <command>lxc-attach</command> with
> + <replaceable>-s PID</replaceable> and/or <replaceable>-s
> + NETWORK</replaceable> and then unmount and then mount again both
> + pseudo-filesystems within that new mount namespace, before
> + executing a program/script that relies on this information to be
> + correct.
> + </para>
> + </refsect1>
> +
> + <refsect1>
> <title>Security</title>
> <para>
> - The <option>-e</option> should be used with care, as it may break
> - the isolation of the containers if used improperly.
> + The <option>-e</option> and <option>-s</option> options should
> + be used with care, as they may break the isolation of the containers
> + if used improperly.
> </para>
> </refsect1>
>
> diff --git a/src/lxc/attach.c b/src/lxc/attach.c
> index a95b3d3..9d598f0 100644
> --- a/src/lxc/attach.c
> +++ b/src/lxc/attach.c
> @@ -121,13 +121,23 @@ out_error:
> return NULL;
> }
>
> -int lxc_attach_to_ns(pid_t pid)
> +int lxc_attach_to_ns(pid_t pid, int which)
> {
> char path[MAXPATHLEN];
> - char *ns[] = { "pid", "mnt", "net", "ipc", "uts" };
> - const int size = sizeof(ns) / sizeof(char *);
> + /* TODO: we assume that the file in /proc for attaching to user
> + * namespaces will be called /proc/$pid/ns/usr, in accordance
> + * with the naming convention of previous namespaces. Once the
> + * kernel really supports setns() on a user namespace, make sure
> + * the array here matches the array in the kernel
> + */
> + static char *ns[] = { "mnt", "pid", "uts", "ipc", "usr", "net" };
> + static int flags[] = {
> + CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
> + CLONE_NEWUSER, CLONE_NEWNET
> + };
> + static const int size = sizeof(ns) / sizeof(char *);
> int fd[size];
> - int i;
> + int i, j, saved_errno;
>
> snprintf(path, MAXPATHLEN, "/proc/%d/ns", pid);
> if (access(path, X_OK)) {
> @@ -136,17 +146,69 @@ int lxc_attach_to_ns(pid_t pid)
> }
>
> for (i = 0; i < size; i++) {
> + /* ignore if we are not supposed to attach
> + * to that namespace
> + */
> + if (which != -1 && !(which & flags[i])) {
> + fd[i] = -1;
> + continue;
> + }
> snprintf(path, MAXPATHLEN, "/proc/%d/ns/%s", pid, ns[i]);
> fd[i] = open(path, O_RDONLY);
> if (fd[i] < 0) {
> + /* there is currently no support in the kernel for
> + * attaching to user namespaces - therefore, we
> + * ignore the error, if the file does not exist
> + */
> + if (flags[i] == CLONE_NEWUSER && errno == ENOENT) {
Note that for now the same thing will happen with pid. I don't think
CLONE_NEWUSER needs to be special cased. Likewise, someone may want
to use this lxc on an older kernel without any setns support at all.
Your choices for behavior are good (print a msg for which == -1,
and error out if the namespace was specially chosen), but I think
you should simply do it for all namespaces.
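To make that concrete, here is an untested sketch of the decision I have
in mind. ns_open_failed() is a hypothetical helper, not something in the
patch or in lxc, and it prints with fprintf() rather than ERROR()/DEBUG()
only so that it compiles on its own. It assumes 'which' and 'flag' carry
the same meaning as in lxc_attach_to_ns() above.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper (not part of the patch): decide what to do after
 * open() on /proc/<pid>/ns/<name> has failed with errno value 'err'.
 *
 * Returns 0 if the namespace should be skipped, -1 if the attach
 * should be aborted.
 */
static int ns_open_failed(int which, int flag, int err, const char *name)
{
	/* Anything other than a missing file is a real error. */
	if (err != ENOENT)
		return -1;

	/* The user asked for this namespace explicitly: fail loudly. */
	if (which != -1 && (which & flag)) {
		fprintf(stderr, "kernel does not support attaching to "
			"%s namespaces\n", name);
		return -1;
	}

	/* Full attach (which == -1): note it and carry on. */
	fprintf(stderr, "kernel does not support attaching to "
		"%s namespaces, skipping\n", name);
	return 0;
}
```

The point is only that nothing in there needs to compare against
CLONE_NEWUSER; the same path covers pid today and every namespace on a
kernel without setns support.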
> + if (which != -1) {
> + /* we don't want the error
> + * message on every full attach,
> + * so we only show it if the
> + * user really requested it
> + * explicitly
> + */
> + ERROR("Kernel does not support "
> + "attaching to user "
> + "namespaces, skipping.");
> + } else {
> + /* but do show it as a debug
> + * message otherwise, so users
> + * aren't completely left in the
> + * dark
> + */
> + DEBUG("Kernel does not support "
> + "attaching to user "
> + "namespaces, skipping.");
> + }
> + fd[i] = -1;
> + continue;
> + }
> +
> + saved_errno = errno;
> +
> + /* close all already opened files before we return
> + * an error, so we don't leak file descriptors if
> + * the caller decides to continue nonetheless
> + */
> + for (j = 0; j < i; j++)
> + close(fd[j]);
> +
> SYSERROR("failed to open '%s'", path);
> + errno = saved_errno;
> return -1;
> }
> }
>
> for (i = 0; i < size; i++) {
> - if (setns(fd[i], 0)) {
> + if (fd[i] >= 0 && setns(fd[i], 0)) {
> + saved_errno = errno;
> +
> + for (j = i; j < size; j++)
> + close(fd[j]);
> +
> SYSERROR("failed to set namespace '%s'", ns[i]);
> + errno = saved_errno;
> return -1;
> }
>
> diff --git a/src/lxc/attach.h b/src/lxc/attach.h
> index 2d46c83..d96fdae 100644
> --- a/src/lxc/attach.h
> +++ b/src/lxc/attach.h
> @@ -33,7 +33,7 @@ struct lxc_proc_context_info {
>
> extern struct lxc_proc_context_info *lxc_proc_get_context_info(pid_t pid);
>
> -extern int lxc_attach_to_ns(pid_t other_pid);
> +extern int lxc_attach_to_ns(pid_t other_pid, int which);
> extern int lxc_attach_drop_privs(struct lxc_proc_context_info *ctx);
>
> #endif
> diff --git a/src/lxc/lxc_attach.c b/src/lxc/lxc_attach.c
> index 955e9f4..6b98248 100644
> --- a/src/lxc/lxc_attach.c
> +++ b/src/lxc/lxc_attach.c
> @@ -40,20 +40,24 @@
> #include "start.h"
> #include "sync.h"
> #include "log.h"
> +#include "namespace.h"
>
> lxc_log_define(lxc_attach_ui, lxc);
>
> static const struct option my_longopts[] = {
> {"elevated-privileges", no_argument, 0, 'e'},
> {"arch", required_argument, 0, 'a'},
> + {"namespaces", required_argument, 0, 's'},
> LXC_COMMON_OPTIONS
> };
>
> static int elevated_privileges = 0;
> static signed long new_personality = -1;
> +static int namespace_flags = -1;
>
> static int my_parser(struct lxc_arguments* args, int c, char* arg)
> {
> + int ret;
> switch (c) {
> case 'e': elevated_privileges = 1; break;
> case 'a':
> @@ -63,6 +67,12 @@ static int my_parser(struct lxc_arguments* args, int c, char* arg)
> return -1;
> }
> break;
> + case 's':
> + namespace_flags = 0;
> + ret = lxc_fill_namespace_flags(arg, &namespace_flags);
> + if (ret)
> + return -1;
> + break;
> }
>
> return 0;
> @@ -83,7 +93,13 @@ Options :\n\
> WARNING: This may leak privleges into the container.\n\
> Use with care.\n\
> -a, --arch=ARCH Use ARCH for program instead of container's own\n\
> - architecture.\n",
> + architecture.\n\
> + -s, --namespaces=FLAGS\n\
> + Don't attach to all the namespaces of the container\n\
> + but just to the following OR'd list of flags:\n\
> + MOUNT, PID, UTSNAME, IPC, USER or NETWORK\n\
> + WARNING: Using -s implies -e, it may therefore\n\
> + leak privileges into the container. Use with care.",
> .options = my_longopts,
> .parser = my_parser,
> .checker = NULL,
> @@ -111,7 +127,13 @@ int main(int argc, char *argv[])
> my_args.progname, my_args.quiet);
> if (ret)
> return ret;
> -
> +
> + /* if we do not attach to all namespaces, we will assume
> + * elevated privileges by default anyway.
> + */
> + if (namespace_flags != -1)
> + elevated_privileges = 1;
> +
> init_pid = get_init_pid(my_args.name);
> if (init_pid < 0) {
> ERROR("failed to get the init pid");
> @@ -178,7 +200,7 @@ int main(int argc, char *argv[])
>
> curdir = get_current_dir_name();
>
> - ret = lxc_attach_to_ns(init_pid);
> + ret = lxc_attach_to_ns(init_pid, namespace_flags);
> if (ret < 0) {
> ERROR("failed to enter the namespace");
> return -1;
> diff --git a/src/lxc/lxc_unshare.c b/src/lxc/lxc_unshare.c
> index 0baccb0..9d8c8ca 100644
> --- a/src/lxc/lxc_unshare.c
> +++ b/src/lxc/lxc_unshare.c
> @@ -85,52 +85,6 @@ static uid_t lookup_user(const char *optarg)
> return uid;
> }
>
> -static char *namespaces_list[] = {
> - "MOUNT", "PID", "UTSNAME", "IPC",
> - "USER", "NETWORK"
> -};
> -static int cloneflags_list[] = {
> - CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
> - CLONE_NEWUSER, CLONE_NEWNET
> -};
> -
> -static int lxc_namespace_2_cloneflag(char *namespace)
> -{
> - int i, len;
> - len = sizeof(namespaces_list)/sizeof(namespaces_list[0]);
> - for (i = 0; i < len; i++)
> - if (!strcmp(namespaces_list[i], namespace))
> - return cloneflags_list[i];
> -
> - ERROR("invalid namespace name %s", namespace);
> - return -1;
> -}
> -
> -static int lxc_fill_namespace_flags(char *flaglist, int *flags)
> -{
> - char *token, *saveptr = NULL;
> - int aflag;
> -
> - if (!flaglist) {
> - ERROR("need at least one namespace to unshare");
> - return -1;
> - }
> -
> - token = strtok_r(flaglist, "|", &saveptr);
> - while (token) {
> -
> - aflag = lxc_namespace_2_cloneflag(token);
> - if (aflag < 0)
> - return -1;
> -
> - *flags |= aflag;
> -
> - token = strtok_r(NULL, "|", &saveptr);
> - }
> - return 0;
> -}
> -
> -
> struct start_arg {
> char ***args;
> int *flags;
> diff --git a/src/lxc/namespace.c b/src/lxc/namespace.c
> index 3e6fc3a..e3c7a09 100644
> --- a/src/lxc/namespace.c
> +++ b/src/lxc/namespace.c
> @@ -69,3 +69,49 @@ pid_t lxc_clone(int (*fn)(void *), void *arg, int flags)
>
> return ret;
> }
> +
> +static char *namespaces_list[] = {
> + "MOUNT", "PID", "UTSNAME", "IPC",
> + "USER", "NETWORK"
> +};
> +static int cloneflags_list[] = {
> + CLONE_NEWNS, CLONE_NEWPID, CLONE_NEWUTS, CLONE_NEWIPC,
> + CLONE_NEWUSER, CLONE_NEWNET
> +};
> +
> +int lxc_namespace_2_cloneflag(char *namespace)
> +{
> + int i, len;
> + len = sizeof(namespaces_list)/sizeof(namespaces_list[0]);
> + for (i = 0; i < len; i++)
> + if (!strcmp(namespaces_list[i], namespace))
> + return cloneflags_list[i];
> +
> + ERROR("invalid namespace name %s", namespace);
> + return -1;
> +}
> +
> +int lxc_fill_namespace_flags(char *flaglist, int *flags)
> +{
> + char *token, *saveptr = NULL;
> + int aflag;
> +
> + if (!flaglist) {
> + ERROR("need at least one namespace to unshare/attach");
> + return -1;
> + }
> +
> + token = strtok_r(flaglist, "|", &saveptr);
> + while (token) {
> +
> + aflag = lxc_namespace_2_cloneflag(token);
> + if (aflag < 0)
> + return -1;
> +
> + *flags |= aflag;
> +
> + token = strtok_r(NULL, "|", &saveptr);
> + }
> + return 0;
> +}
> +
> diff --git a/src/lxc/namespace.h b/src/lxc/namespace.h
> index 5442dd3..04e81bb 100644
> --- a/src/lxc/namespace.h
> +++ b/src/lxc/namespace.h
> @@ -50,4 +50,7 @@
>
> extern pid_t lxc_clone(int (*fn)(void *), void *arg, int flags);
>
> +extern int lxc_namespace_2_cloneflag(char *namespace);
> +extern int lxc_fill_namespace_flags(char *flaglist, int *flags);
> +
> #endif
> --
> 1.7.2.5
>
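For anyone reading along who wants to play with the -s argument format
outside the tree, a rough standalone sketch of the pipe-separated parsing
follows. It is not the code from the patch: parse_ns_list() and the NS_*
bit values are made up for illustration (the real code maps names to the
CLONE_NEW* constants), and it tokenizes with strcspn() instead of
strtok_r(), so unlike the patch it leaves the input string untouched and
rejects empty tokens such as "IPC||PID".

```c
#include <stdio.h>
#include <string.h>

/* Stand-in bit values for illustration only; the real code uses the
 * CLONE_NEW* constants from <sched.h>.
 */
enum {
	NS_MOUNT = 1, NS_PID = 2, NS_UTSNAME = 4,
	NS_IPC = 8, NS_USER = 16, NS_NETWORK = 32
};

static const char *const ns_names[] = {
	"MOUNT", "PID", "UTSNAME", "IPC", "USER", "NETWORK"
};
static const int ns_bits[] = {
	NS_MOUNT, NS_PID, NS_UTSNAME, NS_IPC, NS_USER, NS_NETWORK
};

/* Parse a list such as "NETWORK|IPC" into a bitmask stored in *mask.
 * Returns 0 on success, -1 on a NULL/empty list or an unknown name.
 */
static int parse_ns_list(const char *list, int *mask)
{
	const size_t n = sizeof(ns_names) / sizeof(ns_names[0]);

	if (!list || !*list)
		return -1;

	*mask = 0;
	for (;;) {
		size_t len = strcspn(list, "|");	/* length of this token */
		size_t i;

		for (i = 0; i < n; i++)
			if (strlen(ns_names[i]) == len &&
			    !strncmp(ns_names[i], list, len))
				break;
		if (i == n) {
			fprintf(stderr, "invalid namespace name %.*s\n",
				(int)len, list);
			return -1;
		}
		*mask |= ns_bits[i];

		if (list[len] == '\0')
			break;
		list += len + 1;			/* step over the '|' */
	}
	return 0;
}
```

With this, "NETWORK|IPC" yields NS_NETWORK | NS_IPC, and an unknown name
such as "BOGUS" makes the function return -1.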