[lxc-devel] [lxc/master] Propagate exit code for app containers

tych0 on Github lxc-bot at linuxcontainers.org
Fri Jan 19 03:33:49 UTC 2018


A non-text attachment was scrubbed...
Name: not available
Type: text/x-mailbox
Size: 584 bytes
Desc: not available
URL: <http://lists.linuxcontainers.org/pipermail/lxc-devel/attachments/20180119/d0a2fdde/attachment.bin>
-------------- next part --------------
From 3a9e949f6d4cad085d645b2efbc126cdbc48773a Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho at tycho.ws>
Date: Fri, 19 Jan 2018 00:50:39 +0000
Subject: [PATCH 1/6] start: don't log stop/continue for non-init processes

This non-init forwarding check should really be before all the log messages
about "init continued" or "init stopped", since they will otherwise lie
about some process that wasn't init being stopped or continued.

Signed-off-by: Tycho Andersen <tycho at tycho.ws>
---
 src/lxc/start.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/lxc/start.c b/src/lxc/start.c
index 89a194fd1..8b4144239 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -323,6 +323,14 @@ static int signal_handler(int fd, uint32_t events, void *data,
 	if (ret == 0 && info.si_pid == hdlr->pid)
 		hdlr->init_died = true;
 
+	/* More robustness, protect ourself from a SIGCHLD sent
+	 * by a process different from the container init.
+	 */
+	if (siginfo.ssi_pid != hdlr->pid) {
+		NOTICE("Received %d from pid %d instead of container init %d.", siginfo.ssi_signo, siginfo.ssi_pid, hdlr->pid);
+		return hdlr->init_died ? LXC_MAINLOOP_CLOSE : 0;
+	}
+
 	if (siginfo.ssi_signo != SIGCHLD) {
 		kill(hdlr->pid, siginfo.ssi_signo);
 		INFO("Forwarded signal %d to pid %d.", siginfo.ssi_signo, hdlr->pid);
@@ -337,14 +345,6 @@ static int signal_handler(int fd, uint32_t events, void *data,
 		return hdlr->init_died ? LXC_MAINLOOP_CLOSE : 0;
 	}
 
-	/* More robustness, protect ourself from a SIGCHLD sent
-	 * by a process different from the container init.
-	 */
-	if (siginfo.ssi_pid != hdlr->pid) {
-		NOTICE("Received SIGCHLD from pid %d instead of container init %d.", siginfo.ssi_pid, hdlr->pid);
-		return hdlr->init_died ? LXC_MAINLOOP_CLOSE : 0;
-	}
-
 	DEBUG("Container init process %d exited.", hdlr->pid);
 	return LXC_MAINLOOP_CLOSE;
 }

From 19cfa02c4c7bd29cdf10462878d06d025fcc3d27 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho at tycho.ws>
Date: Fri, 19 Jan 2018 03:20:08 +0000
Subject: [PATCH 2/6] fix lxc_error_set_and_log to match the docs

The documentation for this function says if the task was killed by a
signal, the return code will be 128+n, where n is the signal number. Let's
make that actually true.

(We'll use this behavior in later patches.)

Signed-off-by: Tycho Andersen <tycho at tycho.ws>
---
 src/lxc/error.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/lxc/error.c b/src/lxc/error.c
index 81d6a376e..9147a6afb 100644
--- a/src/lxc/error.c
+++ b/src/lxc/error.c
@@ -52,6 +52,7 @@ extern int  lxc_error_set_and_log(int pid, int status)
 	if (WIFSIGNALED(status)) {
 		int signal = WTERMSIG(status);
 		INFO("Child <%d> ended on signal (%d)", pid, signal);
+		ret = 128 + signal;
 	}
 
 	return ret;

From 4f4530faa742e39fb0e0cd3d08de07f36e2b0fc8 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho at tycho.ws>
Date: Fri, 19 Jan 2018 03:21:10 +0000
Subject: [PATCH 3/6] lxc.init: correctly exit with the app's error code

Based on the comments in the code (and the have_status flag), the intent
here (and IMO, the desired behavior) should be for init.lxc to propagate
the actual exit code from the real application process up through.
Otherwise, it is swallowed and nobody can access it.

The bug being fixed here is that ret held the correct exit code, but when
it went around the loop again (to wait for other children) ret is
clobbered. Let's save the desired exit status somewhere else, so it can't
get clobbered, and we propagate things correctly.

Signed-off-by: Tycho Andersen <tycho at tycho.ws>
---
 src/lxc/lxc_init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/lxc/lxc_init.c b/src/lxc/lxc_init.c
index 2879a9399..78811de4b 100644
--- a/src/lxc/lxc_init.c
+++ b/src/lxc/lxc_init.c
@@ -202,7 +202,7 @@ int main(int argc, char *argv[])
 	struct sigaction act;
 	struct lxc_log log;
 	sigset_t mask, omask;
-	int have_status = 0, shutdown = 0;
+	int have_status = 0, exit_with = 1, shutdown = 0;
 
 	if (arguments_parse(&my_args, argc, argv))
 		exit(EXIT_FAILURE);
@@ -420,14 +420,14 @@ int main(int argc, char *argv[])
 		 * pid) and continue to wait for the end of the orphan group.
 		 */
 		if (waited_pid == pid && !have_status) {
-			ret = lxc_error_set_and_log(waited_pid, status);
+			exit_with = lxc_error_set_and_log(waited_pid, status);
 			have_status = 1;
 		}
 	}
 out:
 	if (ret < 0)
 		exit(EXIT_FAILURE);
-	exit(ret);
+	exit(exit_with);
 }
 
 static void print_usage(const struct option longopts[])

From 44e1cdb0f1e60064edda21d62faf388a052431db Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho at tycho.ws>
Date: Fri, 19 Jan 2018 03:24:59 +0000
Subject: [PATCH 4/6] remember the exit code from the init process

error_num seems to be trying to remember the exit code of the init process,
except that nothing actually keeps track of it anywhere. So, let's add a
field to the handler, so that we can keep track of the process' exit
status, and the propagate it to error_num in struct lxc_container so that
people can use it.

Note that this is a slight behavior change, essentially instead of making
error_num always == the return code from start, now it contains slightly
more useful information (the actual exit status). But, there is only one
internal user of error_num which I'll fix in later in the series, so IMO
this is ok.

Signed-off-by: Tycho Andersen <tycho at tycho.ws>
---
 src/lxc/lxccontainer.c |  2 +-
 src/lxc/start.c        | 22 ++++++++++++++++++++++
 src/lxc/start.h        |  5 +++++
 3 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/src/lxc/lxccontainer.c b/src/lxc/lxccontainer.c
index 2a4bb51f3..f6f07bac1 100644
--- a/src/lxc/lxccontainer.c
+++ b/src/lxc/lxccontainer.c
@@ -1027,7 +1027,7 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a
 		ret = lxc_execute(c->name, argv, 1, handler, c->config_path, daemonize);
 	else
 		ret = lxc_start(c->name, argv, handler, c->config_path, daemonize);
-	c->error_num = ret;
+	c->error_num = handler->exit_status;
 
 	if (conf->reboot == 1) {
 		INFO("Container requested reboot");
diff --git a/src/lxc/start.c b/src/lxc/start.c
index 8b4144239..e61357f65 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -323,6 +323,28 @@ static int signal_handler(int fd, uint32_t events, void *data,
 	if (ret == 0 && info.si_pid == hdlr->pid)
 		hdlr->init_died = true;
 
+	/* Try to figure out a reasonable exit status to report. */
+	if (hdlr->init_died) {
+		switch (info.si_code) {
+		case CLD_EXITED:
+			ERROR("TYCHO: exit status: %d", info.si_status);
+			hdlr->exit_status = info.si_status << 8;
+			break;
+		case CLD_KILLED:
+		case CLD_DUMPED:
+		case CLD_STOPPED:
+			hdlr->exit_status = info.si_status << 8 | 0x7f;
+			break;
+		case CLD_CONTINUED:
+			/* Huh? The waitid() told us it's dead *and* continued? */
+			WARN("Init %d dead and continued?", hdlr->pid);
+			hdlr->exit_status = 1;
+			break;
+		default:
+			ERROR("Unknown si_code: %d", hdlr->init_died);
+		}
+	}
+
 	/* More robustness, protect ourself from a SIGCHLD sent
 	 * by a process different from the container init.
 	 */
diff --git a/src/lxc/start.h b/src/lxc/start.h
index 503154045..f2a2630e3 100644
--- a/src/lxc/start.h
+++ b/src/lxc/start.h
@@ -109,6 +109,11 @@ struct lxc_handler {
 
 	/* Current state of the container. */
 	lxc_state_t state;
+
+	/* The exit status of the container; not defined unless ->init_died ==
+	 * true.
+	 */
+	int exit_status;
 };
 
 struct lxc_operations {

From 3fad0ed9118202dc373e357998aa4c1c441ba882 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho at tycho.ws>
Date: Fri, 19 Jan 2018 03:29:05 +0000
Subject: [PATCH 5/6] start: don't return false when the container's init exits
 nonzero

This seems slightly counter-intuitive, but IMO it's what we want.
Basically, ->start() should succeed if the container is spawned correctly
(similar to how golang's exec.Cmd.Start() returns nil if the thing spawns
correctly), and users can check error_num (i.e. golang's exec.Cmd.Wait())
to see how it exited.

This preserves previous behavior, which basically was that start was always
successful if the thing actually launched. Since we never kept track of
exit codes, this would always succeed too. Now that we do, it doesn't, and
this change is required.

Signed-off-by: Tycho Andersen <tycho at tycho.ws>
---
 src/lxc/start.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/lxc/start.c b/src/lxc/start.c
index e61357f65..2ae56afc3 100644
--- a/src/lxc/start.c
+++ b/src/lxc/start.c
@@ -1824,7 +1824,7 @@ int __lxc_start(const char *name, struct lxc_handler *handler,
 	}
 
 	lxc_monitor_send_exit_code(name, status, handler->lxcpath);
-	err =  lxc_error_set_and_log(handler->pid, status);
+	lxc_error_set_and_log(handler->pid, status);
 
 out_fini:
 	lxc_delete_network(handler);

From ae51997da55fa11e161b92d3d32d7760b2dc9ab9 Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho at tycho.ws>
Date: Fri, 19 Jan 2018 03:31:33 +0000
Subject: [PATCH 6/6] lxc-execute: actually exit with the status of the spawned
 task

Now that we have things propagated through init and liblxc correctly, at
least in non-daemon mode, we can exit with the actual exit status of the
task, instead of always succeeding, which is not so helpful.

Signed-off-by: Tycho Andersen <tycho at tycho.ws>
---
 src/lxc/tools/lxc_execute.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/src/lxc/tools/lxc_execute.c b/src/lxc/tools/lxc_execute.c
index dc1f504e7..98f846fc2 100644
--- a/src/lxc/tools/lxc_execute.c
+++ b/src/lxc/tools/lxc_execute.c
@@ -213,14 +213,20 @@ int main(int argc, char *argv[])
 
 	c->daemonize = my_args.daemonize == 1;
 	bret = c->start(c, 1, my_args.argv);
-	if (c->daemonize)
-		ret = EXIT_SUCCESS;
-	else
-		ret = c->error_num;
 	lxc_container_put(c);
 	if (!bret) {
 		fprintf(stderr, "Failed run an application inside container\n");
 		exit(EXIT_FAILURE);
 	}
-	exit(ret);
+	if (c->daemonize)
+		exit(EXIT_SUCCESS);
+	else {
+		if (WIFEXITED(c->error_num)) {
+			exit(WEXITSTATUS(c->error_num));
+		} else {
+			/* Try to die with the same signal the task did. */
+			kill(0, WTERMSIG(c->error_num));
+			exit(EXIT_FAILURE);
+		}
+	}
 }


More information about the lxc-devel mailing list