[lxc-devel] [lxd/master] migration: fix a race for collecting logs

tych0 on Github lxc-bot at linuxcontainers.org
Mon Oct 17 17:11:09 UTC 2016


A non-text attachment was scrubbed...
Name: not available
Type: text/x-mailbox
Size: 1197 bytes
Desc: not available
URL: <http://lists.linuxcontainers.org/pipermail/lxc-devel/attachments/20161017/cd6a5298/attachment.bin>
-------------- next part --------------
From 26dd07627ddfaf22429c95140f6c7fa9b38aec2b Mon Sep 17 00:00:00 2001
From: Tycho Andersen <tycho.andersen at canonical.com>
Date: Mon, 17 Oct 2016 11:00:50 -0600
Subject: [PATCH] migration: fix a race for collecting logs

The problem here is that with the new action script method for waiting
until the restore is successful to finish the dump. The issue is that the
action script waits until the operation is done, but the operation deleted
the logs. Since the Migrate function is the one that actually collects the
logs, the logs got deleted before the Migrate function had a chance to
collect them.

Instead, let's make the goroutine that is calling Migrate responsible for
cleaning up the checkpoint directory, so that the logs are always
guaranteed to be collected. This means we have to be more explicit
elsewhere about deleting the logs in error cases, but we avoid the race.
In the legacy case, we can still just use the defer, since the Migrate
call returns immediately in the function itself.

Fixes a race condition seen in #2505

Signed-off-by: Tycho Andersen <tycho.andersen at canonical.com>
---
 lxd/migrate.go | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lxd/migrate.go b/lxd/migrate.go
index 2a10261..b0b3d68 100644
--- a/lxd/migrate.go
+++ b/lxd/migrate.go
@@ -390,7 +390,6 @@ func (s *migrationSourceWs) Do(migrateOp *operation) error {
 		if err != nil {
 			return abort(err)
 		}
-		defer os.RemoveAll(checkpointDir)
 
 		if lxc.VersionAtLeast(2, 0, 4) {
 			/* What happens below is slightly convoluted. Due to various
@@ -412,6 +411,7 @@ func (s *migrationSourceWs) Do(migrateOp *operation) error {
 			dumpDone := make(chan bool, 1)
 			actionScriptOpSecret, err := shared.RandomCryptoString()
 			if err != nil {
+				os.RemoveAll(checkpointDir)
 				return abort(err)
 			}
 
@@ -453,21 +453,25 @@ func (s *migrationSourceWs) Do(migrateOp *operation) error {
 				},
 			)
 			if err != nil {
+				os.RemoveAll(checkpointDir)
 				return abort(err)
 			}
 
 			if err := writeActionScript(checkpointDir, actionScriptOp.url, actionScriptOpSecret); err != nil {
+				os.RemoveAll(checkpointDir)
 				return abort(err)
 			}
 
 			_, err = actionScriptOp.Run()
 			if err != nil {
+				os.RemoveAll(checkpointDir)
 				return abort(err)
 			}
 
 			migrateDone := make(chan error, 1)
 			go func() {
 				migrateDone <- s.container.Migrate(lxc.MIGRATE_DUMP, checkpointDir, "migration", true, true)
+				os.RemoveAll(checkpointDir)
 			}()
 
 			select {
@@ -479,6 +483,7 @@ func (s *migrationSourceWs) Do(migrateOp *operation) error {
 				shared.LogDebugf("Dump finished, continuing with restore...")
 			}
 		} else {
+			defer os.RemoveAll(checkpointDir)
 			if err := s.container.Migrate(lxc.MIGRATE_DUMP, checkpointDir, "migration", true, false); err != nil {
 				return abort(err)
 			}


More information about the lxc-devel mailing list