From lxc-bot at linuxcontainers.org Wed May 1 00:05:21 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Tue, 30 Apr 2019 17:05:21 -0700 (PDT) Subject: [lxc-devel] [lxc/master] seccomp: remove alignment requirements Message-ID: <5cc8e2c1.1c69fb81.84975.eddeSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 487 bytes Desc: not available URL: -------------- next part -------------- From 2a621eceddce438392be2a691abc36116defcb18 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Wed, 1 May 2019 02:04:02 +0200 Subject: [PATCH] seccomp: remove alignment requirements since apparently there are insane programming languages out there that just silently remove packed members in structs. Signed-off-by: Christian Brauner --- src/lxc/lxcseccomp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/lxc/lxcseccomp.h b/src/lxc/lxcseccomp.h index 97394dfef1..85bccd2141 100644 --- a/src/lxc/lxcseccomp.h +++ b/src/lxc/lxcseccomp.h @@ -56,7 +56,7 @@ struct seccomp_notify_proxy_msg { struct seccomp_notif_resp resp; pid_t monitor_pid; pid_t init_pid; -} __attribute__((packed, aligned(8))); +}; struct seccomp_notify { bool wants_supervision; From noreply at github.com Wed May 1 03:16:48 2019 From: noreply at github.com (=?UTF-8?B?U3TDqXBoYW5lIEdyYWJlcg==?=) Date: Tue, 30 Apr 2019 20:16:48 -0700 Subject: [lxc-devel] [lxc/lxc] 2a621e: seccomp: remove alignment requirements Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 2a621eceddce438392be2a691abc36116defcb18 https://github.com/lxc/lxc/commit/2a621eceddce438392be2a691abc36116defcb18 Author: Christian Brauner Date: 2019-05-01 (Wed, 01 May 2019) Changed paths: M src/lxc/lxcseccomp.h Log Message: ----------- seccomp: remove alignment requirements since apparently there are insane programming languages out there that just silently remove packed members in structs. Signed-off-by: Christian Brauner Commit: 28805eb0e7f7c850f71a13ec72c8be617896993c https://github.com/lxc/lxc/commit/28805eb0e7f7c850f71a13ec72c8be617896993c Author: Stéphane Graber Date: 2019-04-30 (Tue, 30 Apr 2019) Changed paths: M src/lxc/lxcseccomp.h Log Message: ----------- Merge pull request #2967 from brauner/2019-05-01/seccomp_notifier_api_removal seccomp: remove alignment requirements Compare: https://github.com/lxc/lxc/compare/2bad94767689...28805eb0e7f7 From lxc-bot at linuxcontainers.org Wed May 1 14:47:00 2019 From: lxc-bot at linuxcontainers.org (ajkavanagh on Github) Date: Wed, 01 May 2019 07:47:00 -0700 (PDT) Subject: [lxc-devel] [pylxd/master] Add restore snapshot methods Message-ID: <5cc9b164.1c69fb81.ea71e.32cdSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 478 bytes Desc: not available URL: -------------- next part -------------- Add restore snapshot methods by ajkavanagh · Pull Request #357 · lxc/pylxd · GitHub
Add restore snapshot methods #357

Open · wants to merge 1 commit into base: master
Conversation

1 participant
@ajkavanagh (Contributor) commented May 1, 2019

This patchset adds restore methods to the Container and Snapshot classes
to make it possible to restore a snapshot without having to use the
raw_api in pylxd.

Closes: #353

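For context, a minimal usage sketch of the new API (the method names follow the pull request description; the exact signatures are assumed here, not confirmed by the diff):

```python
from pylxd import Client

client = Client()
container = client.containers.get('my-container')

# Restore via the Snapshot object (assumed signature).
snapshot = container.snapshots.get('snap0')
snapshot.restore(wait=True)

# Or restore directly from the Container by snapshot name (assumed helper).
container.restore_snapshot('snap0', wait=True)
```

Either path avoids dropping down to `client.api` / raw_api calls, which is the point of the change.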
From lxc-bot at linuxcontainers.org Wed May 1 15:55:24 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Wed, 01 May 2019 08:55:24 -0700 (PDT) Subject: [lxc-devel] [lxc/master] network: Static routes for IPVLAN with L2PROXY Message-ID: <5cc9c16c.1c69fb81.c67cf.5a5bSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 453 bytes Desc: not available URL: -------------- next part -------------- From ef4656b9e8a0c05bcf0ea30a959f28a3506d7125 Mon Sep 17 00:00:00 2001 From: tomponline Date: Tue, 30 Apr 2019 14:25:27 +0100 Subject: [PATCH 1/3] network: Adds layer 2 (ARP/NDP) proxy mode Adds the lxc.net.[i].l2proxy flag that can be either 0 or 1. Defaults to 0. This, when used with lxc.net.[i].link, will add IP neighbour proxy entries on the linked device for any IPv4 and IPv6 addresses on the container's network device. Additionally, for IPv6 addresses it will check the following sysctl values and fail with an error if not set: net.ipv6.conf.[link].proxy_ndp=1 net.ipv6.conf.[link].forwarding=1 Signed-off-by: tomponline --- doc/api-extensions.md | 13 +++ doc/lxc.container.conf.sgml.in | 16 +++ src/lxc/api_extensions.h | 1 + src/lxc/confile.c | 49 ++++++++ src/lxc/confile_utils.c | 4 + src/lxc/file_utils.c | 17 ++- src/lxc/file_utils.h | 2 + src/lxc/network.c | 200 ++++++++++++++++++++++++++++++++- src/lxc/network.h | 1 + 9 files changed, 301 insertions(+), 2 deletions(-) diff --git a/doc/api-extensions.md b/doc/api-extensions.md index 8c95021ada..c301aadd76 100644 --- a/doc/api-extensions.md +++ b/doc/api-extensions.md @@ -51,3 +51,16 @@ The caller can read this message, inspect the syscalls including its arguments. This introduces the `lxc.net.[i].veth.ipv4.route` and `lxc.net.[i].veth.ipv6.route` properties on `veth` type network interfaces. This allows adding static routes on host to the container's network interface. + +## network\_l2proxy + +This introduces the `lxc.net.[i].l2proxy` that can be either `0` or `1`. Defaults to `0`. +This, when used with `lxc.net.[i].link`, will add IP neighbour proxy entries on the linked device +for any IPv4 and IPv6 addresses on the container's network device. + +Additionally, for IPv6 addresses it will check the following sysctl values and fail with an error if not set: + +``` +net.ipv6.conf.[link].proxy_ndp=1 +net.ipv6.conf.[link].forwarding=1 +``` diff --git a/doc/lxc.container.conf.sgml.in b/doc/lxc.container.conf.sgml.in index 3b3dd6ddeb..77157ca78e 100644 --- a/doc/lxc.container.conf.sgml.in +++ b/doc/lxc.container.conf.sgml.in @@ -543,6 +543,22 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + + + + + + + Controls whether layer 2 IP neighbour proxy entries will be added to the + lxc.net.[i].link interface for the IP addresses of the container. + Can be set to 0 or 1. Defaults to 0. 
+ When used with IPv6 addresses, the following sysctl values need to be set: + net.ipv6.conf.[link].proxy_ndp=1 + net.ipv6.conf.[link].forwarding=1 + + + + diff --git a/src/lxc/api_extensions.h b/src/lxc/api_extensions.h index 529f19863e..ce34cd5af1 100644 --- a/src/lxc/api_extensions.h +++ b/src/lxc/api_extensions.h @@ -45,6 +45,7 @@ static char *api_extensions[] = { "seccomp_allow_nesting", "seccomp_notify", "network_veth_routes", + "network_l2proxy", }; static size_t nr_api_extensions = sizeof(api_extensions) / sizeof(*api_extensions); diff --git a/src/lxc/confile.c b/src/lxc/confile.c index ebed11522f..398497bd7f 100644 --- a/src/lxc/confile.c +++ b/src/lxc/confile.c @@ -129,6 +129,7 @@ lxc_config_define(net_ipv4_gateway); lxc_config_define(net_ipv6_address); lxc_config_define(net_ipv6_gateway); lxc_config_define(net_link); +lxc_config_define(net_l2proxy); lxc_config_define(net_macvlan_mode); lxc_config_define(net_mtu); lxc_config_define(net_name); @@ -220,6 +221,7 @@ static struct lxc_config_t config_jump_table[] = { { "lxc.net.ipv6.address", set_config_net_ipv6_address, get_config_net_ipv6_address, clr_config_net_ipv6_address, }, { "lxc.net.ipv6.gateway", set_config_net_ipv6_gateway, get_config_net_ipv6_gateway, clr_config_net_ipv6_gateway, }, { "lxc.net.link", set_config_net_link, get_config_net_link, clr_config_net_link, }, + { "lxc.net.l2proxy", set_config_net_l2proxy, get_config_net_l2proxy, clr_config_net_l2proxy, }, { "lxc.net.macvlan.mode", set_config_net_macvlan_mode, get_config_net_macvlan_mode, clr_config_net_macvlan_mode, }, { "lxc.net.mtu", set_config_net_mtu, get_config_net_mtu, clr_config_net_mtu, }, { "lxc.net.name", set_config_net_name, get_config_net_name, clr_config_net_name, }, @@ -396,6 +398,33 @@ static int set_config_net_link(const char *key, const char *value, return ret; } +static int set_config_net_l2proxy(const char *key, const char *value, + struct lxc_conf *lxc_conf, void *data) +{ + struct lxc_netdev *netdev = data; + unsigned int val = 0; + + if (lxc_config_value_empty(value)) + return clr_config_net_l2proxy(key, lxc_conf, data); + + if (!netdev) + return -1; + + if (lxc_safe_uint(value, &val) < 0) + return minus_one_set_errno(EINVAL); + + switch (val) { + case 0: + netdev->l2proxy = false; + return 0; + case 1: + netdev->l2proxy = true; + return 0; + } + + return minus_one_set_errno(EINVAL); +} + static int set_config_net_name(const char *key, const char *value, struct lxc_conf *lxc_conf, void *data) { @@ -4915,6 +4944,19 @@ static int clr_config_net_link(const char *key, struct lxc_conf *lxc_conf, return 0; } +static int clr_config_net_l2proxy(const char *key, struct lxc_conf *lxc_conf, + void *data) +{ + struct lxc_netdev *netdev = data; + + if (!netdev) + return -1; + + netdev->l2proxy = false; + + return 0; +} + static int clr_config_net_macvlan_mode(const char *key, struct lxc_conf *lxc_conf, void *data) { @@ -5205,6 +5247,13 @@ static int get_config_net_link(const char *key, char *retv, int inlen, return fulllen; } +static int get_config_net_l2proxy(const char *key, char *retv, int inlen, + struct lxc_conf *c, void *data) +{ + struct lxc_netdev *netdev = data; + return lxc_get_conf_bool(c, retv, inlen, netdev->l2proxy); +} + static int get_config_net_name(const char *key, char *retv, int inlen, struct lxc_conf *c, void *data) { diff --git a/src/lxc/confile_utils.c b/src/lxc/confile_utils.c index 67bf0824a2..870c6b7e58 100644 --- a/src/lxc/confile_utils.c +++ b/src/lxc/confile_utils.c @@ -328,6 +328,10 @@ void lxc_log_configured_netdevs(const 
struct lxc_conf *conf) if (netdev->link[0] != '\0') TRACE("link: %s", netdev->link); + /* l2proxy only used when link is specified */ + if (netdev->link[0] != '\0') + TRACE("l2proxy: %s", netdev->l2proxy ? "true" : "false"); + if (netdev->name[0] != '\0') TRACE("name: %s", netdev->name); diff --git a/src/lxc/file_utils.c b/src/lxc/file_utils.c index 603c0ace66..fa8f934093 100644 --- a/src/lxc/file_utils.c +++ b/src/lxc/file_utils.c @@ -147,7 +147,7 @@ ssize_t lxc_read_nointr_expect(int fd, void *buf, size_t count, const void *expe ssize_t ret; ret = lxc_read_nointr(fd, buf, count); - if (ret <= 0) + if (ret < 0) return ret; if ((size_t)ret != count) @@ -158,6 +158,21 @@ ssize_t lxc_read_nointr_expect(int fd, void *buf, size_t count, const void *expe return -1; } + return 0; +} + +ssize_t lxc_read_file_expect(const char *path, void *buf, size_t count, const void *expected_buf) +{ + int fd; + ssize_t ret; + + fd = open(path, O_RDONLY | O_CLOEXEC); + if (fd < 0) + return -1; + + ret = lxc_read_nointr_expect(fd, buf, count, expected_buf); + close(fd); + return ret; } diff --git a/src/lxc/file_utils.h b/src/lxc/file_utils.h index cc8f69e183..1b8033d69b 100644 --- a/src/lxc/file_utils.h +++ b/src/lxc/file_utils.h @@ -40,6 +40,8 @@ extern ssize_t lxc_send_nointr(int sockfd, void *buf, size_t len, int flags); extern ssize_t lxc_read_nointr(int fd, void *buf, size_t count); extern ssize_t lxc_read_nointr_expect(int fd, void *buf, size_t count, const void *expected_buf); +extern ssize_t lxc_read_file_expect(const char *path, void *buf, size_t count, + const void *expected_buf); extern ssize_t lxc_recv_nointr(int sockfd, void *buf, size_t len, int flags); extern bool file_exists(const char *f); diff --git a/src/lxc/network.c b/src/lxc/network.c index ec7dbccccf..4b8431691a 100644 --- a/src/lxc/network.c +++ b/src/lxc/network.c @@ -1497,6 +1497,25 @@ static int proc_sys_net_write(const char *path, const char *value) return err; } +static int lxc_is_ip_forwarding_enabled(const char *ifname, int family) +{ + int ret; + char path[PATH_MAX]; + char buf[1] = ""; + + if (family != AF_INET && family != AF_INET6) + return minus_one_set_errno(EINVAL); + + ret = snprintf(path, PATH_MAX, "/proc/sys/net/%s/conf/%s/%s", + family == AF_INET ? "ipv4" : "ipv6", ifname, + "forwarding"); + + if (ret < 0 || (size_t)ret >= PATH_MAX) + return minus_one_set_errno(E2BIG); + + return lxc_read_file_expect(path, buf, 1, "1"); +} + static int neigh_proxy_set(const char *ifname, int family, int flag) { int ret; @@ -1514,6 +1533,25 @@ static int neigh_proxy_set(const char *ifname, int family, int flag) return proc_sys_net_write(path, flag ? "1" : "0"); } +static int lxc_is_ip_neigh_proxy_enabled(const char *ifname, int family) +{ + int ret; + char path[PATH_MAX]; + char buf[1] = ""; + + if (family != AF_INET && family != AF_INET6) + return minus_one_set_errno(EINVAL); + + ret = snprintf(path, PATH_MAX, "/proc/sys/net/%s/conf/%s/%s", + family == AF_INET ? "ipv4" : "ipv6", ifname, + family == AF_INET ? 
"proxy_arp" : "proxy_ndp"); + + if (ret < 0 || (size_t)ret >= PATH_MAX) + return minus_one_set_errno(E2BIG); + + return lxc_read_file_expect(path, buf, 1, "1"); +} + int lxc_neigh_proxy_on(const char *name, int family) { return neigh_proxy_set(name, family, 1); @@ -2515,6 +2553,151 @@ bool lxc_delete_network_unpriv(struct lxc_handler *handler) return true; } +struct ip_proxy_args { + const char *ip; + const char *dev; +}; + +static int lxc_add_ip_proxy_exec_wrapper(void *data) +{ + struct ip_proxy_args *args = data; + + execlp("ip", "ip", "neigh", "add", "proxy", args->ip, "dev", args->dev, + (char *)NULL); + return -1; +} + +static int lxc_del_ip_proxy_exec_wrapper(void *data) +{ + struct ip_proxy_args *args = data; + + execlp("ip", "ip", "neigh", "flush", "proxy", args->ip, "dev", args->dev, + (char *)NULL); + return -1; +} + +static int lxc_add_ip_proxy(const char *ip, const char *dev) +{ + int ret; + char cmd_output[PATH_MAX]; + struct ip_proxy_args args; + args.ip = ip; + args.dev = dev; + + ret = run_command(cmd_output, sizeof(cmd_output), + lxc_add_ip_proxy_exec_wrapper, (void *)&args); + if (ret < 0) { + ERROR("Failed to add ip proxy \"%s\" to dev \"%s\": %s", ip, dev, cmd_output); + return -1; + } + + return 0; +} + +static int lxc_del_ip_proxy(const char *ip, const char *dev) +{ + int ret; + char cmd_output[PATH_MAX]; + struct ip_proxy_args args; + args.ip = ip; + args.dev = dev; + + ret = run_command(cmd_output, sizeof(cmd_output), + lxc_del_ip_proxy_exec_wrapper, (void *)&args); + if (ret < 0) { + ERROR("Failed to delete ip proxy \"%s\" to dev \"%s\": %s", ip, dev, cmd_output); + return -1; + } + + return 0; +} + +static int lxc_setup_l2proxy(struct lxc_netdev *netdev) { + struct lxc_list *cur, *next; + struct lxc_inetdev *inet4dev; + struct lxc_inet6dev *inet6dev; + char bufinet4[INET_ADDRSTRLEN], bufinet6[INET6_ADDRSTRLEN]; + + /* If IPv6 addresses are specified, then check that sysctl is configured correctly. 
*/ + if (!lxc_list_empty(&netdev->ipv6)) { + /* Check for net.ipv6.conf.[link].proxy_ndp=1 */ + if (lxc_is_ip_neigh_proxy_enabled(netdev->link, AF_INET6) < 0) { + ERROR("l2proxy requires sysctl net.ipv6.conf.%s.proxy_ndp be set to 1", netdev->link); + return minus_one_set_errno(EINVAL); + } + + /* Check for net.ipv6.conf.[link].forwarding=1 */ + if (lxc_is_ip_forwarding_enabled(netdev->link, AF_INET6) < 0) { + ERROR("l2proxy requires sysctl net.ipv6.conf.%s.forwarding be set to 1", netdev->link); + return minus_one_set_errno(EINVAL); + } + } + + lxc_list_for_each_safe(cur, &netdev->ipv4, next) { + inet4dev = cur->elem; + if (!inet_ntop(AF_INET, &inet4dev->addr, bufinet4, sizeof(bufinet4))) { + return minus_one_set_errno(EINVAL); + } + + if (lxc_add_ip_proxy(bufinet4, netdev->link)) { + return minus_one_set_errno(EINVAL); + } + } + + lxc_list_for_each_safe(cur, &netdev->ipv6, next) { + inet6dev = cur->elem; + if (!inet_ntop(AF_INET6, &inet6dev->addr, bufinet6, sizeof(bufinet6))) { + return minus_one_set_errno(EINVAL); + } + + if (lxc_add_ip_proxy(bufinet6, netdev->link)) { + return minus_one_set_errno(EINVAL); + } + } + + return 0; +} + +static int lxc_delete_l2proxy(struct lxc_netdev *netdev) { + struct lxc_list *cur, *next; + struct lxc_inetdev *inet4dev; + struct lxc_inet6dev *inet6dev; + char bufinet4[INET_ADDRSTRLEN], bufinet6[INET6_ADDRSTRLEN]; + int err = 0; + + lxc_list_for_each_safe(cur, &netdev->ipv4, next) { + inet4dev = cur->elem; + if (!inet_ntop(AF_INET, &inet4dev->addr, bufinet4,sizeof(bufinet4))) { + err = -1; + continue; /* Try to remove any other l2proxy entries */ + } + + if (lxc_del_ip_proxy(bufinet4, netdev->link)) { + err = -1; + continue; /* Try to remove any other l2proxy entries */ + } + } + + lxc_list_for_each_safe(cur, &netdev->ipv6, next) { + inet6dev = cur->elem; + if (!inet_ntop(AF_INET6, &inet6dev->addr, bufinet6, sizeof(bufinet6))) { + err = -1; + continue; /* Try to remove any other l2proxy entries */ + } + + if (lxc_del_ip_proxy(bufinet6, netdev->link)) { + err = -1; + continue; /* Try to remove any other l2proxy entries */ + } + } + + if (err < 0) { + return minus_one_set_errno(EINVAL); + } + + return 0; +} + int lxc_create_network_priv(struct lxc_handler *handler) { struct lxc_list *iterator; @@ -2531,11 +2714,18 @@ int lxc_create_network_priv(struct lxc_handler *handler) return -1; } + /* Setup l2proxy entries if enabled and used with a link property */ + if (netdev->l2proxy && netdev->link[0] != '\0') { + if (lxc_setup_l2proxy(netdev)) { + ERROR("Failed to setup l2proxy"); + return -1; + } + } + if (netdev_conf[netdev->type](handler, netdev)) { ERROR("Failed to create network device"); return -1; } - } return 0; @@ -2631,6 +2821,14 @@ bool lxc_delete_network_priv(struct lxc_handler *handler) if (!netdev->ifindex) continue; + /* Delete l2proxy entries if enabled and used with a link property */ + if (netdev->l2proxy && netdev->link[0] != '\0') { + if (lxc_delete_l2proxy(netdev)) { + WARN("Failed to delete l2proxy"); + /* Don't return, let the network be cleaned up as normal. 
*/ + } + } + if (netdev->type == LXC_NET_PHYS) { ret = lxc_netdev_rename_by_index(netdev->ifindex, netdev->link); if (ret < 0) diff --git a/src/lxc/network.h b/src/lxc/network.h index e2757c1dba..a7ae82fc7b 100644 --- a/src/lxc/network.h +++ b/src/lxc/network.h @@ -164,6 +164,7 @@ struct lxc_netdev { int type; int flags; char link[IFNAMSIZ]; + bool l2proxy; char name[IFNAMSIZ]; char *hwaddr; char *mtu; From b275a9ad5fffd63f317722da77ac7125ef4e5033 Mon Sep 17 00:00:00 2001 From: tomponline Date: Fri, 26 Apr 2019 11:26:45 +0100 Subject: [PATCH 2/3] network: Adds IPVLAN support Example usage: lxc.net[i].type=ipvlan lxc.net[i].ipvlan.mode=[l3|l3s|l2] (defaults to l3) lxc.net[i].ipvlan.flags=[bridge|private|vepa] (defaults to bridge) lxc.net[i].link=eth0 lxc.net[i].flags=up Signed-off-by: tomponline --- doc/api-extensions.md | 14 +++ doc/lxc.container.conf.sgml.in | 49 +++++++-- src/lxc/api_extensions.h | 1 + src/lxc/confile.c | 175 +++++++++++++++++++++++++++++++-- src/lxc/confile_utils.c | 79 +++++++++++++++ src/lxc/confile_utils.h | 4 + src/lxc/macro.h | 32 ++++++ src/lxc/network.c | 164 ++++++++++++++++++++++++++++++ src/lxc/network.h | 7 ++ src/tests/parse_config_file.c | 35 +++++++ 10 files changed, 545 insertions(+), 15 deletions(-) diff --git a/doc/api-extensions.md b/doc/api-extensions.md index c301aadd76..91ffd0a2d6 100644 --- a/doc/api-extensions.md +++ b/doc/api-extensions.md @@ -64,3 +64,17 @@ Additionally, for IPv6 addresses it will check the following sysctl values and f net.ipv6.conf.[link].proxy_ndp=1 net.ipv6.conf.[link].forwarding=1 ``` + +## network\_ipvlan + +This introduces the `ipvlan` network type. + +Example usage: + +``` +lxc.net[i].type=ipvlan +lxc.net[i].ipvlan.mode=[l3|l3s|l2] (defaults to l3) +lxc.net[i].ipvlan.isolation=[bridge|private|vepa] (defaults to bridge) +lxc.net[i].link=eth0 +lxc.net[i].flags=up +``` diff --git a/doc/lxc.container.conf.sgml.in b/doc/lxc.container.conf.sgml.in index 77157ca78e..2589028c22 100644 --- a/doc/lxc.container.conf.sgml.in +++ b/doc/lxc.container.conf.sgml.in @@ -485,7 +485,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA different macvlan on the same upper device. The accepted modes are , , and . - In mode, the device never + In mode, the device never communicates with any other device on the same upper_dev (default). In mode, the new Virtual Ethernet Port Aggregator (VEPA) mode, it assumes that the adjacent @@ -510,6 +510,41 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA mode is possible for one physical interface. + + an ipvlan interface is linked + with the interface specified by + the and assigned to + the container. + specifies the + mode the ipvlan will use to communicate between + different ipvlan on the same upper device. The accepted + modes are , and + . It defaults to mode. + In mode TX processing up to L3 happens on the stack instance + attached to the slave device and packets are switched to the stack instance of the + master device for the L2 processing and routing from that instance will be + used before packets are queued on the outbound device. In this mode the slaves + will not receive nor can send multicast / broadcast traffic. + In mode TX processing is very similar to the L3 mode except that + iptables (conn-tracking) works in this mode and hence it is L3-symmetric (L3s). + This will have slightly less performance but that shouldn't matter since you are + choosing this mode over plain-L3 mode to make conn-tracking work. 
+ In mode TX processing happens on the stack instance attached to + the slave device and packets are switched and queued to the master device to send + out. In this mode the slaves will RX/TX multicast and broadcast (if applicable) as well. + specifies the isolation mode. + The accepted isolation values are , + and . + It defaults to . + In isolation mode slaves can cross-talk among themselves + apart from talking through the master device. + In isolation mode the port is set in private mode. + i.e. port won't allow cross communication between slaves. + In isolation mode the port is set in VEPA mode. + i.e. port will offload switching functionality to the external entity as + described in 802.1Qbg. + + an already existing interface specified by the is @@ -626,8 +661,8 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA interface (as specified by the option) and use that as the gateway. is only available when - using the and - network types. + using the , + and network types. @@ -660,8 +695,8 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA interface (as specified by the option) and use that as the gateway. is only available when - using the and - network types. + using the , + and network types. @@ -696,7 +731,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA LXC_NET_TYPE: the network type. This is one of the valid - network types listed here (e.g. 'macvlan', 'veth'). + network types listed here (e.g. 'vlan', 'macvlan', 'ipvlan', 'veth'). @@ -762,7 +797,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA LXC_NET_TYPE: the network type. This is one of the valid - network types listed here (e.g. 'macvlan', 'veth'). + network types listed here (e.g. 'vlan', 'macvlan', 'ipvlan', 'veth'). 
diff --git a/src/lxc/api_extensions.h b/src/lxc/api_extensions.h index ce34cd5af1..40a0d199c8 100644 --- a/src/lxc/api_extensions.h +++ b/src/lxc/api_extensions.h @@ -46,6 +46,7 @@ static char *api_extensions[] = { "seccomp_notify", "network_veth_routes", "network_l2proxy", + "network_ipvlan", }; static size_t nr_api_extensions = sizeof(api_extensions) / sizeof(*api_extensions); diff --git a/src/lxc/confile.c b/src/lxc/confile.c index 398497bd7f..b245213e8d 100644 --- a/src/lxc/confile.c +++ b/src/lxc/confile.c @@ -131,6 +131,8 @@ lxc_config_define(net_ipv6_gateway); lxc_config_define(net_link); lxc_config_define(net_l2proxy); lxc_config_define(net_macvlan_mode); +lxc_config_define(net_ipvlan_mode); +lxc_config_define(net_ipvlan_isolation); lxc_config_define(net_mtu); lxc_config_define(net_name); lxc_config_define(net_nic); @@ -223,6 +225,8 @@ static struct lxc_config_t config_jump_table[] = { { "lxc.net.link", set_config_net_link, get_config_net_link, clr_config_net_link, }, { "lxc.net.l2proxy", set_config_net_l2proxy, get_config_net_l2proxy, clr_config_net_l2proxy, }, { "lxc.net.macvlan.mode", set_config_net_macvlan_mode, get_config_net_macvlan_mode, clr_config_net_macvlan_mode, }, + { "lxc.net.ipvlan.mode", set_config_net_ipvlan_mode, get_config_net_ipvlan_mode, clr_config_net_ipvlan_mode, }, + { "lxc.net.ipvlan.isolation", set_config_net_ipvlan_isolation, get_config_net_ipvlan_isolation, clr_config_net_ipvlan_isolation, }, { "lxc.net.mtu", set_config_net_mtu, get_config_net_mtu, clr_config_net_mtu, }, { "lxc.net.name", set_config_net_name, get_config_net_name, clr_config_net_name, }, { "lxc.net.script.down", set_config_net_script_down, get_config_net_script_down, clr_config_net_script_down, }, @@ -293,21 +297,24 @@ static int set_config_net_type(const char *key, const char *value, if (!netdev) return -1; - if (!strcmp(value, "veth")) { + if (strcmp(value, "veth") == 0) { netdev->type = LXC_NET_VETH; lxc_list_init(&netdev->priv.veth_attr.ipv4_routes); lxc_list_init(&netdev->priv.veth_attr.ipv6_routes); - } else if (!strcmp(value, "macvlan")) { + } else if (strcmp(value, "macvlan") == 0) { netdev->type = LXC_NET_MACVLAN; - lxc_macvlan_mode_to_flag(&netdev->priv.macvlan_attr.mode, - "private"); - } else if (!strcmp(value, "vlan")) { + lxc_macvlan_mode_to_flag(&netdev->priv.macvlan_attr.mode, "private"); + } else if (strcmp(value, "ipvlan") == 0) { + netdev->type = LXC_NET_IPVLAN; + lxc_ipvlan_mode_to_flag(&netdev->priv.ipvlan_attr.mode, "l3"); + lxc_ipvlan_isolation_to_flag(&netdev->priv.ipvlan_attr.isolation, "bridge"); + } else if (strcmp(value, "vlan") == 0) { netdev->type = LXC_NET_VLAN; - } else if (!strcmp(value, "phys")) { + } else if (strcmp(value, "phys") == 0) { netdev->type = LXC_NET_PHYS; - } else if (!strcmp(value, "empty")) { + } else if (strcmp(value, "empty") == 0) { netdev->type = LXC_NET_EMPTY; - } else if (!strcmp(value, "none")) { + } else if (strcmp(value, "none") == 0) { netdev->type = LXC_NET_NONE; } else { ERROR("Invalid network type %s", value); @@ -467,6 +474,44 @@ static int set_config_net_macvlan_mode(const char *key, const char *value, return lxc_macvlan_mode_to_flag(&netdev->priv.macvlan_attr.mode, value); } +static int set_config_net_ipvlan_mode(const char *key, const char *value, + struct lxc_conf *lxc_conf, void *data) +{ + struct lxc_netdev *netdev = data; + + if (lxc_config_value_empty(value)) + return clr_config_net_ipvlan_mode(key, lxc_conf, data); + + if (!netdev) + return minus_one_set_errno(EINVAL); + + if (netdev->type != LXC_NET_IPVLAN) { + 
SYSERROR("Invalid ipvlan mode \"%s\", can only be used with ipvlan network", value); + return minus_one_set_errno(EINVAL); + } + + return lxc_ipvlan_mode_to_flag(&netdev->priv.ipvlan_attr.mode, value); +} + +static int set_config_net_ipvlan_isolation(const char *key, const char *value, + struct lxc_conf *lxc_conf, void *data) +{ + struct lxc_netdev *netdev = data; + + if (lxc_config_value_empty(value)) + return clr_config_net_ipvlan_isolation(key, lxc_conf, data); + + if (!netdev) + return minus_one_set_errno(EINVAL); + + if (netdev->type != LXC_NET_IPVLAN) { + SYSERROR("Invalid ipvlan isolation \"%s\", can only be used with ipvlan network", value); + return minus_one_set_errno(EINVAL); + } + + return lxc_ipvlan_isolation_to_flag(&netdev->priv.ipvlan_attr.isolation, value); +} + static int set_config_net_hwaddr(const char *key, const char *value, struct lxc_conf *lxc_conf, void *data) { @@ -4973,6 +5018,38 @@ static int clr_config_net_macvlan_mode(const char *key, return 0; } +static int clr_config_net_ipvlan_mode(const char *key, + struct lxc_conf *lxc_conf, void *data) +{ + struct lxc_netdev *netdev = data; + + if (!netdev) + return minus_one_set_errno(EINVAL); + + if (netdev->type != LXC_NET_IPVLAN) + return 0; + + netdev->priv.ipvlan_attr.mode = -1; + + return 0; +} + +static int clr_config_net_ipvlan_isolation(const char *key, + struct lxc_conf *lxc_conf, void *data) +{ + struct lxc_netdev *netdev = data; + + if (!netdev) + return minus_one_set_errno(EINVAL); + + if (netdev->type != LXC_NET_IPVLAN) + return 0; + + netdev->priv.ipvlan_attr.isolation = -1; + + return 0; +} + static int clr_config_net_veth_pair(const char *key, struct lxc_conf *lxc_conf, void *data) { @@ -5317,6 +5394,84 @@ static int get_config_net_macvlan_mode(const char *key, char *retv, int inlen, return fulllen; } +static int get_config_net_ipvlan_mode(const char *key, char *retv, int inlen, + struct lxc_conf *c, void *data) +{ + int len; + int fulllen = 0; + const char *mode; + struct lxc_netdev *netdev = data; + + if (!retv) + inlen = 0; + else + memset(retv, 0, inlen); + + if (!netdev) + return minus_one_set_errno(EINVAL); + + if (netdev->type != LXC_NET_IPVLAN) + return 0; + + switch (netdev->priv.ipvlan_attr.mode) { + case IPVLAN_MODE_L3: + mode = "l3"; + break; + case IPVLAN_MODE_L3S: + mode = "l3s"; + break; + case IPVLAN_MODE_L2: + mode = "l2"; + break; + default: + mode = "(invalid)"; + break; + } + + strprint(retv, inlen, "%s", mode); + + return fulllen; +} + +static int get_config_net_ipvlan_isolation(const char *key, char *retv, int inlen, + struct lxc_conf *c, void *data) +{ + int len; + int fulllen = 0; + const char *mode; + struct lxc_netdev *netdev = data; + + if (!retv) + inlen = 0; + else + memset(retv, 0, inlen); + + if (!netdev) + return minus_one_set_errno(EINVAL); + + if (netdev->type != LXC_NET_IPVLAN) + return 0; + + switch (netdev->priv.ipvlan_attr.isolation) { + case IPVLAN_ISOLATION_BRIDGE: + mode = "bridge"; + break; + case IPVLAN_ISOLATION_PRIVATE: + mode = "private"; + break; + case IPVLAN_ISOLATION_VEPA: + mode = "vepa"; + break; + default: + mode = "(invalid)"; + break; + } + + strprint(retv, inlen, "%s", mode); + + return fulllen; +} + static int get_config_net_veth_pair(const char *key, char *retv, int inlen, struct lxc_conf *c, void *data) { @@ -5767,6 +5922,10 @@ int lxc_list_net(struct lxc_conf *c, const char *key, char *retv, int inlen) case LXC_NET_MACVLAN: strprint(retv, inlen, "macvlan.mode\n"); break; + case LXC_NET_IPVLAN: + strprint(retv, inlen, "ipvlan.mode\n"); + 
strprint(retv, inlen, "ipvlan.isolation\n"); + break; case LXC_NET_VLAN: strprint(retv, inlen, "vlan.id\n"); break; diff --git a/src/lxc/confile_utils.c b/src/lxc/confile_utils.c index 870c6b7e58..12a8dbb095 100644 --- a/src/lxc/confile_utils.c +++ b/src/lxc/confile_utils.c @@ -299,6 +299,17 @@ void lxc_log_configured_netdevs(const struct lxc_conf *conf) mode ? mode : "(invalid mode)"); } break; + case LXC_NET_IPVLAN: + TRACE("type: ipvlan"); + + char *mode; + mode = lxc_ipvlan_flag_to_mode(netdev->priv.ipvlan_attr.mode); + TRACE("ipvlan mode: %s", mode ? mode : "(invalid mode)"); + + char *isolation; + isolation = lxc_ipvlan_flag_to_isolation(netdev->priv.ipvlan_attr.isolation); + TRACE("ipvlan isolation: %s", isolation ? isolation : "(invalid isolation)"); + break; case LXC_NET_VLAN: TRACE("type: vlan"); TRACE("vlan id: %d", netdev->priv.vlan_attr.vid); @@ -523,6 +534,74 @@ char *lxc_macvlan_flag_to_mode(int mode) return NULL; } +static struct lxc_ipvlan_mode { + char *name; + int mode; +} ipvlan_mode[] = { + { "l3", IPVLAN_MODE_L3 }, + { "l3s", IPVLAN_MODE_L3S }, + { "l2", IPVLAN_MODE_L2 }, +}; + +int lxc_ipvlan_mode_to_flag(int *mode, const char *value) +{ + for (size_t i = 0; i < sizeof(ipvlan_mode) / sizeof(ipvlan_mode[0]); i++) { + if (strcmp(ipvlan_mode[i].name, value) != 0) + continue; + + *mode = ipvlan_mode[i].mode; + return 0; + } + + return -1; +} + +char *lxc_ipvlan_flag_to_mode(int mode) +{ + for (size_t i = 0; i < sizeof(ipvlan_mode) / sizeof(ipvlan_mode[0]); i++) { + if (ipvlan_mode[i].mode != mode) + continue; + + return ipvlan_mode[i].name; + } + + return NULL; +} + +static struct lxc_ipvlan_isolation { + char *name; + int flag; +} ipvlan_isolation[] = { + { "bridge", IPVLAN_ISOLATION_BRIDGE }, + { "private", IPVLAN_ISOLATION_PRIVATE }, + { "vepa", IPVLAN_ISOLATION_VEPA }, +}; + +int lxc_ipvlan_isolation_to_flag(int *flag, const char *value) +{ + for (size_t i = 0; i < sizeof(ipvlan_isolation) / sizeof(ipvlan_isolation[0]); i++) { + if (strcmp(ipvlan_isolation[i].name, value) != 0) + continue; + + *flag = ipvlan_isolation[i].flag; + return 0; + } + + return -1; +} + +char *lxc_ipvlan_flag_to_isolation(int flag) +{ + for (size_t i = 0; i < sizeof(ipvlan_isolation) / sizeof(ipvlan_isolation[0]); i++) { + if (ipvlan_isolation[i].flag != flag) + continue; + + return ipvlan_isolation[i].name; + } + + return NULL; +} + int set_config_string_item(char **conf_item, const char *value) { char *new_value; diff --git a/src/lxc/confile_utils.h b/src/lxc/confile_utils.h index 5a3bcc914c..cfed91dc09 100644 --- a/src/lxc/confile_utils.h +++ b/src/lxc/confile_utils.h @@ -58,6 +58,10 @@ extern bool lxc_remove_nic_by_idx(struct lxc_conf *conf, unsigned int idx); extern void lxc_free_networks(struct lxc_list *networks); extern int lxc_macvlan_mode_to_flag(int *mode, const char *value); extern char *lxc_macvlan_flag_to_mode(int mode); +extern int lxc_ipvlan_mode_to_flag(int *mode, const char *value); +extern char *lxc_ipvlan_flag_to_mode(int mode); +extern int lxc_ipvlan_isolation_to_flag(int *mode, const char *value); +extern char *lxc_ipvlan_flag_to_isolation(int mode); extern int set_config_string_item(char **conf_item, const char *value); extern int set_config_string_item_max(char **conf_item, const char *value, diff --git a/src/lxc/macro.h b/src/lxc/macro.h index 7df3b56f03..7626c5d76b 100644 --- a/src/lxc/macro.h +++ b/src/lxc/macro.h @@ -280,6 +280,14 @@ extern int __build_bug_on_failed; #define IFLA_MACVLAN_MODE 1 #endif +#ifndef IFLA_IPVLAN_MODE +#define IFLA_IPVLAN_MODE 1 +#endif + 
+#ifndef IFLA_IPVLAN_ISOLATION +#define IFLA_IPVLAN_ISOLATION 2 +#endif + #ifndef IFLA_NEW_NETNSID #define IFLA_NEW_NETNSID 45 #endif @@ -333,6 +341,30 @@ extern int __build_bug_on_failed; #define MACVLAN_MODE_PASSTHRU 8 #endif +#ifndef IPVLAN_MODE_L2 +#define IPVLAN_MODE_L2 0 +#endif + +#ifndef IPVLAN_MODE_L3 +#define IPVLAN_MODE_L3 1 +#endif + +#ifndef IPVLAN_MODE_L3S +#define IPVLAN_MODE_L3S 2 +#endif + +#ifndef IPVLAN_ISOLATION_BRIDGE +#define IPVLAN_ISOLATION_BRIDGE 0 +#endif + +#ifndef IPVLAN_ISOLATION_PRIVATE +#define IPVLAN_ISOLATION_PRIVATE 1 +#endif + +#ifndef IPVLAN_ISOLATION_VEPA +#define IPVLAN_ISOLATION_VEPA 2 +#endif + /* Attributes of RTM_NEWNSID/RTM_GETNSID messages */ enum { __LXC_NETNSA_NONE, diff --git a/src/lxc/network.c b/src/lxc/network.c index 4b8431691a..d8d826b6f7 100644 --- a/src/lxc/network.c +++ b/src/lxc/network.c @@ -376,6 +376,147 @@ static int instantiate_macvlan(struct lxc_handler *handler, struct lxc_netdev *n return -1; } +static int lxc_ipvlan_create(const char *master, const char *name, int mode, int isolation) +{ + int err, index, len; + struct ifinfomsg *ifi; + struct nl_handler nlh; + struct rtattr *nest, *nest2; + struct nlmsg *answer = NULL, *nlmsg = NULL; + + len = strlen(master); + if (len == 1 || len >= IFNAMSIZ) + return minus_one_set_errno(EINVAL); + + len = strlen(name); + if (len == 1 || len >= IFNAMSIZ) + return minus_one_set_errno(EINVAL); + + index = if_nametoindex(master); + if (!index) + return minus_one_set_errno(EINVAL); + + err = netlink_open(&nlh, NETLINK_ROUTE); + if (err) + return minus_one_set_errno(-err); + + err = -ENOMEM; + nlmsg = nlmsg_alloc(NLMSG_GOOD_SIZE); + if (!nlmsg) + goto out; + + answer = nlmsg_alloc_reserve(NLMSG_GOOD_SIZE); + if (!answer) + goto out; + + nlmsg->nlmsghdr->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL | NLM_F_ACK; + nlmsg->nlmsghdr->nlmsg_type = RTM_NEWLINK; + + ifi = nlmsg_reserve(nlmsg, sizeof(struct ifinfomsg)); + if (!ifi) { + goto out; + } + ifi->ifi_family = AF_UNSPEC; + + err = -EPROTO; + nest = nla_begin_nested(nlmsg, IFLA_LINKINFO); + if (!nest) + goto out; + + if (nla_put_string(nlmsg, IFLA_INFO_KIND, "ipvlan")) + goto out; + + if (mode) { + nest2 = nla_begin_nested(nlmsg, IFLA_INFO_DATA); + if (!nest2) + goto out; + + if (nla_put_u32(nlmsg, IFLA_IPVLAN_MODE, mode)) + goto out; + + /* if_link.h does not define the isolation flag value for bridge mode so we define it as 0 + * and only send mode if mode >0 as default mode is bridge anyway according to ipvlan docs. 
+ */ + if (isolation > 0) { + if (nla_put_u16(nlmsg, IFLA_IPVLAN_ISOLATION, isolation)) + goto out; + } + + nla_end_nested(nlmsg, nest2); + } + + nla_end_nested(nlmsg, nest); + + if (nla_put_u32(nlmsg, IFLA_LINK, index)) + goto out; + + if (nla_put_string(nlmsg, IFLA_IFNAME, name)) + goto out; + + err = netlink_transaction(&nlh, nlmsg, answer); +out: + netlink_close(&nlh); + nlmsg_free(answer); + nlmsg_free(nlmsg); + if (err < 0) + return minus_one_set_errno(-err); + return 0; +} + +static int instantiate_ipvlan(struct lxc_handler *handler, struct lxc_netdev *netdev) +{ + char peerbuf[IFNAMSIZ], *peer; + int err; + + if (netdev->link[0] == '\0') { + ERROR("No link for ipvlan network device specified"); + return -1; + } + + err = snprintf(peerbuf, sizeof(peerbuf), "ipXXXXXX"); + if (err < 0 || (size_t)err >= sizeof(peerbuf)) + return -1; + + peer = lxc_mkifname(peerbuf); + if (!peer) + return -1; + + err = lxc_ipvlan_create(netdev->link, peer, netdev->priv.ipvlan_attr.mode, netdev->priv.ipvlan_attr.isolation); + if (err) { + SYSERROR("Failed to create ipvlan interface \"%s\" on \"%s\"", peer, netdev->link); + goto on_error; + } + + netdev->ifindex = if_nametoindex(peer); + if (!netdev->ifindex) { + ERROR("Failed to retrieve ifindex for \"%s\"", peer); + goto on_error; + } + + if (netdev->upscript) { + char *argv[] = { + "ipvlan", + netdev->link, + NULL, + }; + + err = run_script_argv(handler->name, + handler->conf->hooks_version, "net", + netdev->upscript, "up", argv); + if (err < 0) + goto on_error; + } + + DEBUG("Instantiated ipvlan \"%s\" with ifindex is %d and mode %d", + peer, netdev->ifindex, netdev->priv.macvlan_attr.mode); + + return 0; + +on_error: + lxc_netdev_delete_by_name(peer); + return -1; +} + static int instantiate_vlan(struct lxc_handler *handler, struct lxc_netdev *netdev) { char peer[IFNAMSIZ]; @@ -518,6 +659,7 @@ static int instantiate_none(struct lxc_handler *handler, struct lxc_netdev *netd static instantiate_cb netdev_conf[LXC_NET_MAXCONFTYPE + 1] = { [LXC_NET_VETH] = instantiate_veth, [LXC_NET_MACVLAN] = instantiate_macvlan, + [LXC_NET_IPVLAN] = instantiate_ipvlan, [LXC_NET_VLAN] = instantiate_vlan, [LXC_NET_PHYS] = instantiate_phys, [LXC_NET_EMPTY] = instantiate_empty, @@ -571,6 +713,26 @@ static int shutdown_macvlan(struct lxc_handler *handler, struct lxc_netdev *netd return 0; } +static int shutdown_ipvlan(struct lxc_handler *handler, struct lxc_netdev *netdev) +{ + int ret; + char *argv[] = { + "ipvlan", + netdev->link, + NULL, + }; + + if (!netdev->downscript) + return 0; + + ret = run_script_argv(handler->name, handler->conf->hooks_version, + "net", netdev->downscript, "down", argv); + if (ret < 0) + return -1; + + return 0; +} + static int shutdown_vlan(struct lxc_handler *handler, struct lxc_netdev *netdev) { int ret; @@ -638,6 +800,7 @@ static int shutdown_none(struct lxc_handler *handler, struct lxc_netdev *netdev) static instantiate_cb netdev_deconf[LXC_NET_MAXCONFTYPE + 1] = { [LXC_NET_VETH] = shutdown_veth, [LXC_NET_MACVLAN] = shutdown_macvlan, + [LXC_NET_IPVLAN] = shutdown_ipvlan, [LXC_NET_VLAN] = shutdown_vlan, [LXC_NET_PHYS] = shutdown_phys, [LXC_NET_EMPTY] = shutdown_empty, @@ -2050,6 +2213,7 @@ static const char *const lxc_network_types[LXC_NET_MAXCONFTYPE + 1] = { [LXC_NET_EMPTY] = "empty", [LXC_NET_VETH] = "veth", [LXC_NET_MACVLAN] = "macvlan", + [LXC_NET_IPVLAN] = "ipvlan", [LXC_NET_PHYS] = "phys", [LXC_NET_VLAN] = "vlan", [LXC_NET_NONE] = "none", diff --git a/src/lxc/network.h b/src/lxc/network.h index a7ae82fc7b..468593f5e3 100644 --- 
a/src/lxc/network.h +++ b/src/lxc/network.h @@ -40,6 +40,7 @@ enum { LXC_NET_EMPTY, LXC_NET_VETH, LXC_NET_MACVLAN, + LXC_NET_IPVLAN, LXC_NET_PHYS, LXC_NET_VLAN, LXC_NET_NONE, @@ -110,6 +111,11 @@ struct ifla_macvlan { int mode; /* private, vepa, bridge, passthru */ }; +struct ifla_ipvlan { + int mode; /* l3, l3s, l2 */ + int isolation; /* bridge, private, vepa */ +}; + /* Contains information about the physical network device as seen from the host. * @ifindex : The ifindex of the physical network device in the host's network * namespace. @@ -120,6 +126,7 @@ struct ifla_phys { union netdev_p { struct ifla_macvlan macvlan_attr; + struct ifla_ipvlan ipvlan_attr; struct ifla_phys phys_attr; struct ifla_veth veth_attr; struct ifla_vlan vlan_attr; diff --git a/src/tests/parse_config_file.c b/src/tests/parse_config_file.c index f4b4e9a287..ad17867b43 100644 --- a/src/tests/parse_config_file.c +++ b/src/tests/parse_config_file.c @@ -666,6 +666,11 @@ int main(int argc, char *argv[]) goto non_test_error; } + if (set_get_compare_clear_save_load(c, "lxc.net.0.type", "ipvlan", tmpf, true)) { + lxc_error("%s\n", "lxc.net.0.type"); + goto non_test_error; + } + if (set_get_compare_clear_save_load(c, "lxc.net.1000.type", "phys", tmpf, true)) { lxc_error("%s\n", "lxc.net.1000.type"); goto non_test_error; @@ -701,6 +706,36 @@ int main(int argc, char *argv[]) goto non_test_error; } + if (set_get_compare_clear_save_load_network(c, "lxc.net.0.ipvlan.mode", "l3", tmpf, true, "ipvlan")) { + lxc_error("%s\n", "lxc.net.0.ipvlan.mode"); + goto non_test_error; + } + + if (set_get_compare_clear_save_load_network(c, "lxc.net.0.ipvlan.mode", "l3s", tmpf, true, "ipvlan")) { + lxc_error("%s\n", "lxc.net.0.ipvlan.mode"); + goto non_test_error; + } + + if (set_get_compare_clear_save_load_network(c, "lxc.net.0.ipvlan.mode", "l2", tmpf, true, "ipvlan")) { + lxc_error("%s\n", "lxc.net.0.ipvlan.mode"); + goto non_test_error; + } + + if (set_get_compare_clear_save_load_network(c, "lxc.net.0.ipvlan.isolation", "bridge", tmpf, true, "ipvlan")) { + lxc_error("%s\n", "lxc.net.0.ipvlan.isolation"); + goto non_test_error; + } + + if (set_get_compare_clear_save_load_network(c, "lxc.net.0.ipvlan.isolation", "private", tmpf, true, "ipvlan")) { + lxc_error("%s\n", "lxc.net.0.ipvlan.isolation"); + goto non_test_error; + } + + if (set_get_compare_clear_save_load_network(c, "lxc.net.0.ipvlan.isolation", "vepa", tmpf, true, "ipvlan")) { + lxc_error("%s\n", "lxc.net.0.ipvlan.isolation"); + goto non_test_error; + } + if (set_get_compare_clear_save_load_network(c, "lxc.net.0.veth.pair", "clusterfuck", tmpf, true, "veth")) { lxc_error("%s\n", "lxc.net.0.veth.pair"); goto non_test_error; From 98895fe49661525a8cd622b47102aa50a423c138 Mon Sep 17 00:00:00 2001 From: tomponline Date: Wed, 1 May 2019 16:17:33 +0100 Subject: [PATCH 3/3] network: Adds ipvlan static routes for l2proxy mode Signed-off-by: tomponline --- src/lxc/network.c | 39 +++++++++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-) diff --git a/src/lxc/network.c b/src/lxc/network.c index d8d826b6f7..62553c2911 100644 --- a/src/lxc/network.c +++ b/src/lxc/network.c @@ -2781,18 +2781,35 @@ static int lxc_setup_l2proxy(struct lxc_netdev *netdev) { struct lxc_inetdev *inet4dev; struct lxc_inet6dev *inet6dev; char bufinet4[INET_ADDRSTRLEN], bufinet6[INET6_ADDRSTRLEN]; + int lo_ifindex, err; /* If IPv6 addresses are specified, then check that sysctl is configured correctly. 
*/ if (!lxc_list_empty(&netdev->ipv6)) { /* Check for net.ipv6.conf.[link].proxy_ndp=1 */ if (lxc_is_ip_neigh_proxy_enabled(netdev->link, AF_INET6) < 0) { - ERROR("l2proxy requires sysctl net.ipv6.conf.%s.proxy_ndp be set to 1", netdev->link); + ERROR("Requires sysctl net.ipv6.conf.%s.proxy_ndp be set to 1", netdev->link); return minus_one_set_errno(EINVAL); } /* Check for net.ipv6.conf.[link].forwarding=1 */ if (lxc_is_ip_forwarding_enabled(netdev->link, AF_INET6) < 0) { - ERROR("l2proxy requires sysctl net.ipv6.conf.%s.forwarding be set to 1", netdev->link); + ERROR("Requires sysctl net.ipv6.conf.%s.forwarding be set to 1", netdev->link); + return minus_one_set_errno(EINVAL); + } + } + + /* Perform IPVLAN specific checks. */ + if (netdev->type == LXC_NET_IPVLAN) { + /* Check mode is l3s as other modes do not work with l2proxy. */ + if (netdev->priv.ipvlan_attr.mode != IPVLAN_MODE_L3S) { + ERROR("Requires ipvlan mode on dev \"%s\" be l3s when used with l2proxy", netdev->link); + return minus_one_set_errno(EINVAL); + } + + /* Retrieve local-loopback interface index for use with IPVLAN static routes. */ + lo_ifindex = if_nametoindex("lo"); + if (!lo_ifindex) { + ERROR("Failed to retrieve ifindex for \"lo\""); return minus_one_set_errno(EINVAL); } } @@ -2806,6 +2823,15 @@ static int lxc_setup_l2proxy(struct lxc_netdev *netdev) { if (lxc_add_ip_proxy(bufinet4, netdev->link)) { return minus_one_set_errno(EINVAL); } + + /* IPVLAN requires a route to local-loopback to trigger l2proxy. */ + if (netdev->type == LXC_NET_IPVLAN) { + err = lxc_ipv4_dest_add(lo_ifindex, &inet4dev->addr, 32); + if (err) { + ERROR("Failed to add ipv4 dest for network device \"lo\""); + return minus_one_set_errno(-err); + } + } } lxc_list_for_each_safe(cur, &netdev->ipv6, next) { @@ -2817,6 +2843,15 @@ static int lxc_setup_l2proxy(struct lxc_netdev *netdev) { if (lxc_add_ip_proxy(bufinet6, netdev->link)) { return minus_one_set_errno(EINVAL); } + + /* IPVLAN requires a route to local-loopback to trigger l2proxy. 
*/ + if (netdev->type == LXC_NET_IPVLAN) { + err = lxc_ipv6_dest_add(lo_ifindex, &inet6dev->addr, 128); + if (err) { + ERROR("Failed to add ipv6 dest for network device \"lo\""); + return minus_one_set_errno(-err); + } + } } return 0; From noreply at github.com Wed May 1 16:24:58 2019 From: noreply at github.com (Christian Brauner) Date: Wed, 01 May 2019 16:24:58 +0000 (UTC) Subject: [lxc-devel] [lxc/lxc] c9f523: network: Adds IPVLAN support Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: c9f52382915b6b11086bacfeef53f2a6be7df05e https://github.com/lxc/lxc/commit/c9f52382915b6b11086bacfeef53f2a6be7df05e Author: tomponline Date: 2019-05-01 (Wed, 01 May 2019) Changed paths: M doc/api-extensions.md M doc/lxc.container.conf.sgml.in M src/lxc/api_extensions.h M src/lxc/confile.c M src/lxc/confile_utils.c M src/lxc/confile_utils.h M src/lxc/macro.h M src/lxc/network.c M src/lxc/network.h M src/tests/parse_config_file.c Log Message: ----------- network: Adds IPVLAN support Example usage: lxc.net[i].type=ipvlan lxc.net[i].ipvlan.mode=[l3|l3s|l2] (defaults to l3) lxc.net[i].ipvlan.flags=[bridge|private|vepa] (defaults to bridge) lxc.net[i].link=eth0 lxc.net[i].flags=up Signed-off-by: tomponline Commit: ea84ddf9e2e4933bf8110366be3c3f6dd3c4b6a6 https://github.com/lxc/lxc/commit/ea84ddf9e2e4933bf8110366be3c3f6dd3c4b6a6 Author: Christian Brauner Date: 2019-05-01 (Wed, 01 May 2019) Changed paths: M doc/api-extensions.md M doc/lxc.container.conf.sgml.in M src/lxc/api_extensions.h M src/lxc/confile.c M src/lxc/confile_utils.c M src/lxc/confile_utils.h M src/lxc/macro.h M src/lxc/network.c M src/lxc/network.h M src/tests/parse_config_file.c Log Message: ----------- Merge pull request #2950 from tomponline/tp-ipvlan network: Adds IPVLAN support Compare: https://github.com/lxc/lxc/compare/28805eb0e7f7...ea84ddf9e2e4 From lxc-bot at linuxcontainers.org Wed May 1 16:36:34 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Wed, 01 May 2019 09:36:34 -0700 (PDT) Subject: [lxc-devel] [lxc/master] seccomp: ensure fields are set to 0 Message-ID: <5cc9cb12.1c69fb81.c82a4.b308SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 364 bytes Desc: not available URL: -------------- next part -------------- From 370460664ff4cac9976524c386000edd15fdcc52 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Wed, 1 May 2019 18:35:58 +0200 Subject: [PATCH] seccomp: ensure fields are set to 0 Signed-off-by: Christian Brauner --- src/lxc/seccomp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/lxc/seccomp.c b/src/lxc/seccomp.c index 34abda16a9..f326c8d307 100644 --- a/src/lxc/seccomp.c +++ b/src/lxc/seccomp.c @@ -1342,7 +1342,7 @@ int seccomp_notify_handler(int fd, uint32_t events, void *data, struct seccomp_notif *req = conf->seccomp.notifier.req_buf; struct seccomp_notif_resp *resp = conf->seccomp.notifier.rsp_buf; int listener_proxy_fd = conf->seccomp.notifier.proxy_fd; - struct seccomp_notify_proxy_msg msg; + struct seccomp_notify_proxy_msg msg = {0}; if (listener_proxy_fd < 0) { ERROR("No seccomp proxy registered"); From noreply at github.com Wed May 1 16:44:50 2019 From: noreply at github.com (=?UTF-8?B?U3TDqXBoYW5lIEdyYWJlcg==?=) Date: Wed, 01 May 2019 16:44:50 +0000 (UTC) Subject: [lxc-devel] [lxc/lxc] 370460: seccomp: ensure fields are set to 0 Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 370460664ff4cac9976524c386000edd15fdcc52 https://github.com/lxc/lxc/commit/370460664ff4cac9976524c386000edd15fdcc52 Author: Christian Brauner Date: 2019-05-01 (Wed, 01 May 2019) Changed paths: M src/lxc/seccomp.c Log Message: ----------- seccomp: ensure fields are set to 0 Signed-off-by: Christian Brauner Commit: 0b5afd323e47c4a6eb10b8c7402f532e12e1a233 https://github.com/lxc/lxc/commit/0b5afd323e47c4a6eb10b8c7402f532e12e1a233 Author: Stéphane Graber Date: 2019-05-01 (Wed, 01 May 2019) Changed paths: M src/lxc/seccomp.c Log Message: ----------- Merge pull request #2969 from brauner/2019-05-01/seccomp_fixes seccomp: ensure fields are set to 0 Compare: https://github.com/lxc/lxc/compare/ea84ddf9e2e4...0b5afd323e47 From lxc-bot at linuxcontainers.org Thu May 2 00:47:35 2019 From: lxc-bot at linuxcontainers.org (hallyn on Github) Date: Wed, 01 May 2019 17:47:35 -0700 (PDT) Subject: [lxc-devel] [lxc/master] [wip] namespaces: allow a pathname to a nsfd for namespace to share Message-ID: <5cca3e27.1c69fb81.621e1.f3bfSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 529 bytes Desc: not available URL: -------------- next part -------------- From 5f0b256f547568f063873af84bdd9c18cacca8c4 Mon Sep 17 00:00:00 2001 From: Serge Hallyn Date: Wed, 1 May 2019 17:27:53 -0700 Subject: [PATCH] namespaces: allow a pathname to a nsfd for namespace to share Signed-off-by: Serge Hallyn --- src/lxc/utils.c | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/src/lxc/utils.c b/src/lxc/utils.c index ea081c566c..975e5791c3 100644 --- a/src/lxc/utils.c +++ b/src/lxc/utils.c @@ -1367,20 +1367,26 @@ int lxc_preserve_ns(const int pid, const char *ns) /* 5 /proc + 21 /int_as_str + 3 /ns + 20 /NS_NAME + 1 \0 */ #define __NS_PATH_LEN 50 char path[__NS_PATH_LEN]; + const char *p; /* This way we can use this function to also check whether namespaces * are supported by the kernel by passing in the NULL or the empty * string. */ - ret = snprintf(path, __NS_PATH_LEN, "/proc/%d/ns%s%s", pid, - !ns || strcmp(ns, "") == 0 ? "" : "/", - !ns || strcmp(ns, "") == 0 ? 
"" : ns); - if (ret < 0 || (size_t)ret >= __NS_PATH_LEN) { - errno = EFBIG; - return -1; + if (ns[0] == '/') { + p = ns; + } else { + ret = snprintf(path, __NS_PATH_LEN, "/proc/%d/ns%s%s", pid, + !ns || strcmp(ns, "") == 0 ? "" : "/", + !ns || strcmp(ns, "") == 0 ? "" : ns); + if (ret < 0 || (size_t)ret >= __NS_PATH_LEN) { + errno = EFBIG; + return -1; + } + p = path; } - return open(path, O_RDONLY | O_CLOEXEC); + return open(p, O_RDONLY | O_CLOEXEC); } bool lxc_switch_uid_gid(uid_t uid, gid_t gid) From lxc-bot at linuxcontainers.org Thu May 2 01:38:23 2019 From: lxc-bot at linuxcontainers.org (hallyn on Github) Date: Wed, 01 May 2019 18:38:23 -0700 (PDT) Subject: [lxc-devel] [lxc/master] namespaces: allow a pathname to a nsfd for namespace to share Message-ID: <5cca4a0f.1c69fb81.bc003.964cSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 529 bytes Desc: not available URL: -------------- next part -------------- From 080ddd313252382175440a1c9255ddd5275ddd8a Mon Sep 17 00:00:00 2001 From: Serge Hallyn Date: Wed, 1 May 2019 18:17:23 -0700 Subject: [PATCH] namespaces: allow a pathname to a nsfd for namespace to share Signed-off-by: Serge Hallyn --- src/lxc/confile_utils.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/lxc/confile_utils.c b/src/lxc/confile_utils.c index 5bceb96bdc..d405283836 100644 --- a/src/lxc/confile_utils.c +++ b/src/lxc/confile_utils.c @@ -865,6 +865,10 @@ int lxc_inherit_namespace(const char *lxcname_or_pid, const char *lxcpath, int fd, pid; char *dup, *lastslash; + if (lxcname_or_pid[0] == '/') { + return open(lxcname_or_pid, O_RDONLY | O_CLOEXEC); + } + lastslash = strrchr(lxcname_or_pid, '/'); if (lastslash) { dup = strdup(lxcname_or_pid); From lxc-bot at linuxcontainers.org Thu May 2 10:19:06 2019 From: lxc-bot at linuxcontainers.org (ajkavanagh on Github) Date: Thu, 02 May 2019 03:19:06 -0700 (PDT) Subject: [lxc-devel] [pylxd/master] Add live parameter for migrations Message-ID: <5ccac41a.1c69fb81.9ad95.2c4dSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 550 bytes Desc: not available URL: -------------- next part -------------- From 5c4663deef35c3041b2981580cf0903a5f3c7051 Mon Sep 17 00:00:00 2001 From: Alex Kavanagh Date: Thu, 2 May 2019 11:15:43 +0100 Subject: [PATCH] Add live parameter for migrations The LXD api provides a parameter to kick of live migrations. See https://github.com/lxc/lxd/blob/master/doc/rest-api.md#post-optional-targetmember-1 for further details. Closes: #338 Signed-off-by: Alex Kavanagh --- doc/source/containers.rst | 10 ++++++- pylxd/models/container.py | 42 ++++++++++++++++++++++------ pylxd/tests/models/test_container.py | 7 +++-- 3 files changed, 47 insertions(+), 12 deletions(-) diff --git a/doc/source/containers.rst b/doc/source/containers.rst index 48d5ff92..00ec7070 100644 --- a/doc/source/containers.rst +++ b/doc/source/containers.rst @@ -68,7 +68,9 @@ Container methods an url to an interactive websocket and the execution only starts after a client connected to the websocket. - `migrate` - Migrate the container. The first argument is a client connection to the destination server. This call is asynchronous, so - `wait=True` is optional. The container on the new client is returned. + ``wait=True`` is optional. The container on the new client is returned. 
If + ``live=True`` is passed to the function call, then the container is live + migrated (see the LXD documentation for further details). - `publish` - Publish the container as an image. Note the container must be stopped in order to use this method. If `wait=True` is passed, then the image is returned. @@ -165,6 +167,12 @@ the source server has to be reachable by the destination server otherwise the mi This will migrate the container from source server to destination server +To migrate a live container, user the ``live=True`` parameter: + +..code-block:: python + + cont.migrate(client__destination, live=True, wait=True) + If you want an interactive shell in the container, you can attach to it via a websocket. .. code-block:: python diff --git a/pylxd/models/container.py b/pylxd/models/container.py index d6dc4694..856ec291 100644 --- a/pylxd/models/container.py +++ b/pylxd/models/container.py @@ -475,15 +475,31 @@ def raw_interactive_execute(self, commands, environment=None): return {'ws': '{}?secret={}'.format(parsed.path, fds['0']), 'control': '{}?secret={}'.format(parsed.path, fds['control'])} - def migrate(self, new_client, wait=False): + def migrate(self, new_client, live=False, wait=False): """Migrate a container. Destination host information is contained in the client connection passed in. - If the container is running, it either must be shut down - first or criu must be installed on the source and destination - machines. + If the `live` param is True, then a live migration is attempted, + otherwise a non live one is running. + + If the container is running for live migration, it either must be shut + down first or criu must be installed on the source and destination + machines and the `live` param should be True. + + :param new_client: the pylxd client connection to migrate the container + to. + :type new_client: :class:`pylxd.client.Client` + :param live: whether to perform a live migration + :type live: bool + :param wait: if True, wait for the migration to complete + :type wait: bool + :raises: LXDAPIException if any of the API calls fail. + :raises: ValueError if source of target are local connections + :returns: the response from LXD of the new container (the target of the + migration and not the operation if waited on.) + :rtype: :class:`requests.Response` """ if self.api.scheme in ('http+unix',): raise ValueError('Cannot migrate from a local client connection') @@ -491,7 +507,7 @@ def migrate(self, new_client, wait=False): if self.status_code == 103: try: res = new_client.containers.create( - self.generate_migration_data(), wait=wait) + self.generate_migration_data(live), wait=wait) except LXDAPIException as e: if e.response.status_code == 103: self.delete() @@ -500,19 +516,29 @@ def migrate(self, new_client, wait=False): raise e else: res = new_client.containers.create( - self.generate_migration_data(), wait=wait) + self.generate_migration_data(live), wait=wait) self.delete() return res - def generate_migration_data(self): + def generate_migration_data(self, live=False): """Generate the migration data. This method can be used to handle migrations where the client connection uses the local unix socket. For more information on migration, see `Container.migrate`. + + :param live: Whether to include "live": "true" in the migration + :type live: bool + :raises: LXDAPIException if the request to migrate fails + :returns: dictionary of migration data suitable to send to an new + client to complete a migration. 
+ :rtype: Dict[str, ANY] """ self.sync() # Make sure the object isn't stale - response = self.api.post(json={'migration': True}) + _json = {'migration': True} + if live: + _json['live'] = True + response = self.api.post(json=_json) operation = self.client.operations.get(response.json()['operation']) operation_url = self.client.api.operations[operation.id]._api_endpoint secrets = response.json()['metadata']['metadata'] diff --git a/pylxd/tests/models/test_container.py b/pylxd/tests/models/test_container.py index 795bebdc..f8447f7c 100644 --- a/pylxd/tests/models/test_container.py +++ b/pylxd/tests/models/test_container.py @@ -285,7 +285,7 @@ def test_migrate_exception_error(self, generate_migration_data): from pylxd.client import Client from pylxd.exceptions import LXDAPIException - def generate_exception(): + def generate_exception(*args, **kwargs): response = mock.Mock() response.status_code = 400 raise LXDAPIException(response) @@ -309,17 +309,18 @@ def test_migrate_exception_running(self, generate_migration_data): self.client, name='an-container') an_container.status_code = 103 - def generate_exception(): + def generate_exception(*args, **kwargs): response = mock.Mock() response.status_code = 103 raise LXDAPIException(response) generate_migration_data.side_effect = generate_exception - an_migrated_container = an_container.migrate(client2) + an_migrated_container = an_container.migrate(client2, live=True) self.assertEqual('an-container', an_migrated_container.name) self.assertEqual(client2, an_migrated_container.client) + generate_migration_data.assert_called_once_with(True) def test_migrate_started(self): """A container is migrated.""" From lxc-bot at linuxcontainers.org Thu May 2 11:25:43 2019 From: lxc-bot at linuxcontainers.org (ajkavanagh on Github) Date: Thu, 02 May 2019 04:25:43 -0700 (PDT) Subject: [lxc-devel] [pylxd/master] Fix dropped timeout in pylxd/client.py Message-ID: <5ccad3b7.1c69fb81.82e9f.1b16SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 610 bytes Desc: not available URL: -------------- next part -------------- From 28b31cc21fb4ac4e9e073864b635b3818e4ae3f7 Mon Sep 17 00:00:00 2001 From: Alex Kavanagh Date: Thu, 2 May 2019 12:23:28 +0100 Subject: [PATCH] Fix dropped timeout in pylxd/client.py When constructing the call, the [next-part] method creates the next element in the chain of /part/. However, it doesn't pass through the timeout parameter which means that it doesn't end up on the final call. This patch fixes that. Signed-off-by: Alex Kavanagh --- pylxd/client.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/pylxd/client.py b/pylxd/client.py index 3ccd3a39..62d32976 100644 --- a/pylxd/client.py +++ b/pylxd/client.py @@ -85,7 +85,8 @@ def __getattr__(self, name): name = 'storage-pools' return self.__class__('{}/{}'.format(self._api_endpoint, name), cert=self.session.cert, - verify=self.session.verify) + verify=self.session.verify, + timeout=self._timeout) def __getitem__(self, item): """This converts python api.thing[name] -> ".../thing/name" From lxc-bot at linuxcontainers.org Thu May 2 14:50:30 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Thu, 02 May 2019 07:50:30 -0700 (PDT) Subject: [lxc-devel] [lxd/master] network: Adds IPVLAN support Message-ID: <5ccb03b6.1c69fb81.5fc5c.8eceSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 357 bytes Desc: not available URL: -------------- next part -------------- From 30f7fd82fa53e6ab27a0d42d99fd3fce51a516a0 Mon Sep 17 00:00:00 2001 From: tomponline Date: Thu, 2 May 2019 15:45:26 +0100 Subject: [PATCH] doc: ipvlan docs Signed-off-by: tomponline --- doc/api-extensions.md | 3 +++ doc/containers.md | 52 +++++++++++++++++++++++-------------------- 2 files changed, 31 insertions(+), 24 deletions(-) diff --git a/doc/api-extensions.md b/doc/api-extensions.md index 67a93d82c5..0edcd818b9 100644 --- a/doc/api-extensions.md +++ b/doc/api-extensions.md @@ -752,3 +752,6 @@ Adds support for RBAC (role based access control). This introduces new config ke This makes it possible to do a normal "POST /1.0/containers" to copy a container between cluster nodes with LXD internally detecting whether a migration is required. + +## container\_nic\_ipvlan +This introduces the `ipvlan` "nic" device type. diff --git a/doc/containers.md b/doc/containers.md index 9824d41839..a583b3ebc1 100644 --- a/doc/containers.md +++ b/doc/containers.md @@ -231,36 +231,37 @@ LXD supports different kind of network devices: - `physical`: Straight physical device passthrough from the host. The targeted device will vanish from the host and appear in the container. - `bridged`: Uses an existing bridge on the host and creates a virtual device pair to connect the host bridge to the container. - `macvlan`: Sets up a new network device based on an existing one but using a different MAC address. + - `ipvlan`: Sets up a new network device based on an existing one using the same MAC address but a different IP. - `p2p`: Creates a virtual device pair, putting one side in the container and leaving the other side on the host. - `sriov`: Passes a virtual function of an SR-IOV enabled physical network device into the container. 
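A minimal sketch of what attaching such an `ipvlan` NIC could look like via pylxd (itself covered earlier in this digest): the container name `c1`, the host interface `eth0` and the address below are made up for illustration, the device keys follow the table that comes next, and the server is assumed to expose the `container_nic_ipvlan` API extension introduced above.

    from pylxd import Client

    client = Client()  # connects to the local LXD daemon over its unix socket
    c = client.containers.get('c1')  # hypothetical container

    # ipvlan NICs inherit the parent's MAC address, so the IP is assigned
    # statically rather than handed out over DHCP.
    c.devices['eth1'] = {
        'type': 'nic',
        'nictype': 'ipvlan',
        'parent': 'eth0',
        'ipv4.address': '192.0.2.10',
    }
    c.save(wait=True)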
Different network interface types have different additional properties, the current list is: -Key | Type | Default | Required | Used by | API extension | Description -:-- | :-- | :-- | :-- | :-- | :-- | :-- -nictype | string | - | yes | all | - | The device type, one of "bridged", "macvlan", "p2p", "physical", or "sriov" -limits.ingress | string | - | no | bridged, p2p | - | I/O limit in bit/s for incoming traffic (various suffixes supported, see below) -limits.egress | string | - | no | bridged, p2p | - | I/O limit in bit/s for outgoing traffic (various suffixes supported, see below) -limits.max | string | - | no | bridged, p2p | - | Same as modifying both limits.ingress and limits.egress -name | string | kernel assigned | no | all | - | The name of the interface inside the container -host\_name | string | randomly assigned | no | bridged, p2p | - | The name of the interface inside the host -hwaddr | string | randomly assigned | no | all | - | The MAC address of the new interface -mtu | integer | parent MTU | no | all | - | The MTU of the new interface -parent | string | - | yes | bridged, macvlan, physical, sriov | - | The name of the host device or bridge -vlan | integer | - | no | macvlan, physical | network\_vlan, network\_vlan\_physical | The VLAN ID to attach to -ipv4.address | string | - | no | bridged | network | An IPv4 address to assign to the container through DHCP -ipv6.address | string | - | no | bridged | network | An IPv6 address to assign to the container through DHCP -ipv4.routes | string | - | no | bridged, p2p | container\_nic\_routes | Comma delimited list of IPv4 static routes to add on host to nic -ipv6.routes | string | - | no | bridged, p2p | container\_nic\_routes | Comma delimited list of IPv6 static routes to add on host to nic -security.mac\_filtering | boolean | false | no | bridged | network | Prevent the container from spoofing another's MAC address -maas.subnet.ipv4 | string | - | no | bridged, macvlan, physical, sriov | maas\_network | MAAS IPv4 subnet to register the container in -maas.subnet.ipv6 | string | - | no | bridged, macvlan, physical, sriov | maas\_network | MAAS IPv6 subnet to register the container in - -#### bridged or macvlan for connection to physical network -The `bridged` and `macvlan` interface types can both be used to connect +Key | Type | Default | Required | Used by | API extension | Description +:-- | :-- | :-- | :-- | :-- | :-- | :-- +nictype | string | - | yes | all | - | The device type, one of "bridged", "macvlan", "ipvlan", "p2p", "physical", or "sriov" +limits.ingress | string | - | no | bridged, p2p | - | I/O limit in bit/s for incoming traffic (various suffixes supported, see below) +limits.egress | string | - | no | bridged, p2p | - | I/O limit in bit/s for outgoing traffic (various suffixes supported, see below) +limits.max | string | - | no | bridged, p2p | - | Same as modifying both limits.ingress and limits.egress +name | string | kernel assigned | no | all | - | The name of the interface inside the container +host\_name | string | randomly assigned | no | bridged, p2p | - | The name of the interface inside the host +hwaddr | string | randomly assigned | no | bridged, macvlan, physical, sriov | - | The MAC address of the new interface +mtu | integer | parent MTU | no | all | - | The MTU of the new interface +parent | string | - | yes | bridged, macvlan, ipvlan, physical, sriov | - | The name of the host device or bridge +vlan | integer | - | no | macvlan, ipvlan, physical | network\_vlan, network\_vlan\_physical | The 
VLAN ID to attach to +ipv4.address | string | - | no | bridged, ipvlan | network | An IPv4 address to assign to the container through DHCP (bridged) or statically (ipvlan) +ipv6.address | string | - | no | bridged, ipvlan | network | An IPv6 address to assign to the container through DHCP (bridged) or statically (ipvlan) +ipv4.routes | string | - | no | bridged, p2p | container\_nic\_routes | Comma delimited list of IPv4 static routes to add on host to nic +ipv6.routes | string | - | no | bridged, p2p | container\_nic\_routes | Comma delimited list of IPv6 static routes to add on host to nic +security.mac\_filtering | boolean | false | no | bridged | network | Prevent the container from spoofing another's MAC address +maas.subnet.ipv4 | string | - | no | bridged, macvlan, physical, sriov | maas\_network | MAAS IPv4 subnet to register the container in +maas.subnet.ipv6 | string | - | no | bridged, macvlan, physical, sriov | maas\_network | MAAS IPv6 subnet to register the container in + +#### bridged, macvlan or ipvlan for connection to physical network +The `bridged`, `macvlan` and `ipvlan` interface types can all be used to connect to an existing physical network. -macvlan effectively lets you fork your physical NIC, getting a second +`macvlan` effectively lets you fork your physical NIC, getting a second interface that's then used by the container. This saves you from creating a bridge device and veth pairs and usually offers better performance than a bridge. @@ -273,6 +274,9 @@ your containers to talk to the host itself. In such case, a bridge is preferable. A bridge will also let you use mac filtering and I/O limits which cannot be applied to a macvlan device. +`ipvlan` is similar to `macvlan`, with the difference being that the forked device has IPs +statically assigned to it and inherits the parent's MAC address on the network. + #### SR-IOV The `sriov` interface type supports SR-IOV enabled network devices. These devices associate a set of virtual functions (VFs) with the single physical @@ -595,7 +599,7 @@ empty (default), no snapshots will be created. `snapshots.schedule.stopped` controls whether or not stopped container are to be automatically snapshotted. It defaults to `false`. `snapshots.pattern` takes a pongo2 template string, and the pongo2 context contains the `creation_date` variable. Be aware that you -should format the date (e.g. use `{{ creation_date|date:"2006-01-02_15-04-05" }}`) +should format the date (e.g. use `{{ creation_date|date:"2006-01-02_15-04-05" }}`) in your template string to avoid forbidden characters in your snapshot name. Another way to avoid name collisions is to use the placeholder `%d`. If a snapshot with the same name (excluding the placeholder) already exists, all existing snapshot From lxc-bot at linuxcontainers.org Thu May 2 15:07:17 2019 From: lxc-bot at linuxcontainers.org (monstermunchkin on Github) Date: Thu, 02 May 2019 08:07:17 -0700 (PDT) Subject: [lxc-devel] [lxd/master] Storage cleanup Message-ID: <5ccb07a5.1c69fb81.8b385.67f4SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed...
Name: not available Type: text/x-mailbox Size: 301 bytes Desc: not available URL: -------------- next part -------------- From 26d6a52c7a246348da9b69a3ddf3efae187198c4 Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 14:46:17 +0200 Subject: [PATCH 01/15] lxd: Add pretty logging function Signed-off-by: Thomas Hipp --- lxd/logging.go | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/lxd/logging.go b/lxd/logging.go index ab244ad5ca..3c16db0a4e 100644 --- a/lxd/logging.go +++ b/lxd/logging.go @@ -115,7 +115,7 @@ func expireLogs(ctx context.Context, state *state.State) error { if logfile.IsDir() { newest := newestFile(path, logfile) if time.Since(newest).Hours() >= 48 { - os.RemoveAll(path) + err := os.RemoveAll(path) if err != nil { return err } @@ -147,3 +147,19 @@ func expireLogs(ctx context.Context, state *state.State) error { return nil } + +func logAction(infoMsg, successMsg, errorMsg string, ctx *log.Ctx, success *bool, err *error) func() { + log.Info(infoMsg, ctx) + + return func() { + if *success { + log.Info(successMsg, ctx) + } else { + if (*err) != nil { + (*ctx)["error"] = (*err).Error() + } + + log.Error(errorMsg, ctx) + } + } +} From 31945bb83af8f2a8596a13c5b14342c0d410b557 Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 14:51:19 +0200 Subject: [PATCH 02/15] storage: Remove shared code from backends This moves common code from the storage backends into the shared code section. Signed-off-by: Thomas Hipp --- lxd/storage_btrfs.go | 29 ----------------------------- lxd/storage_ceph.go | 29 ----------------------------- lxd/storage_dir.go | 29 ----------------------------- lxd/storage_lvm.go | 30 +----------------------------- lxd/storage_shared.go | 28 ++++++++++++++++++++++++++++ lxd/storage_zfs.go | 29 ----------------------------- 6 files changed, 29 insertions(+), 145 deletions(-) diff --git a/lxd/storage_btrfs.go b/lxd/storage_btrfs.go index 3d44e04fff..60b44d12c8 100644 --- a/lxd/storage_btrfs.go +++ b/lxd/storage_btrfs.go @@ -18,7 +18,6 @@ import ( "github.com/lxc/lxd/lxd/db" "github.com/lxc/lxd/lxd/migration" - "github.com/lxc/lxd/lxd/state" "github.com/lxc/lxd/lxd/util" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" @@ -536,14 +535,6 @@ func (s *storageBtrfs) StoragePoolUpdate(writable *api.StoragePoolPut, return nil } -func (s *storageBtrfs) GetStoragePoolWritable() api.StoragePoolPut { - return s.pool.Writable() -} - -func (s *storageBtrfs) SetStoragePoolWritable(writable *api.StoragePoolPut) { - s.pool.StoragePoolPut = *writable -} - func (s *storageBtrfs) GetContainerPoolInfo() (int64, string, string) { return s.poolID, s.pool.Name, s.pool.Name } @@ -805,14 +796,6 @@ func (s *storageBtrfs) StoragePoolVolumeRename(newName string) error { return nil } -func (s *storageBtrfs) GetStoragePoolVolumeWritable() api.StorageVolumePut { - return s.volume.Writable() -} - -func (s *storageBtrfs) SetStoragePoolVolumeWritable(writable *api.StorageVolumePut) { - s.volume.StorageVolumePut = *writable -} - // Functions dealing with container storage.
func (s *storageBtrfs) ContainerStorageReady(container container) bool { containerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) @@ -3098,18 +3081,6 @@ func (s *storageBtrfs) StorageMigrationSink(conn *websocket.Conn, op *operation, return rsyncStorageMigrationSink(conn, op, args) } -func (s *storageBtrfs) GetStoragePool() *api.StoragePool { - return s.pool -} - -func (s *storageBtrfs) GetStoragePoolVolume() *api.StorageVolume { - return s.volume -} - -func (s *storageBtrfs) GetState() *state.State { - return s.s -} - func (s *storageBtrfs) StoragePoolVolumeSnapshotCreate(target *api.StorageVolumeSnapshotsPost) error { logger.Infof("Creating BTRFS storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) diff --git a/lxd/storage_ceph.go b/lxd/storage_ceph.go index 9e0be80504..c5ffe0e71c 100644 --- a/lxd/storage_ceph.go +++ b/lxd/storage_ceph.go @@ -14,7 +14,6 @@ import ( "github.com/pkg/errors" "github.com/lxc/lxd/lxd/db" - "github.com/lxc/lxd/lxd/state" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" "github.com/lxc/lxd/shared/ioprogress" @@ -314,22 +313,6 @@ func (s *storageCeph) StoragePoolUmount() (bool, error) { return true, nil } -func (s *storageCeph) GetStoragePoolWritable() api.StoragePoolPut { - return s.pool.StoragePoolPut -} - -func (s *storageCeph) GetStoragePoolVolumeWritable() api.StorageVolumePut { - return s.volume.Writable() -} - -func (s *storageCeph) SetStoragePoolWritable(writable *api.StoragePoolPut) { - s.pool.StoragePoolPut = *writable -} - -func (s *storageCeph) SetStoragePoolVolumeWritable(writable *api.StorageVolumePut) { - s.volume.StorageVolumePut = *writable -} - func (s *storageCeph) GetContainerPoolInfo() (int64, string, string) { return s.poolID, s.pool.Name, s.OSDPoolName } @@ -2729,18 +2712,6 @@ func (s *storageCeph) StorageMigrationSink(conn *websocket.Conn, op *operation, return rsyncStorageMigrationSink(conn, op, args) } -func (s *storageCeph) GetStoragePool() *api.StoragePool { - return s.pool -} - -func (s *storageCeph) GetStoragePoolVolume() *api.StorageVolume { - return s.volume -} - -func (s *storageCeph) GetState() *state.State { - return s.s -} - func (s *storageCeph) StoragePoolVolumeSnapshotCreate(target *api.StorageVolumeSnapshotsPost) error { logger.Debugf("Creating RBD storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) sourcePath := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) diff --git a/lxd/storage_dir.go b/lxd/storage_dir.go index b7635fd007..af0a22b890 100644 --- a/lxd/storage_dir.go +++ b/lxd/storage_dir.go @@ -13,7 +13,6 @@ import ( "github.com/pkg/errors" "github.com/lxc/lxd/lxd/migration" - "github.com/lxc/lxd/lxd/state" "github.com/lxc/lxd/lxd/storage/quota" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" @@ -277,22 +276,6 @@ func (s *storageDir) StoragePoolUmount() (bool, error) { return true, nil } -func (s *storageDir) GetStoragePoolWritable() api.StoragePoolPut { - return s.pool.Writable() -} - -func (s *storageDir) GetStoragePoolVolumeWritable() api.StorageVolumePut { - return s.volume.Writable() -} - -func (s *storageDir) SetStoragePoolWritable(writable *api.StoragePoolPut) { - s.pool.StoragePoolPut = *writable -} - -func (s *storageDir) SetStoragePoolVolumeWritable(writable *api.StorageVolumePut) { - s.volume.StorageVolumePut = *writable -} - func (s *storageDir) GetContainerPoolInfo() (int64, string, string) { return s.poolID, s.pool.Name, s.pool.Name } @@ -1446,18 +1429,6 @@ func (s 
*storageDir) StorageMigrationSink(conn *websocket.Conn, op *operation, a return rsyncStorageMigrationSink(conn, op, args) } -func (s *storageDir) GetStoragePool() *api.StoragePool { - return s.pool -} - -func (s *storageDir) GetStoragePoolVolume() *api.StorageVolume { - return s.volume -} - -func (s *storageDir) GetState() *state.State { - return s.s -} - func (s *storageDir) StoragePoolVolumeSnapshotCreate(target *api.StorageVolumeSnapshotsPost) error { logger.Infof("Creating DIR storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) diff --git a/lxd/storage_lvm.go b/lxd/storage_lvm.go index 778c114e52..e4c8e4c444 100644 --- a/lxd/storage_lvm.go +++ b/lxd/storage_lvm.go @@ -14,7 +14,7 @@ import ( "github.com/pkg/errors" "github.com/lxc/lxd/lxd/migration" - "github.com/lxc/lxd/lxd/state" + driver "github.com/lxc/lxd/lxd/storage" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" "github.com/lxc/lxd/shared/ioprogress" @@ -679,22 +679,6 @@ func (s *storageLvm) StoragePoolVolumeUmount() (bool, error) { return ourUmount, nil } -func (s *storageLvm) GetStoragePoolWritable() api.StoragePoolPut { - return s.pool.Writable() -} - -func (s *storageLvm) GetStoragePoolVolumeWritable() api.StorageVolumePut { - return s.volume.Writable() -} - -func (s *storageLvm) SetStoragePoolWritable(writable *api.StoragePoolPut) { - s.pool.StoragePoolPut = *writable -} - -func (s *storageLvm) SetStoragePoolVolumeWritable(writable *api.StorageVolumePut) { - s.volume.StorageVolumePut = *writable -} - func (s *storageLvm) GetContainerPoolInfo() (int64, string, string) { return s.poolID, s.pool.Name, s.getOnDiskPoolName() } @@ -2256,18 +2240,6 @@ func (s *storageLvm) StorageMigrationSink(conn *websocket.Conn, op *operation, a return rsyncStorageMigrationSink(conn, op, args) } -func (s *storageLvm) GetStoragePool() *api.StoragePool { - return s.pool -} - -func (s *storageLvm) GetStoragePoolVolume() *api.StorageVolume { - return s.volume -} - -func (s *storageLvm) GetState() *state.State { - return s.s -} - func (s *storageLvm) StoragePoolVolumeSnapshotCreate(target *api.StorageVolumeSnapshotsPost) error { logger.Debugf("Creating LVM storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) diff --git a/lxd/storage_shared.go b/lxd/storage_shared.go index 8fb89b2da6..11875f06ec 100644 --- a/lxd/storage_shared.go +++ b/lxd/storage_shared.go @@ -30,6 +30,34 @@ func (s *storageShared) GetStorageTypeVersion() string { return s.sTypeVersion } +func (s *storageShared) GetStoragePool() *api.StoragePool { + return s.pool +} + +func (s *storageShared) GetStoragePoolVolume() *api.StorageVolume { + return s.volume +} + +func (s *storageShared) GetState() *state.State { + return s.s +} + +func (s *storageShared) GetStoragePoolWritable() api.StoragePoolPut { + return s.pool.Writable() +} + +func (s *storageShared) GetStoragePoolVolumeWritable() api.StorageVolumePut { + return s.volume.Writable() +} + +func (s *storageShared) SetStoragePoolWritable(writable *api.StoragePoolPut) { + s.pool.StoragePoolPut = *writable +} + +func (s *storageShared) SetStoragePoolVolumeWritable(writable *api.StorageVolumePut) { + s.volume.StorageVolumePut = *writable +} + func (s *storageShared) createImageDbPoolVolume(fingerprint string) error { // Fill in any default volume config. 
volumeConfig := map[string]string{} diff --git a/lxd/storage_zfs.go b/lxd/storage_zfs.go index 5667c557ae..cc35cd298a 100644 --- a/lxd/storage_zfs.go +++ b/lxd/storage_zfs.go @@ -15,7 +15,6 @@ import ( "github.com/pkg/errors" "github.com/lxc/lxd/lxd/migration" - "github.com/lxc/lxd/lxd/state" "github.com/lxc/lxd/lxd/util" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" @@ -620,22 +619,6 @@ func (s *storageZfs) StoragePoolVolumeUmount() (bool, error) { return ourUmount, nil } -func (s *storageZfs) GetStoragePoolWritable() api.StoragePoolPut { - return s.pool.Writable() -} - -func (s *storageZfs) GetStoragePoolVolumeWritable() api.StorageVolumePut { - return s.volume.Writable() -} - -func (s *storageZfs) SetStoragePoolWritable(writable *api.StoragePoolPut) { - s.pool.StoragePoolPut = *writable -} - -func (s *storageZfs) SetStoragePoolVolumeWritable(writable *api.StorageVolumePut) { - s.volume.StorageVolumePut = *writable -} - func (s *storageZfs) GetContainerPoolInfo() (int64, string, string) { return s.poolID, s.pool.Name, s.getOnDiskPoolName() } @@ -3375,18 +3358,6 @@ func (s *storageZfs) StorageMigrationSink(conn *websocket.Conn, op *operation, a return rsyncStorageMigrationSink(conn, op, args) } -func (s *storageZfs) GetStoragePool() *api.StoragePool { - return s.pool -} - -func (s *storageZfs) GetStoragePoolVolume() *api.StorageVolume { - return s.volume -} - -func (s *storageZfs) GetState() *state.State { - return s.s -} - func (s *storageZfs) StoragePoolVolumeSnapshotCreate(target *api.StorageVolumeSnapshotsPost) error { logger.Infof("Creating ZFS storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) From d3e563dc269ccad9aad7ff6603bbe804937fa230 Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 14:55:54 +0200 Subject: [PATCH 03/15] lxd: Remove ContainerCanRestore from storage interface The function ContainerCanRestore is only used by zfs, and therefore should be zfs specific. Signed-off-by: Thomas Hipp --- lxd/container_lxc.go | 6 ------ lxd/storage.go | 1 - lxd/storage_btrfs.go | 4 ---- lxd/storage_ceph.go | 4 ---- lxd/storage_dir.go | 4 ---- lxd/storage_lvm.go | 4 ---- lxd/storage_mock.go | 4 ---- lxd/storage_zfs.go | 48 ++++++++++++++++++-------------------------- 8 files changed, 19 insertions(+), 56 deletions(-) diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 68b238e9c6..d44ccfc872 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -3370,12 +3370,6 @@ func (c *containerLXC) Restore(sourceContainer container, stateful bool) error { defer c.StorageStop() } - // Check if we can restore the container - err = c.storage.ContainerCanRestore(c, sourceContainer) - if err != nil { - return err - } - /* let's also check for CRIU if necessary, before doing a bunch of * filesystem manipulations */ diff --git a/lxd/storage.go b/lxd/storage.go index 2e07d53039..c825b146fb 100644 --- a/lxd/storage.go +++ b/lxd/storage.go @@ -175,7 +175,6 @@ type storage interface { // ContainerCreateFromImage creates a container from a image. 
ContainerCreateFromImage(c container, fingerprint string, tracker *ioprogress.ProgressTracker) error - ContainerCanRestore(target container, source container) error ContainerDelete(c container) error ContainerCopy(target container, source container, containerOnly bool) error ContainerRefresh(target container, source container, snapshots []container) error diff --git a/lxd/storage_btrfs.go b/lxd/storage_btrfs.go index 60b44d12c8..48553ec478 100644 --- a/lxd/storage_btrfs.go +++ b/lxd/storage_btrfs.go @@ -935,10 +935,6 @@ func (s *storageBtrfs) ContainerCreateFromImage(container container, fingerprint return nil } -func (s *storageBtrfs) ContainerCanRestore(container container, sourceContainer container) error { - return nil -} - func (s *storageBtrfs) ContainerDelete(container container) error { logger.Debugf("Deleting BTRFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) diff --git a/lxd/storage_ceph.go b/lxd/storage_ceph.go index c5ffe0e71c..02d89e8bd5 100644 --- a/lxd/storage_ceph.go +++ b/lxd/storage_ceph.go @@ -958,10 +958,6 @@ func (s *storageCeph) ContainerCreateFromImage(container container, fingerprint return nil } -func (s *storageCeph) ContainerCanRestore(container container, sourceContainer container) error { - return nil -} - func (s *storageCeph) ContainerDelete(container container) error { containerName := container.Name() logger.Debugf(`Deleting RBD storage volume for container "%s" on storage pool "%s"`, containerName, s.pool.Name) diff --git a/lxd/storage_dir.go b/lxd/storage_dir.go index af0a22b890..788b4a0245 100644 --- a/lxd/storage_dir.go +++ b/lxd/storage_dir.go @@ -565,10 +565,6 @@ func (s *storageDir) ContainerCreateFromImage(container container, imageFingerpr return nil } -func (s *storageDir) ContainerCanRestore(container container, sourceContainer container) error { - return nil -} - func (s *storageDir) ContainerDelete(container container) error { logger.Debugf("Deleting DIR storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) diff --git a/lxd/storage_lvm.go b/lxd/storage_lvm.go index e4c8e4c444..fae1d97a37 100644 --- a/lxd/storage_lvm.go +++ b/lxd/storage_lvm.go @@ -1061,10 +1061,6 @@ func (s *storageLvm) ContainerCreateFromImage(container container, fingerprint s return nil } -func (s *storageLvm) ContainerCanRestore(container container, sourceContainer container) error { - return nil -} - func lvmContainerDeleteInternal(project, poolName string, ctName string, isSnapshot bool, vgName string, ctPath string) error { containerMntPoint := "" containerLvmName := containerNameToLVName(ctName) diff --git a/lxd/storage_mock.go b/lxd/storage_mock.go index 58c993f511..950aa4d215 100644 --- a/lxd/storage_mock.go +++ b/lxd/storage_mock.go @@ -123,10 +123,6 @@ func (s *storageMock) ContainerCreateFromImage( return nil } -func (s *storageMock) ContainerCanRestore(container container, sourceContainer container) error { - return nil -} - func (s *storageMock) ContainerDelete(container container) error { return nil } diff --git a/lxd/storage_zfs.go b/lxd/storage_zfs.go index cc35cd298a..d22092b8f1 100644 --- a/lxd/storage_zfs.go +++ b/lxd/storage_zfs.go @@ -913,29 +913,6 @@ func (s *storageZfs) ContainerCreateFromImage(container container, fingerprint s return nil } -func (s *storageZfs) ContainerCanRestore(container container, sourceContainer container) error { - snaps, err := container.Snapshots() - if err != nil { - return err - } - - if snaps[len(snaps)-1].Name() != sourceContainer.Name() { - if 
s.pool.Config["volume.zfs.remove_snapshots"] != "" { - zfsRemoveSnapshots = s.pool.Config["volume.zfs.remove_snapshots"] - } - if s.volume.Config["zfs.remove_snapshots"] != "" { - zfsRemoveSnapshots = s.volume.Config["zfs.remove_snapshots"] - } - if !shared.IsTrue(zfsRemoveSnapshots) { - return fmt.Errorf("ZFS can only restore from the latest snapshot. Delete newer snapshots or copy the snapshot into a new container instead") - } - - return nil - } - - return nil -} - func (s *storageZfs) ContainerDelete(container container) error { err := s.doContainerDelete(container.Project(), container.Name()) if err != nil { @@ -1501,6 +1478,25 @@ func (s *storageZfs) ContainerRename(container container, newName string) error func (s *storageZfs) ContainerRestore(target container, source container) error { logger.Debugf("Restoring ZFS storage volume for container \"%s\" from %s to %s", s.volume.Name, source.Name(), target.Name()) + snaps, err := target.Snapshots() + if err != nil { + return err + } + + if snaps[len(snaps)-1].Name() != source.Name() { + if s.pool.Config["volume.zfs.remove_snapshots"] != "" { + zfsRemoveSnapshots = s.pool.Config["volume.zfs.remove_snapshots"] + } + + if s.volume.Config["zfs.remove_snapshots"] != "" { + zfsRemoveSnapshots = s.volume.Config["zfs.remove_snapshots"] + } + + if !shared.IsTrue(zfsRemoveSnapshots) { + return fmt.Errorf("ZFS can only restore from the latest snapshot. Delete newer snapshots or copy the snapshot into a new container instead") + } + } + // Start storage for source container ourSourceStart, err := source.StorageStart() if err != nil { @@ -1519,12 +1515,6 @@ func (s *storageZfs) ContainerRestore(target container, source container) error defer target.StorageStop() } - // Remove any needed snapshot - snaps, err := target.Snapshots() - if err != nil { - return err - } - for i := len(snaps) - 1; i != 0; i-- { if snaps[i].Name() == source.Name() { break From a880d879723b8c5ace62aa6cfcd7f8fce6212d54 Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 16:05:56 +0200 Subject: [PATCH 04/15] lxd: Remove Image{Umount,Mount} from storage interface These functions are not called from anywhere outside of the actual storage backend code. Signed-off-by: Thomas Hipp --- lxd/storage.go | 2 -- 1 file changed, 2 deletions(-) diff --git a/lxd/storage.go b/lxd/storage.go index c825b146fb..9df1e51632 100644 --- a/lxd/storage.go +++ b/lxd/storage.go @@ -201,8 +201,6 @@ type storage interface { // Functions dealing with image storage volumes. ImageCreate(fingerprint string, tracker *ioprogress.ProgressTracker) error ImageDelete(fingerprint string) error - ImageMount(fingerprint string) (bool, error) - ImageUmount(fingerprint string) (bool, error) // Storage type agnostic functions. 
StorageEntitySetQuota(volumeType int, size int64, data interface{}) error From 1e8855d03192577b3d43334065fb120d3b9e2e10 Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 14:58:46 +0200 Subject: [PATCH 05/15] lxd: Add project argument to containerPath function Signed-off-by: Thomas Hipp --- lxd/api_internal.go | 4 ++-- lxd/container.go | 6 +++--- lxd/container_lxc.go | 3 +-- lxd/container_test.go | 4 ++-- lxd/storage_dir.go | 7 ++++--- lxd/storage_zfs.go | 2 +- 6 files changed, 13 insertions(+), 13 deletions(-) diff --git a/lxd/api_internal.go b/lxd/api_internal.go index 508801f243..fb47c4ef58 100644 --- a/lxd/api_internal.go +++ b/lxd/api_internal.go @@ -673,7 +673,7 @@ func internalImport(d *Daemon, r *http.Request) Response { onDiskPoolName = poolName } snapName := fmt.Sprintf("%s/%s", req.Name, od) - snapPath := containerPath(snapName, true) + snapPath := containerPath(project, snapName, true) err = lvmContainerDeleteInternal(project, poolName, req.Name, true, onDiskPoolName, snapPath) case "ceph": @@ -1015,7 +1015,7 @@ func internalImport(d *Daemon, r *http.Request) Response { return SmartError(err) } - containerPath := containerPath(projectPrefix(project, req.Name), false) + containerPath := containerPath(project, req.Name, false) isPrivileged := false if backup.Container.Config["security.privileged"] == "" { isPrivileged = true diff --git a/lxd/container.go b/lxd/container.go index 24da0f924d..940a95e617 100644 --- a/lxd/container.go +++ b/lxd/container.go @@ -45,12 +45,12 @@ func containerGetParentAndSnapshotName(name string) (string, string, bool) { return fields[0], fields[1], true } -func containerPath(name string, isSnapshot bool) string { +func containerPath(project string, name string, isSnapshot bool) string { if isSnapshot { - return shared.VarPath("snapshots", name) + return shared.VarPath("snapshots", projectPrefix(project, name)) } - return shared.VarPath("containers", name) + return shared.VarPath("containers", projectPrefix(project, name)) } func containerValidName(name string) error { diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index d44ccfc872..373e4ea0ba 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -8900,8 +8900,7 @@ func (c *containerLXC) State() string { // Various container paths func (c *containerLXC) Path() string { - name := projectPrefix(c.Project(), c.Name()) - return containerPath(name, c.IsSnapshot()) + return containerPath(c.Project(), c.Name(), c.IsSnapshot()) } func (c *containerLXC) DevicesPath() string { diff --git a/lxd/container_test.go b/lxd/container_test.go index caa3adebe0..6bfa73fc36 100644 --- a/lxd/container_test.go +++ b/lxd/container_test.go @@ -160,7 +160,7 @@ func (suite *containerTestSuite) TestContainer_Path_Regular() { suite.Req.False(c.IsSnapshot(), "Shouldn't be a snapshot.") suite.Req.Equal(shared.VarPath("containers", "testFoo"), c.Path()) - suite.Req.Equal(shared.VarPath("containers", "testFoo2"), containerPath("testFoo2", false)) + suite.Req.Equal(shared.VarPath("containers", "testFoo2"), containerPath("default", "testFoo2", false)) } func (suite *containerTestSuite) TestContainer_Path_Snapshot() { @@ -181,7 +181,7 @@ func (suite *containerTestSuite) TestContainer_Path_Snapshot() { c.Path()) suite.Req.Equal( shared.VarPath("snapshots", "test", "snap1"), - containerPath("test/snap1", true)) + containerPath("default", "test/snap1", true)) } func (suite *containerTestSuite) TestContainer_LogPath() { diff --git a/lxd/storage_dir.go b/lxd/storage_dir.go index 788b4a0245..ca1b0a6949 100644 
--- a/lxd/storage_dir.go +++ b/lxd/storage_dir.go @@ -817,9 +817,10 @@ func (s *storageDir) ContainerRename(container container, newName string) error } oldContainerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) - oldContainerSymlink := shared.VarPath("containers", projectPrefix(container.Project(), container.Name())) + oldContainerSymlink := containerPath(container.Project(), container.Name(), false) newContainerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, newName) - newContainerSymlink := shared.VarPath("containers", projectPrefix(container.Project(), newName)) + newContainerSymlink := containerPath(container.Project(), newName, false) + err = renameContainerMountpoint(oldContainerMntPoint, oldContainerSymlink, newContainerMntPoint, newContainerSymlink) if err != nil { return err @@ -1201,7 +1202,7 @@ func (s *storageDir) ContainerBackupLoad(info backupInfo, data io.ReadSeeker, ta // Create mountpoints containerMntPoint := getContainerMountPoint(info.Project, s.pool.Name, info.Name) - err = createContainerMountpoint(containerMntPoint, containerPath(projectPrefix(info.Project, info.Name), false), info.Privileged) + err = createContainerMountpoint(containerMntPoint, containerPath(info.Project, info.Name, false), info.Privileged) if err != nil { return errors.Wrap(err, "Create container mount point") } diff --git a/lxd/storage_zfs.go b/lxd/storage_zfs.go index d22092b8f1..f049deb595 100644 --- a/lxd/storage_zfs.go +++ b/lxd/storage_zfs.go @@ -2129,7 +2129,7 @@ func (s *storageZfs) ContainerBackupCreate(backup backup, source container) erro func (s *storageZfs) doContainerBackupLoadOptimized(info backupInfo, data io.ReadSeeker, tarArgs []string) error { containerName, _, _ := containerGetParentAndSnapshotName(info.Name) containerMntPoint := getContainerMountPoint(info.Project, s.pool.Name, containerName) - err := createContainerMountpoint(containerMntPoint, containerPath(info.Name, false), info.Privileged) + err := createContainerMountpoint(containerMntPoint, containerPath(info.Project, info.Name, false), info.Privileged) if err != nil { return err } From b562ab8b1091585104e3d02a9c0307f3fa13cba6 Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 15:04:28 +0200 Subject: [PATCH 06/15] lxd: Move storage gco to storage package Signed-off-by: Thomas Hipp --- lxd/{ => storage}/storage_cgo.go | 12 +- lxd/storage/utils.go | 405 +++++++++++++++++++++++++++++++ lxd/storage_btrfs.go | 3 +- lxd/storage_lvm.go | 6 +- lxd/storage_utils.go | 3 +- 5 files changed, 418 insertions(+), 11 deletions(-) rename lxd/{ => storage}/storage_cgo.go (96%) create mode 100644 lxd/storage/utils.go diff --git a/lxd/storage_cgo.go b/lxd/storage/storage_cgo.go similarity index 96% rename from lxd/storage_cgo.go rename to lxd/storage/storage_cgo.go index 1f1c7136f7..b770710cb0 100644 --- a/lxd/storage_cgo.go +++ b/lxd/storage/storage_cgo.go @@ -1,7 +1,7 @@ // +build linux // +build cgo -package main +package storage /* #define _GNU_SOURCE @@ -19,8 +19,8 @@ package main #include #include -#include "include/macro.h" -#include "include/memory_utils.h" +#include "../include/macro.h" +#include "../include/memory_utils.h" #ifndef MS_LAZYTIME #define MS_LAZYTIME (1<<25) @@ -267,7 +267,7 @@ const MS_LAZYTIME uintptr = C.MS_LAZYTIME // prepareLoopDev() detects and sets up a loop device for source. It returns an // open file descriptor to the free loop device and the path of the free loop // device. 
It's the callers responsibility to close the open file descriptor. -func prepareLoopDev(source string, flags int) (*os.File, error) { +func PrepareLoopDev(source string, flags int) (*os.File, error) { cLoopDev := C.malloc(C.size_t(C.LO_NAME_SIZE)) if cLoopDev == nil { return nil, fmt.Errorf("Failed to allocate memory in C") @@ -293,7 +293,7 @@ func prepareLoopDev(source string, flags int) (*os.File, error) { return os.NewFile(uintptr(loopFd), C.GoString((*C.char)(cLoopDev))), nil } -func setAutoclearOnLoopDev(loopFd int) error { +func SetAutoclearOnLoopDev(loopFd int) error { ret, err := C.set_autoclear_loop_device(C.int(loopFd)) if ret < 0 { if err != nil { @@ -305,7 +305,7 @@ func setAutoclearOnLoopDev(loopFd int) error { return nil } -func unsetAutoclearOnLoopDev(loopFd int) error { +func UnsetAutoclearOnLoopDev(loopFd int) error { ret, err := C.unset_autoclear_loop_device(C.int(loopFd)) if ret < 0 { if err != nil { diff --git a/lxd/storage/utils.go b/lxd/storage/utils.go new file mode 100644 index 0000000000..2fa57b55fc --- /dev/null +++ b/lxd/storage/utils.go @@ -0,0 +1,405 @@ +package storage + +import ( + "fmt" + "os" + "strings" + "syscall" + "time" + + "github.com/lxc/lxd/lxd/db" + "github.com/lxc/lxd/shared" + "github.com/lxc/lxd/shared/idmap" + "github.com/lxc/lxd/shared/logger" +) + +// Options for filesystem creation +type mkfsOptions struct { + label string +} + +// Export the mount options map since we might find it useful in other parts of +// LXD. +type mountOptions struct { + capture bool + flag uintptr +} + +var MountOptions = map[string]mountOptions{ + "async": {false, syscall.MS_SYNCHRONOUS}, + "atime": {false, syscall.MS_NOATIME}, + "bind": {true, syscall.MS_BIND}, + "defaults": {true, 0}, + "dev": {false, syscall.MS_NODEV}, + "diratime": {false, syscall.MS_NODIRATIME}, + "dirsync": {true, syscall.MS_DIRSYNC}, + "exec": {false, syscall.MS_NOEXEC}, + "lazytime": {true, MS_LAZYTIME}, + "mand": {true, syscall.MS_MANDLOCK}, + "noatime": {true, syscall.MS_NOATIME}, + "nodev": {true, syscall.MS_NODEV}, + "nodiratime": {true, syscall.MS_NODIRATIME}, + "noexec": {true, syscall.MS_NOEXEC}, + "nomand": {false, syscall.MS_MANDLOCK}, + "norelatime": {false, syscall.MS_RELATIME}, + "nostrictatime": {false, syscall.MS_STRICTATIME}, + "nosuid": {true, syscall.MS_NOSUID}, + "rbind": {true, syscall.MS_BIND | syscall.MS_REC}, + "relatime": {true, syscall.MS_RELATIME}, + "remount": {true, syscall.MS_REMOUNT}, + "ro": {true, syscall.MS_RDONLY}, + "rw": {false, syscall.MS_RDONLY}, + "strictatime": {true, syscall.MS_STRICTATIME}, + "suid": {false, syscall.MS_NOSUID}, + "sync": {true, syscall.MS_SYNCHRONOUS}, +} + +func lxdResolveMountoptions(options string) (uintptr, string) { + mountFlags := uintptr(0) + tmp := strings.SplitN(options, ",", -1) + for i := 0; i < len(tmp); i++ { + opt := tmp[i] + do, ok := MountOptions[opt] + if !ok { + continue + } + + if do.capture { + mountFlags |= do.flag + } else { + mountFlags &= ^do.flag + } + + copy(tmp[i:], tmp[i+1:]) + tmp[len(tmp)-1] = "" + tmp = tmp[:len(tmp)-1] + i-- + } + + return mountFlags, strings.Join(tmp, ",") +} + +// Useful functions for unreliable backends +func tryMount(src string, dst string, fs string, flags uintptr, options string) error { + var err error + + for i := 0; i < 20; i++ { + err = syscall.Mount(src, dst, fs, flags, options) + if err == nil { + break + } + + time.Sleep(500 * time.Millisecond) + } + + if err != nil { + return err + } + + return nil +} + +func tryUnmount(path string, flags int) error { + var err error + + 
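+ // As in tryMount above, transient failures are retried: up to 20 attempts, 500ms apart.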
for i := 0; i < 20; i++ { + err = syscall.Unmount(path, flags) + if err == nil { + break + } + + time.Sleep(500 * time.Millisecond) + } + + if err != nil && err == syscall.EBUSY { + return err + } + + return nil +} + +func storageValidName(value string) error { + if shared.IsSnapshot(value) { + return fmt.Errorf("Invalid storage volume name \"%s\". Storage volumes cannot contain \"/\" in their name", value) + } + + return nil +} + +func storageConfigDiff(oldConfig map[string]string, newConfig map[string]string) ([]string, bool) { + changedConfig := []string{} + userOnly := true + for key := range oldConfig { + if oldConfig[key] != newConfig[key] { + if !strings.HasPrefix(key, "user.") { + userOnly = false + } + + if !shared.StringInSlice(key, changedConfig) { + changedConfig = append(changedConfig, key) + } + } + } + + for key := range newConfig { + if oldConfig[key] != newConfig[key] { + if !strings.HasPrefix(key, "user.") { + userOnly = false + } + + if !shared.StringInSlice(key, changedConfig) { + changedConfig = append(changedConfig, key) + } + } + } + + // Skip on no change + if len(changedConfig) == 0 { + return nil, false + } + + return changedConfig, userOnly +} + +// Default permissions for folders in ${LXD_DIR} +const storagePoolsDirMode os.FileMode = 0711 +const containersDirMode os.FileMode = 0711 +const customDirMode os.FileMode = 0711 +const imagesDirMode os.FileMode = 0700 +const snapshotsDirMode os.FileMode = 0700 + +// Detect whether LXD already uses the given storage pool. +func lxdUsesPool(dbObj *db.Cluster, onDiskPoolName string, driver string, onDiskProperty string) (bool, string, error) { + pools, err := dbObj.StoragePools() + if err != nil && err != db.ErrNoSuchObject { + return false, "", err + } + + for _, pool := range pools { + _, pl, err := dbObj.StoragePoolGet(pool) + if err != nil { + continue + } + + if pl.Driver != driver { + continue + } + + if pl.Config[onDiskProperty] == onDiskPoolName { + return true, pl.Name, nil + } + } + + return false, "", nil +} + +func makeFSType(path string, fsType string, options *mkfsOptions) (string, error) { + var err error + var msg string + + fsOptions := options + if fsOptions == nil { + fsOptions = &mkfsOptions{} + } + + cmd := []string{fmt.Sprintf("mkfs.%s", fsType), path} + if fsOptions.label != "" { + cmd = append(cmd, "-L", fsOptions.label) + } + + if fsType == "ext4" { + cmd = append(cmd, "-E", "nodiscard,lazy_itable_init=0,lazy_journal_init=0") + } + + msg, err = shared.TryRunCommand(cmd[0], cmd[1:]...) 
+ if err != nil { + return msg, err + } + + return "", nil +} + +func fsGenerateNewUUID(fstype string, lvpath string) (string, error) { + switch fstype { + case "btrfs": + return btrfsGenerateNewUUID(lvpath) + case "xfs": + return xfsGenerateNewUUID(lvpath) + } + + return "", nil +} + +func xfsGenerateNewUUID(lvpath string) (string, error) { + msg, err := shared.RunCommand( + "xfs_admin", + "-U", "generate", + lvpath) + if err != nil { + return msg, err + } + + return "", nil +} + +func btrfsGenerateNewUUID(lvpath string) (string, error) { + msg, err := shared.RunCommand( + "btrfstune", + "-f", + "-u", + lvpath) + if err != nil { + return msg, err + } + + return "", nil +} + +func growFileSystem(fsType string, devPath string, mntpoint string) error { + var msg string + var err error + switch fsType { + case "": // if not specified, default to ext4 + fallthrough + case "ext4": + msg, err = shared.TryRunCommand("resize2fs", devPath) + case "xfs": + msg, err = shared.TryRunCommand("xfs_growfs", devPath) + case "btrfs": + msg, err = shared.TryRunCommand("btrfs", "filesystem", "resize", "max", mntpoint) + default: + return fmt.Errorf(`Growing not supported for filesystem type "%s"`, fsType) + } + + if err != nil { + errorMsg := fmt.Sprintf(`Could not extend underlying %s filesystem for "%s": %s`, fsType, devPath, msg) + logger.Errorf(errorMsg) + return fmt.Errorf(errorMsg) + } + + logger.Debugf(`extended underlying %s filesystem for "%s"`, fsType, devPath) + return nil +} + +func shrinkFileSystem(fsType string, devPath string, mntpoint string, byteSize int64) error { + strSize := fmt.Sprintf("%dK", byteSize/1024) + + switch fsType { + case "": // if not specified, default to ext4 + fallthrough + case "ext4": + _, err := shared.TryRunCommand("e2fsck", "-f", "-y", devPath) + if err != nil { + return err + } + + _, err = shared.TryRunCommand("resize2fs", devPath, strSize) + if err != nil { + return err + } + case "btrfs": + _, err := shared.TryRunCommand("btrfs", "filesystem", "resize", strSize, mntpoint) + if err != nil { + return err + } + default: + return fmt.Errorf(`Shrinking not supported for filesystem type "%s"`, fsType) + } + + return nil +} + +/* +func shrinkVolumeFilesystem(s StorageDriver, volumeType int, fsType string, devPath string, mntpoint string, byteSize int64, data interface{}) (func() (bool, error), error) { + var cleanupFunc func() (bool, error) + switch fsType { + case "xfs": + logger.Errorf("XFS filesystems cannot be shrunk: dump, mkfs, and restore are required") + return nil, fmt.Errorf("xfs filesystems cannot be shrunk: dump, mkfs, and restore are required") + case "btrfs": + fallthrough + case "": // if not specified, default to ext4 + fallthrough + case "ext4": + switch volumeType { + case storagePoolVolumeTypeContainer: + c := data.(Container) + ourMount, err := c.StorageStop() + if err != nil { + return nil, err + } + if !ourMount { + cleanupFunc = c.StorageStart + } + case storagePoolVolumeTypeCustom: + ourMount, err := s.StoragePoolVolumeUmount() + if err != nil { + return nil, err + } + if !ourMount { + cleanupFunc = s.StoragePoolVolumeMount + } + default: + return nil, fmt.Errorf(`Resizing not implemented for storage volume type %d`, volumeType) + } + + default: + return nil, fmt.Errorf(`Shrinking not supported for filesystem type "%s"`, fsType) + } + + err := shrinkFileSystem(fsType, devPath, mntpoint, byteSize) + return cleanupFunc, err +} +*/ + +// Returns the parent container name, snapshot name, and whether it actually was +// a snapshot name. 
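+// For example, "c1/snap0" yields ("c1", "snap0", true), while a plain "c1" yields ("c1", "", false).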
+func containerGetParentAndSnapshotName(name string) (string, string, bool) { + fields := strings.SplitN(name, shared.SnapshotDelimiter, 2) + if len(fields) == 1 { + return name, "", false + } + + return fields[0], fields[1], true +} + +// /var/lib/lxd/[snapshots|containers]/name +func containerPath(project string, name string, isSnapshot bool) string { + if isSnapshot { + return shared.VarPath("snapshots", projectPrefix(project, name)) + } + + return shared.VarPath("containers", projectPrefix(project, name)) +} + +func setUnprivUserACL(idmapset *idmap.IdmapSet, destPath string) error { + // Skip for privileged containers + if idmapset == nil { + return nil + } + + // Make sure the map is valid. Skip if container uid 0 == host uid 0 + uid, _ := idmapset.ShiftIntoNs(0, 0) + switch uid { + case -1: + return fmt.Errorf("Container doesn't have a uid 0 in its map") + case 0: + return nil + } + + // Attempt to set a POSIX ACL first. + acl := fmt.Sprintf("%d:rx", uid) + _, err := shared.RunCommand("setfacl", "-m", acl, destPath) + if err == nil { + return nil + } + + // Fallback to chmod if the fs doesn't support it. + _, err = shared.RunCommand("chmod", "+x", destPath) + if err != nil { + logger.Debugf("Failed to set executable bit on the container path: %s", err) + return err + } + + return nil +} diff --git a/lxd/storage_btrfs.go b/lxd/storage_btrfs.go index 48553ec478..113cd707d1 100644 --- a/lxd/storage_btrfs.go +++ b/lxd/storage_btrfs.go @@ -18,6 +18,7 @@ import ( "github.com/lxc/lxd/lxd/db" "github.com/lxc/lxd/lxd/migration" + driver "github.com/lxc/lxd/lxd/storage" "github.com/lxc/lxd/lxd/util" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" @@ -420,7 +421,7 @@ func (s *storageBtrfs) StoragePoolMount() (bool, error) { // Since we mount the loop device LO_FLAGS_AUTOCLEAR is // fine since the loop device will be kept around for as // long as the mount exists. - loopF, loopErr := prepareLoopDev(source, LoFlagsAutoclear) + loopF, loopErr := driver.PrepareLoopDev(source, driver.LoFlagsAutoclear) if loopErr != nil { return false, loopErr } diff --git a/lxd/storage_lvm.go b/lxd/storage_lvm.go index fae1d97a37..8be04d6193 100644 --- a/lxd/storage_lvm.go +++ b/lxd/storage_lvm.go @@ -386,7 +386,7 @@ func (s *storageLvm) StoragePoolDelete() error { if s.loopInfo != nil { // Set LO_FLAGS_AUTOCLEAR before we remove the loop file // otherwise we will get EBADF. - err = setAutoclearOnLoopDev(int(s.loopInfo.Fd())) + err = driver.SetAutoclearOnLoopDev(int(s.loopInfo.Fd())) if err != nil { logger.Warnf("Failed to set LO_FLAGS_AUTOCLEAR on loop device: %s, manual cleanup needed", err) } @@ -458,12 +458,12 @@ func (s *storageLvm) StoragePoolMount() (bool, error) { if filepath.IsAbs(source) && !shared.IsBlockdevPath(source) { // Try to prepare new loop device. - loopF, loopErr := prepareLoopDev(source, 0) + loopF, loopErr := driver.PrepareLoopDev(source, 0) if loopErr != nil { return false, loopErr } // Make sure that LO_FLAGS_AUTOCLEAR is unset. 
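 // (Autoclear would make the kernel tear the loop device down on last close, while the pool's backing device has to outlive this fd; autoclear is only set again when the pool is deleted, as in StoragePoolDelete above.)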
- loopErr = unsetAutoclearOnLoopDev(int(loopF.Fd())) + loopErr = driver.UnsetAutoclearOnLoopDev(int(loopF.Fd())) if loopErr != nil { return false, loopErr } diff --git a/lxd/storage_utils.go b/lxd/storage_utils.go index 23f0450c19..d7d650d414 100644 --- a/lxd/storage_utils.go +++ b/lxd/storage_utils.go @@ -8,6 +8,7 @@ import ( "time" "github.com/lxc/lxd/lxd/db" + driver "github.com/lxc/lxd/lxd/storage" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" "github.com/lxc/lxd/shared/logger" @@ -34,7 +35,7 @@ var MountOptions = map[string]mountOptions{ "diratime": {false, syscall.MS_NODIRATIME}, "dirsync": {true, syscall.MS_DIRSYNC}, "exec": {false, syscall.MS_NOEXEC}, - "lazytime": {true, MS_LAZYTIME}, + "lazytime": {true, driver.MS_LAZYTIME}, "mand": {true, syscall.MS_MANDLOCK}, "noatime": {true, syscall.MS_NOATIME}, "nodev": {true, syscall.MS_NODEV}, From e6859ef1d3de5167e54fbc8dc3bbf21f2148b7ae Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 15:13:48 +0200 Subject: [PATCH 07/15] migration: Remove unused Snapshots() function from interface Signed-off-by: Thomas Hipp --- lxd/storage_btrfs.go | 4 ---- lxd/storage_migration.go | 7 ------- lxd/storage_zfs.go | 4 ---- 3 files changed, 15 deletions(-) diff --git a/lxd/storage_btrfs.go b/lxd/storage_btrfs.go index 113cd707d1..7fcab12506 100644 --- a/lxd/storage_btrfs.go +++ b/lxd/storage_btrfs.go @@ -2410,10 +2410,6 @@ type btrfsMigrationSourceDriver struct { stoppedSnapName string } -func (s *btrfsMigrationSourceDriver) Snapshots() []container { - return s.snapshots -} - func (s *btrfsMigrationSourceDriver) send(conn *websocket.Conn, btrfsPath string, btrfsParent string, readWrapper func(io.ReadCloser) io.ReadCloser) error { args := []string{"send"} if btrfsParent != "" { diff --git a/lxd/storage_migration.go b/lxd/storage_migration.go index 387f2bef6d..835ae95d24 100644 --- a/lxd/storage_migration.go +++ b/lxd/storage_migration.go @@ -17,9 +17,6 @@ import ( // MigrationStorageSourceDriver defines the functions needed to implement a // migration source driver. type MigrationStorageSourceDriver interface { - /* snapshots for this container, if any */ - Snapshots() []container - /* send any bits of the container/snapshots that are possible while the * container is still running. 
*/ @@ -46,10 +43,6 @@ type rsyncStorageSourceDriver struct { rsyncFeatures []string } -func (s rsyncStorageSourceDriver) Snapshots() []container { - return s.snapshots -} - func (s rsyncStorageSourceDriver) SendStorageVolume(conn *websocket.Conn, op *operation, bwlimit string, storage storage, volumeOnly bool) error { ourMount, err := storage.StoragePoolVolumeMount() if err != nil { diff --git a/lxd/storage_zfs.go b/lxd/storage_zfs.go index f049deb595..93c60f13d0 100644 --- a/lxd/storage_zfs.go +++ b/lxd/storage_zfs.go @@ -2513,10 +2513,6 @@ type zfsMigrationSourceDriver struct { zfsFeatures []string } -func (s *zfsMigrationSourceDriver) Snapshots() []container { - return s.snapshots -} - func (s *zfsMigrationSourceDriver) send(conn *websocket.Conn, zfsName string, zfsParent string, readWrapper func(io.ReadCloser) io.ReadCloser) error { sourceParentName, _, _ := containerGetParentAndSnapshotName(s.container.Name()) poolName := s.zfs.getOnDiskPoolName() From 03926ec8fbacecbe5ee865790deaf05c30576c7d Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 15:17:33 +0200 Subject: [PATCH 08/15] lxd: Add common code to new storage package Signed-off-by: Thomas Hipp --- lxd/storage/lock.go | 109 +++++++++++++++++ lxd/storage/shared.go | 88 ++++++++++++++ lxd/storage/storage.go | 181 ++++++++++++++++++++++++++++ lxd/storage/storage_pools_config.go | 40 ++++++ lxd/storage/storage_pools_utils.go | 26 ++++ lxd/storage/volumes_utils.go | 26 ++++ 6 files changed, 470 insertions(+) create mode 100644 lxd/storage/lock.go create mode 100644 lxd/storage/shared.go create mode 100644 lxd/storage/storage.go create mode 100644 lxd/storage/storage_pools_config.go create mode 100644 lxd/storage/storage_pools_utils.go create mode 100644 lxd/storage/volumes_utils.go diff --git a/lxd/storage/lock.go b/lxd/storage/lock.go new file mode 100644 index 0000000000..92ffa49aed --- /dev/null +++ b/lxd/storage/lock.go @@ -0,0 +1,109 @@ +package storage + +import ( + "fmt" + "sync" + + "github.com/lxc/lxd/shared/logger" +) + +// lxdStorageLockMap is a hashmap that allows functions to check whether the +// operation they are about to perform is already in progress. If it is the +// channel can be used to wait for the operation to finish. If it is not, the +// function that wants to perform the operation should store its code in the +// hashmap. +// Note that any access to this map must be done while holding a lock. +var lxdStorageOngoingOperationMap = map[string]chan bool{} + +// lxdStorageMapLock is used to access lxdStorageOngoingOperationMap. +var lxdStorageMapLock sync.Mutex + +// The following functions are used to construct simple operation codes that are +// unique. 
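+// For example, mounting the storage pool "default" is keyed as "mount/pool/default".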
+func getPoolMountLockID(poolName string) string { + return fmt.Sprintf("mount/pool/%s", poolName) +} + +func getPoolUmountLockID(poolName string) string { + return fmt.Sprintf("umount/pool/%s", poolName) +} + +func getContainerMountLockID(poolName string, containerName string) string { + return fmt.Sprintf("mount/container/%s/%s", poolName, containerName) +} + +func getContainerUmountLockID(poolName string, containerName string) string { + return fmt.Sprintf("umount/container/%s/%s", poolName, containerName) +} + +func getCustomMountLockID(poolName string, volumeName string) string { + return fmt.Sprintf("mount/custom/%s/%s", poolName, volumeName) +} + +func getCustomUmountLockID(poolName string, volumeName string) string { + return fmt.Sprintf("umount/custom/%s/%s", poolName, volumeName) +} + +func getImageCreateLockID(poolName string, fingerprint string) string { + return fmt.Sprintf("create/image/%s/%s", poolName, fingerprint) +} + +func LockPoolMount(poolName string) func() { + return lock(getPoolMountLockID(poolName)) +} + +func LockPoolUmount(poolName string) func() { + return lock(getPoolUmountLockID(poolName)) +} + +func LockContainerMount(poolName string, containerName string) func() { + return lock(getContainerMountLockID(poolName, containerName)) +} + +func LockContainerUmount(poolName string, containerName string) func() { + return lock(getContainerUmountLockID(poolName, containerName)) +} + +func LockCustomMount(poolName string, volumeName string) func() { + return lock(getCustomMountLockID(poolName, volumeName)) +} + +func LockCustomUmount(poolName string, volumeName string) func() { + return lock(getCustomUmountLockID(poolName, volumeName)) +} + +func LockImageCreate(poolName string, fingerprint string) func() { + return lock(getImageCreateLockID(poolName, fingerprint)) +} + +func lock(lockID string) func() { + lxdStorageMapLock.Lock() + + if waitChannel, ok := lxdStorageOngoingOperationMap[lockID]; ok { + lxdStorageMapLock.Unlock() + + _, ok := <-waitChannel + if ok { + logger.Warnf("Received value over semaphore, this should not have happened") + } + + // Give the benefit of the doubt and assume that the other + // thread actually succeeded in mounting the storage pool. 
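+ // A nil unlock function therefore tells the caller that the operation was already performed elsewhere and there is nothing to unlock.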
+		return nil
+	}
+
+	lxdStorageOngoingOperationMap[lockID] = make(chan bool)
+	lxdStorageMapLock.Unlock()
+
+	return func() {
+		lxdStorageMapLock.Lock()
+
+		waitChannel, ok := lxdStorageOngoingOperationMap[lockID]
+		if ok {
+			close(waitChannel)
+			delete(lxdStorageOngoingOperationMap, lockID)
+		}
+
+		lxdStorageMapLock.Unlock()
+	}
+}
diff --git a/lxd/storage/shared.go b/lxd/storage/shared.go
new file mode 100644
index 0000000000..be5e57244b
--- /dev/null
+++ b/lxd/storage/shared.go
@@ -0,0 +1,88 @@
+package storage
+
+import (
+	"fmt"
+	"os"
+	"os/exec"
+	"syscall"
+
+	"github.com/lxc/lxd/lxd/state"
+	"github.com/lxc/lxd/shared"
+	"github.com/lxc/lxd/shared/api"
+)
+
+type driverShared struct {
+	s *state.State
+
+	poolID int64
+	pool   *api.StoragePool
+
+	volume *api.StorageVolume
+
+	sTypeVersion string
+}
+
+func (d *driverShared) SharedInit(s *state.State, pool *api.StoragePool, poolID int64, volume *api.StorageVolume) {
+	d.s = s
+	d.pool = pool
+	d.poolID = poolID
+	d.volume = volume
+}
+
+func (d *driverShared) GetVersion() string {
+	return d.sTypeVersion
+}
+
+func (d *driverShared) rsync(source string, dest string) error {
+	var msg string
+	var err error
+	bwlimit := d.pool.Config["rsync.bwlimit"]
+
+	// The error messages below are built at the point of failure so that
+	// they reflect the actual rsync output and error.
+
+	err = os.MkdirAll(dest, 0755)
+	if err != nil {
+		return fmt.Errorf("Failed to rsync: %s: %s", msg, err)
+	}
+
+	rsyncVerbosity := "-q"
+	// Handle debug
+	/*
+		if debug {
+			rsyncVerbosity = "-vi"
+		}
+	*/
+
+	if bwlimit == "" {
+		bwlimit = "0"
+	}
+
+	msg, err = shared.RunCommand("rsync",
+		"-a",
+		"-HAX",
+		"--sparse",
+		"--devices",
+		"--delete",
+		"--checksum",
+		"--numeric-ids",
+		"--xattrs",
+		"--bwlimit", bwlimit,
+		rsyncVerbosity,
+		shared.AddSlash(source),
+		dest)
+	if err != nil {
+		runError, ok := err.(shared.RunError)
+		if ok {
+			exitError, ok := runError.Err.(*exec.ExitError)
+			if ok {
+				waitStatus := exitError.Sys().(syscall.WaitStatus)
+				if waitStatus.ExitStatus() == 24 {
+					return nil
+				}
+			}
+		}
+		return fmt.Errorf("Failed to rsync: %s: %s", msg, err)
+	}
+
+	return nil
+}
diff --git a/lxd/storage/storage.go b/lxd/storage/storage.go
new file mode 100644
index 0000000000..a038fb75c5
--- /dev/null
+++ b/lxd/storage/storage.go
@@ -0,0 +1,181 @@
+package storage
+
+import (
+	"fmt"
+	"os"
+
+	"github.com/lxc/lxd/lxd/db"
+	"github.com/lxc/lxd/shared"
+	"github.com/lxc/lxd/shared/api"
+	"github.com/lxc/lxd/shared/ioprogress"
+)
+
+type StoragePoolArgs struct {
+	PoolID    int64
+	Pool      *api.StoragePool
+	BackingFS string
+	Cluster   *db.Cluster
+}
+
+type StoragePoolVolumeArgs struct {
+	StoragePoolArgs
+
+	Volume *api.StorageVolume
+}
+
+// VolumeType defines the type of a volume
+type VolumeType int
+
+const (
+	VolumeTypeContainer VolumeType = iota
+	VolumeTypeContainerSnapshot
+	VolumeTypeCustom
+	VolumeTypeCustomSnapshot
+	VolumeTypeImage
+	VolumeTypeImageSnapshot
+)
+
+// ${LXD_DIR}/storage-pools/<pool_name>
+func getStoragePoolMountPoint(poolName string) string {
+	return shared.VarPath("storage-pools", poolName)
+}
+
+// ${LXD_DIR}/storage-pools/<pool_name>/custom-snapshots/<snapshot_name>
+func getStoragePoolVolumeSnapshotMountPoint(poolName string, snapshotName string) string {
+	return shared.VarPath("storage-pools", poolName, "custom-snapshots", snapshotName)
+}
+
+// ${LXD_DIR}/storage-pools/<pool_name>/custom/<volume_name>
+func getStoragePoolVolumeMountPoint(poolName string, volumeName string) string {
+	return shared.VarPath("storage-pools", poolName, "custom", volumeName)
+}
+
+// ${LXD_DIR}/storage-pools/<pool_name>/containers/[<project_name>_]<container_name>
+func getContainerMountPoint(project string, poolName string, containerName string) string {
+	return shared.VarPath("storage-pools", poolName, "containers", projectPrefix(project, containerName))
+}
+
+// ${LXD_DIR}/storage-pools/<pool_name>/containers-snapshots/<snapshot_name>
+func getSnapshotMountPoint(project, poolName string, snapshotName string) string {
+	return shared.VarPath("storage-pools", poolName, "containers-snapshots", projectPrefix(project, snapshotName))
+}
+
+func createContainerMountpoint(mountPoint string, mountPointSymlink string, privileged bool) error {
+	var mode os.FileMode
+	if privileged {
+		mode = 0700
+	} else {
+		mode = 0711
+	}
+
+	mntPointSymlinkExist := shared.PathExists(mountPointSymlink)
+	mntPointSymlinkTargetExist := shared.PathExists(mountPoint)
+
+	var err error
+	if !mntPointSymlinkTargetExist {
+		err = os.MkdirAll(mountPoint, 0711)
+		if err != nil {
+			return err
+		}
+	}
+
+	err = os.Chmod(mountPoint, mode)
+	if err != nil {
+		return err
+	}
+
+	if !mntPointSymlinkExist {
+		err := os.Symlink(mountPoint, mountPointSymlink)
+		if err != nil {
+			return err
+		}
+	}
+
+	return nil
+}
+
+func createSnapshotMountpoint(snapshotMountpoint string, snapshotsSymlinkTarget string, snapshotsSymlink string) error {
+	snapshotMntPointExists := shared.PathExists(snapshotMountpoint)
+	mntPointSymlinkExist := shared.PathExists(snapshotsSymlink)
+
+	if !snapshotMntPointExists {
+		err := os.MkdirAll(snapshotMountpoint, 0711)
+		if err != nil {
+			return err
+		}
+	}
+
+	if !mntPointSymlinkExist {
+		err := os.Symlink(snapshotsSymlinkTarget, snapshotsSymlink)
+		if err != nil {
+			return err
+		}
+	}
+
+	return nil
+}
+
+// ${LXD_DIR}/storage-pools/<pool_name>/images/<fingerprint>
+func getImageMountPoint(poolName string, fingerprint string) string {
+	return shared.VarPath("storage-pools", poolName, "images", fingerprint)
+}
+
+// FIXME: this function doesn't belong here
+// Add the "<project>_" prefix when the given project name is not "default".
+func projectPrefix(project string, s string) string {
+	if project != "default" {
+		s = fmt.Sprintf("%s_%s", project, s)
+	}
+	return s
+}
+
+func renameContainerMountpoint(oldMountPoint string, oldMountPointSymlink string, newMountPoint string, newMountPointSymlink string) error {
+	if shared.PathExists(oldMountPoint) {
+		err := os.Rename(oldMountPoint, newMountPoint)
+		if err != nil {
+			return err
+		}
+	}
+
+	// Rename the symlink target.
+	if shared.PathExists(oldMountPointSymlink) {
+		err := os.Remove(oldMountPointSymlink)
+		if err != nil {
+			return err
+		}
+	}
+
+	// Create the new symlink.
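+	// (The symlink is recreated rather than renamed because its target
+	// changes from the old to the new mount point; renaming the link
+	// path alone would leave it pointing at the old location.)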
+ err := os.Symlink(newMountPoint, newMountPointSymlink) + if err != nil { + return err + } + + return nil +} + +func unpackImage(imagefname string, destpath string, blockBackend bool, runningInUserns bool, tracker *ioprogress.ProgressTracker) error { + err := shared.Unpack(imagefname, destpath, blockBackend, runningInUserns, tracker) + if err != nil { + return err + } + + rootfsPath := fmt.Sprintf("%s/rootfs", destpath) + if shared.PathExists(imagefname + ".rootfs") { + err = os.MkdirAll(rootfsPath, 0755) + if err != nil { + return fmt.Errorf("Error creating rootfs directory") + } + + err = shared.Unpack(imagefname+".rootfs", rootfsPath, blockBackend, runningInUserns, tracker) + if err != nil { + return err + } + } + + if !shared.PathExists(rootfsPath) { + return fmt.Errorf("Image is missing a rootfs: %s", imagefname) + } + + return nil +} diff --git a/lxd/storage/storage_pools_config.go b/lxd/storage/storage_pools_config.go new file mode 100644 index 0000000000..4f2c830cff --- /dev/null +++ b/lxd/storage/storage_pools_config.go @@ -0,0 +1,40 @@ +package storage + +import "fmt" + +func updateStoragePoolError(unchangeable []string, driverName string) error { + return fmt.Errorf(`The %v properties cannot be changed for "%s" `+ + `storage pools`, unchangeable, driverName) +} + +var changeableStoragePoolProperties = map[string][]string{ + "btrfs": { + "rsync.bwlimit", + "btrfs.mount_options", + }, + + "ceph": { + "volume.block.filesystem", + "volume.block.mount_options", + "volume.size", + }, + + "dir": { + "rsync.bwlimit", + }, + + "lvm": { + "lvm.thinpool_name", + "lvm.vg_name", + "volume.block.filesystem", + "volume.block.mount_options", + "volume.size", + }, + + "zfs": { + "rsync_bwlimit", + "volume.zfs.remove_snapshots", + "volume.zfs.use_refquota", + "zfs.clone_copy", + }, +} diff --git a/lxd/storage/storage_pools_utils.go b/lxd/storage/storage_pools_utils.go new file mode 100644 index 0000000000..8a5238cc08 --- /dev/null +++ b/lxd/storage/storage_pools_utils.go @@ -0,0 +1,26 @@ +package storage + +import ( + "github.com/lxc/lxd/shared" + "github.com/lxc/lxd/shared/api" +) + +func storageResource(path string) (*api.ResourcesStoragePool, error) { + st, err := shared.Statvfs(path) + if err != nil { + return nil, err + } + + res := api.ResourcesStoragePool{} + res.Space.Total = st.Blocks * uint64(st.Bsize) + res.Space.Used = (st.Blocks - st.Bfree) * uint64(st.Bsize) + + // Some filesystems don't report inodes since they allocate them + // dynamically e.g. btrfs. + if st.Files > 0 { + res.Inodes.Total = st.Files + res.Inodes.Used = st.Files - st.Ffree + } + + return &res, nil +} diff --git a/lxd/storage/volumes_utils.go b/lxd/storage/volumes_utils.go new file mode 100644 index 0000000000..162336e118 --- /dev/null +++ b/lxd/storage/volumes_utils.go @@ -0,0 +1,26 @@ +package storage + +import ( + "fmt" + + "github.com/lxc/lxd/lxd/db" +) + +// XXX: backward compatible declarations, introduced when the db code was +// extracted to its own package. We should eventually clean this up. 
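+// The aliases below let this package keep using the short unexported names
+// while the canonical definitions live in the db package.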
+const ( + storagePoolVolumeTypeContainer = db.StoragePoolVolumeTypeContainer + storagePoolVolumeTypeImage = db.StoragePoolVolumeTypeImage + storagePoolVolumeTypeCustom = db.StoragePoolVolumeTypeCustom +) + +const ( + storagePoolVolumeTypeNameContainer = db.StoragePoolVolumeTypeNameContainer + storagePoolVolumeTypeNameImage = db.StoragePoolVolumeTypeNameImage + storagePoolVolumeTypeNameCustom = db.StoragePoolVolumeTypeNameCustom +) + +func updateStoragePoolVolumeError(unchangeable []string, driverName string) error { + return fmt.Errorf(`The %v properties cannot be changed for "%s" `+ + `storage volumes`, unchangeable, driverName) +} From 8fc5f849ca66f7f702448961183adef454cb911e Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 15:30:59 +0200 Subject: [PATCH 09/15] storage: Add btrfs Signed-off-by: Thomas Hipp --- lxd/storage/btrfs.go | 1416 ++++++++++++++++++++++++++++++++ lxd/storage_migration_btrfs.go | 403 +++++++++ 2 files changed, 1819 insertions(+) create mode 100644 lxd/storage/btrfs.go create mode 100644 lxd/storage_migration_btrfs.go diff --git a/lxd/storage/btrfs.go b/lxd/storage/btrfs.go new file mode 100644 index 0000000000..e8288bed63 --- /dev/null +++ b/lxd/storage/btrfs.go @@ -0,0 +1,1416 @@ +package storage + +import ( + "fmt" + "io/ioutil" + "os" + "os/exec" + "path" + "path/filepath" + "sort" + "strconv" + "strings" + "syscall" + + log "github.com/lxc/lxd/shared/log15" + + "github.com/lxc/lxd/lxd/db" + "github.com/lxc/lxd/lxd/util" + "github.com/lxc/lxd/shared" + "github.com/lxc/lxd/shared/api" + "github.com/lxc/lxd/shared/logger" +) + +type Btrfs struct { + driverShared + + remount uintptr +} + +var btrfsVersion = "" + +func (s *Btrfs) Init() error { + if btrfsVersion != "" { + s.sTypeVersion = btrfsVersion + return nil + } + + out, err := exec.LookPath("btrfs") + if err != nil || len(out) == 0 { + return fmt.Errorf("The 'btrfs' tool isn't available") + } + + output, err := shared.RunCommand("btrfs", "version") + if err != nil { + return fmt.Errorf("The 'btrfs' tool isn't working properly") + } + + count, err := fmt.Sscanf(strings.SplitN(output, " ", 2)[1], "v%s\n", &s.sTypeVersion) + if err != nil || count != 1 { + return fmt.Errorf("The 'btrfs' tool isn't working properly") + } + + btrfsVersion = s.sTypeVersion + + return nil +} + +func (s *Btrfs) StoragePoolCheck() error { + // Nothing to do + return nil +} + +func (s *Btrfs) StoragePoolCreate() error { + isBlockDev := false + + source := s.pool.Config["source"] + + if strings.HasPrefix(source, "/") { + source = shared.HostPath(s.pool.Config["source"]) + } + + defaultSource := filepath.Join(shared.VarPath("disks"), fmt.Sprintf("%s.img", s.pool.Name)) + + if source == "" || source == defaultSource { + source = defaultSource + s.pool.Config["source"] = source + + f, err := os.Create(source) + if err != nil { + return fmt.Errorf("Failed to open %s: %s", source, err) + } + defer f.Close() + + err = f.Chmod(0600) + if err != nil { + return fmt.Errorf("Failed to chmod %s: %s", source, err) + } + + size, err := shared.ParseByteSizeString(s.pool.Config["size"]) + if err != nil { + return err + } + + err = f.Truncate(size) + if err != nil { + return fmt.Errorf("Failed to create sparse file %s: %s", source, err) + } + + output, err := makeFSType(source, "btrfs", &mkfsOptions{label: s.pool.Name}) + if err != nil { + return fmt.Errorf("Failed to create the BTRFS pool: %s", output) + } + } else { + // Unset size property since it doesn't make sense. 
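+		// ("size" is only meaningful in the loop-file case above, where
+		// LXD creates and truncates the backing image itself; for an
+		// existing block device or path the size is dictated by the
+		// device.)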
+		s.pool.Config["size"] = ""
+
+		if filepath.IsAbs(source) {
+			isBlockDev = shared.IsBlockdevPath(source)
+
+			if isBlockDev {
+				output, err := makeFSType(source, "btrfs", &mkfsOptions{label: s.pool.Name})
+				if err != nil {
+					return fmt.Errorf("Failed to create the BTRFS pool: %s", output)
+				}
+			} else {
+				if IsBtrfsSubVolume(source) {
+					subvols, err := btrfsSubVolumesGet(source)
+					if err != nil {
+						return fmt.Errorf("Could not determine if existing BTRFS subvolume is empty: %s", err)
+					}
+
+					if len(subvols) > 0 {
+						return fmt.Errorf("Requested BTRFS subvolume exists but is not empty")
+					}
+				} else {
+					cleanSource := filepath.Clean(source)
+					lxdDir := shared.VarPath()
+					poolMntPoint := getStoragePoolMountPoint(s.pool.Name)
+
+					if shared.PathExists(source) && !isOnBtrfs(source) {
+						return fmt.Errorf("Existing path is neither a BTRFS subvolume nor does it reside on a BTRFS filesystem")
+					} else if strings.HasPrefix(cleanSource, lxdDir) {
+						if cleanSource != poolMntPoint {
+							return fmt.Errorf("BTRFS subvolume requests in LXD directory \"%s\" are only valid under \"%s\"\n(e.g. source=%s)", shared.VarPath(), shared.VarPath("storage-pools"), poolMntPoint)
+						} else if s.s.OS.BackingFS != "btrfs" {
+							return fmt.Errorf("Creation of BTRFS subvolume requested but \"%s\" does not reside on BTRFS filesystem", source)
+						}
+					}
+
+					err := BtrfsSubVolumeCreate(source)
+					if err != nil {
+						return err
+					}
+				}
+			}
+		} else {
+			return fmt.Errorf("Invalid \"source\" property")
+		}
+	}
+
+	poolMntPoint := getStoragePoolMountPoint(s.pool.Name)
+
+	if !shared.PathExists(poolMntPoint) {
+		err := os.MkdirAll(poolMntPoint, storagePoolsDirMode)
+		if err != nil {
+			return err
+		}
+	}
+
+	var err error
+	var devUUID string
+
+	if isBlockDev && filepath.IsAbs(source) {
+		devUUID, _ = shared.LookupUUIDByBlockDevPath(source)
+		// The symlink might not have been created even with the delay
+		// we granted it above. So try to call btrfs filesystem show and
+		// parse it out. (I __hate__ this!)
+		if devUUID == "" {
+			devUUID, err = BtrfsLookupFsUUID(source)
+			if err != nil {
+				return err
+			}
+		}
+		s.pool.Config["source"] = devUUID
+	}
+
+	_, err = s.StoragePoolMount()
+	if err != nil {
+		return err
+	}
+
+	dirs := []string{
+		getContainerMountPoint("default", s.pool.Name, ""),
+		getSnapshotMountPoint("default", s.pool.Name, ""),
+		getImageMountPoint(s.pool.Name, ""),
+		getStoragePoolVolumeMountPoint(s.pool.Name, ""),
+		getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, ""),
+	}
+
+	for _, dir := range dirs {
+		err = BtrfsSubVolumeCreate(dir)
+		if err != nil {
+			return fmt.Errorf("Could not create btrfs subvolume: %s", dir)
+		}
+	}
+
+	return nil
+}
+
+func (s *Btrfs) StoragePoolDelete() error {
+	source := s.pool.Config["source"]
+	if strings.HasPrefix(source, "/") {
+		source = shared.HostPath(s.pool.Config["source"])
+	}
+
+	if source == "" {
+		return fmt.Errorf("no \"source\" property found for the storage pool")
+	}
+
+	dirs := []string{
+		getContainerMountPoint("default", s.pool.Name, ""),
+		getSnapshotMountPoint("default", s.pool.Name, ""),
+		getImageMountPoint(s.pool.Name, ""),
+		getStoragePoolVolumeMountPoint(s.pool.Name, ""),
+	}
+
+	for _, dir := range dirs {
+		BtrfsSubVolumesDelete(dir)
+	}
+
+	_, err := s.StoragePoolUmount()
+	if err != nil {
+		return err
+	}
+
+	// This is a UUID. Check whether we can find the block device.
+	if !filepath.IsAbs(source) {
+		// Try to lookup the disk device by UUID but don't fail. If we
+		// don't find one this might just mean we have been given the
+		// UUID of a subvolume.
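+		// For example, a pool created on a block device has had its
+		// "source" rewritten to the filesystem UUID at creation time,
+		// so the reverse lookup below goes through
+		// /dev/disk/by-uuid/<uuid>.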
+		byUUID := fmt.Sprintf("/dev/disk/by-uuid/%s", source)
+		diskPath, err := os.Readlink(byUUID)
+		msg := ""
+		if err == nil {
+			msg = fmt.Sprintf("Removing disk device %s with UUID: %s.", diskPath, source)
+		} else {
+			msg = fmt.Sprintf("Failed to lookup disk device with UUID: %s: %s.", source, err)
+		}
+		logger.Debugf(msg)
+	} else {
+		var err error
+		cleanSource := filepath.Clean(source)
+		sourcePath := shared.VarPath("disks", s.pool.Name)
+		loopFilePath := sourcePath + ".img"
+		if cleanSource == loopFilePath {
+			// This is a loop file so simply remove it.
+			err = os.Remove(source)
+		} else {
+			if !isBtrfsFilesystem(source) && IsBtrfsSubVolume(source) {
+				err = BtrfsSubVolumesDelete(source)
+			}
+		}
+		if err != nil && !os.IsNotExist(err) {
+			return err
+		}
+	}
+
+	// Remove the mountpoint for the storage pool.
+	err = os.RemoveAll(getStoragePoolMountPoint(s.pool.Name))
+	if err != nil && !os.IsNotExist(err) {
+		return err
+	}
+
+	return nil
+}
+
+func (s *Btrfs) StoragePoolMount() (bool, error) {
+	cleanupFunc := LockPoolMount(s.pool.Name)
+	if cleanupFunc == nil {
+		return false, nil
+	}
+	defer cleanupFunc()
+
+	source := s.pool.Config["source"]
+	if strings.HasPrefix(source, "/") {
+		source = shared.HostPath(s.pool.Config["source"])
+	}
+
+	if source == "" {
+		return false, fmt.Errorf("no \"source\" property found for the storage pool")
+	}
+
+	poolMntPoint := getStoragePoolMountPoint(s.pool.Name)
+
+	// Check whether the mount poolMntPoint exists.
+	if !shared.PathExists(poolMntPoint) {
+		err := os.MkdirAll(poolMntPoint, storagePoolsDirMode)
+		if err != nil {
+			return false, err
+		}
+	}
+
+	if shared.IsMountPoint(poolMntPoint) && (s.remount&syscall.MS_REMOUNT) == 0 {
+		return false, nil
+	}
+
+	mountFlags, mountOptions := lxdResolveMountoptions(getBtrfsMountOptions(s.pool))
+	mountSource := source
+	isBlockDev := shared.IsBlockdevPath(source)
+	if filepath.IsAbs(source) {
+		cleanSource := filepath.Clean(source)
+		poolMntPoint := getStoragePoolMountPoint(s.pool.Name)
+		loopFilePath := shared.VarPath("disks", s.pool.Name+".img")
+		if !isBlockDev && cleanSource == loopFilePath {
+			// If source == "${LXD_DIR}"/disks/{pool_name} it is a
+			// loop file we're dealing with.
+			//
+			// Since we mount the loop device LO_FLAGS_AUTOCLEAR is
+			// fine since the loop device will be kept around for as
+			// long as the mount exists.
+			loopF, loopErr := PrepareLoopDev(source, LoFlagsAutoclear)
+			if loopErr != nil {
+				return false, loopErr
+			}
+			mountSource = loopF.Name()
+			defer loopF.Close()
+		} else if !isBlockDev && cleanSource != poolMntPoint {
+			mountSource = source
+			mountFlags |= syscall.MS_BIND
+		} else if !isBlockDev && cleanSource == poolMntPoint && s.s.OS.BackingFS == "btrfs" {
+			return false, nil
+		}
+		// User is using block device path.
+	} else {
+		// Try to lookup the disk device by UUID but don't fail. If we
+		// don't find one this might just mean we have been given the
+		// UUID of a subvolume.
+		byUUID := fmt.Sprintf("/dev/disk/by-uuid/%s", source)
+		diskPath, err := os.Readlink(byUUID)
+		if err == nil {
+			mountSource = fmt.Sprintf("/dev/%s", strings.Trim(diskPath, "../../"))
+		} else {
+			// We have very likely been given a subvolume UUID. In
+			// this case we should simply assume that the user has
+			// mounted the parent of the subvolume or the subvolume
+			// itself. Otherwise this becomes a really messy
+			// detection task.
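+			// In other words, no mount is attempted for a subvolume
+			// UUID: we report that the pool was not mounted by us
+			// and rely on the mount the user already has in place.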
+ return false, nil + } + } + + mountFlags |= s.remount + err := syscall.Mount(mountSource, poolMntPoint, "btrfs", mountFlags, mountOptions) + if err != nil { + return false, err + } + + return true, nil +} + +func (s *Btrfs) StoragePoolUmount() (bool, error) { + cleanupFunc := LockPoolUmount(s.pool.Name) + if cleanupFunc == nil { + return false, nil + } + defer cleanupFunc() + + poolMntPoint := getStoragePoolMountPoint(s.pool.Name) + + if shared.IsMountPoint(poolMntPoint) { + err := syscall.Unmount(poolMntPoint, 0) + if err != nil { + return false, err + } + } + + return true, nil +} + +func (s *Btrfs) StoragePoolResources() (*api.ResourcesStoragePool, error) { + ourMount, err := s.StoragePoolMount() + if err != nil { + return nil, err + } + if ourMount { + defer s.StoragePoolUmount() + } + + poolMntPoint := getStoragePoolMountPoint(s.pool.Name) + + // Inode allocation is dynamic so no use in reporting them. + + return storageResource(poolMntPoint) +} + +func (s *Btrfs) StoragePoolUpdate(writable *api.StoragePoolPut, + changedConfig []string) error { + changeable := changeableStoragePoolProperties["btrfs"] + unchangeable := []string{} + for _, change := range changedConfig { + if !shared.StringInSlice(change, changeable) { + unchangeable = append(unchangeable, change) + } + } + + if len(unchangeable) > 0 { + return updateStoragePoolError(unchangeable, "btrfs") + } + + // "rsync.bwlimit" requires no on-disk modifications. + + if shared.StringInSlice("btrfs.mount_options", changedConfig) { + setBtrfsMountOptions(s.pool, writable.Config["btrfs.mount_options"]) + s.remount |= syscall.MS_REMOUNT + _, err := s.StoragePoolMount() + if err != nil { + return err + } + } + + return nil +} + +func (s *Btrfs) VolumeCreate(project string, volumeName string, + volumeType VolumeType) error { + logger.Debug("Creating volume", log.Ctx{"project": project, "volume": volumeName}) + var mountPoint string + + switch volumeType { + case VolumeTypeContainer: + mountPoint = getContainerMountPoint(project, s.pool.Name, volumeName) + case VolumeTypeCustom: + mountPoint = getStoragePoolVolumeMountPoint(s.pool.Name, volumeName) + case VolumeTypeImage: + mountPoint = getImageMountPoint(s.pool.Name, volumeName) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + if IsBtrfsSubVolume(mountPoint) { + return nil + } + + err := BtrfsSubVolumeCreate(mountPoint) + if err != nil { + return err + } + + switch volumeType { + case VolumeTypeCustom: + // apply quota + if s.volume.Config["size"] != "" { + size, err := shared.ParseByteSizeString(s.volume.Config["size"]) + if err != nil { + return err + } + + err = s.VolumeSetQuota(project, volumeName, size, false, VolumeTypeCustom) + if err != nil { + return err + } + } + } + + return nil +} + +func (s *Btrfs) VolumeCopy(project, source, target string, snapshots []string, volumeType VolumeType) error { + var recursive bool + var sourcePath string + var targetPath string + + switch volumeType { + case VolumeTypeContainer: + recursive = true + sourcePath = getContainerMountPoint(project, s.pool.Name, source) + targetPath = getContainerMountPoint(project, s.pool.Name, target) + case VolumeTypeCustom: + recursive = true + sourcePath = getStoragePoolVolumeMountPoint(s.pool.Name, source) + targetPath = getStoragePoolVolumeMountPoint(s.pool.Name, target) + case VolumeTypeImage: + recursive = false + sourcePath = getImageMountPoint(s.pool.Name, source) + targetPath = getContainerMountPoint(project, s.pool.Name, target) + default: + return 
fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + err := BtrfsPoolVolumesSnapshot(sourcePath, targetPath, false, recursive) + if err != nil { + return err + } + + for _, snap := range snapshots { + sourceSnapshotName := fmt.Sprintf("%s/%s", source, snap) + targetSnapshotName := fmt.Sprintf("%s/%s", target, snap) + + var sourceSnapshotPath string + var targetSnapshotPath string + + switch volumeType { + case VolumeTypeContainer: + sourceSnapshotPath = getSnapshotMountPoint(project, s.pool.Name, sourceSnapshotName) + targetSnapshotPath = getSnapshotMountPoint(project, s.pool.Name, targetSnapshotName) + case VolumeTypeCustom: + sourceSnapshotPath = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, sourceSnapshotName) + targetSnapshotPath = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, targetSnapshotName) + } + + err := BtrfsPoolVolumesSnapshot(sourceSnapshotPath, targetSnapshotPath, false, true) + if err != nil { + return err + } + } + + return nil +} + +func (s *Btrfs) VolumeDelete(project, volumeName string, recursive bool, volumeType VolumeType) error { + var volumePath string + + switch volumeType { + case VolumeTypeContainer: + volumePath = getContainerMountPoint(project, s.pool.Name, volumeName) + case VolumeTypeContainerSnapshot: + volumePath = getSnapshotMountPoint(project, s.pool.Name, volumeName) + case VolumeTypeCustom: + volumePath = getStoragePoolVolumeMountPoint(s.pool.Name, volumeName) + case VolumeTypeCustomSnapshot: + volumePath = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, volumeName) + case VolumeTypeImage: + volumePath = getImageMountPoint(s.pool.Name, volumeName) + default: + return fmt.Errorf("Unsupported volume type") + } + + if recursive { + if shared.PathExists(volumePath) && IsBtrfsSubVolume(volumePath) { + return BtrfsSubVolumesDelete(volumePath) + } + } else { + return BtrfsSubVolumeDelete(volumePath) + } + + return nil +} + +func (s *Btrfs) VolumeMount(project string, name string, volumeType VolumeType) (bool, error) { + if volumeType != VolumeTypeContainerSnapshot { + // Nothing to do + return s.StoragePoolMount() + } + + snapshotSubvolumeName := getSnapshotMountPoint(project, s.pool.Name, name) + roSnapshotSubvolumeName := fmt.Sprintf("%s.ro", snapshotSubvolumeName) + + if shared.PathExists(roSnapshotSubvolumeName) { + logger.Debugf("The BTRFS snapshot is already mounted read-write") + return false, nil + } + + err := os.Rename(snapshotSubvolumeName, roSnapshotSubvolumeName) + if err != nil { + return false, err + } + + err = BtrfsPoolVolumesSnapshot(roSnapshotSubvolumeName, snapshotSubvolumeName, false, true) + if err != nil { + return false, err + } + + return true, nil +} + +func (s *Btrfs) VolumeUmount(project string, name string, volumeType VolumeType) (bool, error) { + if volumeType != VolumeTypeContainerSnapshot { + // Nothing to do + return true, nil + } + + snapshotSubvolumeName := getSnapshotMountPoint(project, s.pool.Name, name) + roSnapshotSubvolumeName := fmt.Sprintf("%s.ro", snapshotSubvolumeName) + + if !shared.PathExists(roSnapshotSubvolumeName) { + logger.Debugf("The BTRFS snapshot is currently not mounted read-write") + return false, nil + } + + if shared.PathExists(snapshotSubvolumeName) && IsBtrfsSubVolume(snapshotSubvolumeName) { + err := BtrfsSubVolumesDelete(snapshotSubvolumeName) + if err != nil { + return false, err + } + } + + err := os.Rename(roSnapshotSubvolumeName, snapshotSubvolumeName) + if err != nil { + return false, err + } + + return true, nil +} + +func (s *Btrfs) VolumeGetUsage(project, name 
string, path string) (int64, error) { + return btrfsPoolVolumeQGroupUsage(path) +} + +func (s *Btrfs) VolumeSetQuota(project, name string, size int64, userns bool, volumeType VolumeType) error { + var path string + + switch volumeType { + case VolumeTypeContainer: + path = getContainerMountPoint(project, s.pool.Name, name) + case VolumeTypeCustom: + path = getStoragePoolVolumeMountPoint(s.pool.Name, name) + } + + qgroup, err := btrfsSubVolumeQGroup(path) + if err != nil { + if err != db.ErrNoSuchObject { + return err + } + + // Enable quotas + poolMntPoint := getStoragePoolMountPoint(s.pool.Name) + + output, err := shared.RunCommand("btrfs", "quota", "enable", poolMntPoint) + if err != nil && !userns { + return fmt.Errorf("Failed to enable quotas on BTRFS pool: %s", output) + } + } + + // Attempt to make the subvolume writable + shared.RunCommand("btrfs", "property", "set", path, "ro", "false") + if size > 0 { + output, err := shared.RunCommand( + "btrfs", + "qgroup", + "limit", + "-e", fmt.Sprintf("%d", size), + path) + + if err != nil { + return fmt.Errorf("Failed to set btrfs quota: %s", output) + } + } else if qgroup != "" { + output, err := shared.RunCommand( + "btrfs", + "qgroup", + "destroy", + qgroup, + path) + + if err != nil { + return fmt.Errorf("Failed to set btrfs quota: %s", output) + } + } + + return nil +} + +func (s *Btrfs) VolumeRename(project string, oldName string, newName string, snapshots []string, + volumeType VolumeType) error { + switch volumeType { + case VolumeTypeContainer: + case VolumeTypeCustom: + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + return nil +} + +func (s *Btrfs) VolumeRestore(project string, sourceName string, targetName string, volumeType VolumeType) error { + return s.VolumeSnapshotRestore(project, sourceName, targetName, volumeType) +} + +func (s *Btrfs) VolumeUpdate(writable *api.StorageVolumePut, + changedConfig []string) error { + // Nothing to do + return nil +} + +func (s *Btrfs) VolumeSnapshotCreate(project, source, target string, + volumeType VolumeType) error { + var sourcePath string + var targetPath string + + switch volumeType { + case VolumeTypeContainerSnapshot: + sourcePath = getContainerMountPoint(project, s.pool.Name, source) + targetPath = getSnapshotMountPoint(project, s.pool.Name, target) + case VolumeTypeCustomSnapshot: + sourcePath = getStoragePoolVolumeMountPoint(s.pool.Name, source) + targetPath = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, target) + case VolumeTypeImageSnapshot: + sourcePath = getImageMountPoint(s.pool.Name, source) + targetPath = getImageMountPoint(s.pool.Name, target) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + if source == "" { + // Create empty snapshot + return BtrfsSubVolumeCreate(targetPath) + } + + return BtrfsPoolVolumesSnapshot(sourcePath, targetPath, true, true) +} + +func (s *Btrfs) VolumeSnapshotCopy(project, source, target string, + volumeType VolumeType) error { + var readOnly bool + var recursive bool + var sourcePath string + var targetPath string + + switch volumeType { + case VolumeTypeContainerSnapshot: + readOnly = false + recursive = true + sourcePath = getSnapshotMountPoint(project, s.pool.Name, source) + + if shared.IsSnapshot(target) { + targetPath = getSnapshotMountPoint(project, s.pool.Name, target) + } else { + targetPath = getContainerMountPoint(project, s.pool.Name, target) + } + case VolumeTypeCustomSnapshot: + readOnly = false + recursive = true + sourcePath = 
getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, source) + + if shared.IsSnapshot(target) { + targetPath = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, target) + } else { + targetPath = getStoragePoolVolumeMountPoint(s.pool.Name, target) + } + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + return BtrfsPoolVolumesSnapshot(sourcePath, targetPath, readOnly, recursive) +} + +func (s *Btrfs) VolumeSnapshotDelete(project string, volumeName string, recursive bool, volumeType VolumeType) error { + return s.VolumeDelete(project, volumeName, recursive, volumeType) +} + +func (s *Btrfs) VolumeSnapshotRestore(project string, sourceName string, targetName string, volumeType VolumeType) error { + var sourceMntPoint string + var targetMntPoint string + + logger.Debug("Restoring snapshot", log.Ctx{"project": project, "source": sourceName, "target": targetName}) + + switch volumeType { + case VolumeTypeContainerSnapshot: + sourceMntPoint = getSnapshotMountPoint(project, s.pool.Name, sourceName) + targetMntPoint = getContainerMountPoint(project, s.pool.Name, targetName) + case VolumeTypeCustomSnapshot: + sourceMntPoint = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, sourceName) + targetMntPoint = getStoragePoolVolumeMountPoint(s.pool.Name, targetName) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + backupTargetMntPoint := fmt.Sprintf("%s.tmp", targetMntPoint) + + err := os.Rename(targetMntPoint, backupTargetMntPoint) + if err != nil { + return err + } + + undo := true + + defer func() { + if undo { + os.Rename(backupTargetMntPoint, targetMntPoint) + } + }() + + err = BtrfsPoolVolumesSnapshot(sourceMntPoint, targetMntPoint, false, true) + if err != nil { + return err + } + + undo = false + + return BtrfsSubVolumesDelete(backupTargetMntPoint) +} + +func (s *Btrfs) VolumeSnapshotRename(project string, oldName string, newName string, volumeType VolumeType) error { + switch volumeType { + case VolumeTypeContainerSnapshot: + // Nothing to do + case VolumeTypeCustomSnapshot: + // Nothing to do + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + return nil +} + +func (s *Btrfs) VolumeReady(project string, name string) bool { + containerMntPoint := getContainerMountPoint(project, s.pool.Name, name) + return IsBtrfsSubVolume(containerMntPoint) +} + +func (s *Btrfs) doVolumeBackupCreateOptimized(path string, project string, source string, + snapshots []string) error { + // Handle snapshots + finalParent := "" + + if len(snapshots) > 0 { + snapshotsPath := fmt.Sprintf("%s/snapshots", path) + + // Create the snapshot path + err := os.MkdirAll(snapshotsPath, 0711) + if err != nil { + return err + } + + for i, snap := range snapshots { + // Figure out previous and current subvolumes + prev := "" + fullSnapshotName := fmt.Sprintf("%s/%s", source, snap) + + if i > 0 { + fullPrevSnapshotName := fmt.Sprintf("%s/%s", source, snapshots[i-1]) + // /var/lib/lxd/storage-pools//containers-snapshots// + prev = getSnapshotMountPoint(project, s.pool.Name, fullPrevSnapshotName) + } + + cur := getSnapshotMountPoint(project, s.pool.Name, fullSnapshotName) + // Make a binary btrfs backup + target := fmt.Sprintf("%s/%s.bin", snapshotsPath, snap) + + err := btrfsBackup(cur, prev, target) + if err != nil { + return err + } + + finalParent = cur + } + } + + // Make a temporary copy of the container + sourceVolume := getContainerMountPoint(project, s.pool.Name, source) + containersPath := getContainerMountPoint("default", 
s.pool.Name, "") + + tmpContainerMntPoint, err := ioutil.TempDir(containersPath, source) + if err != nil { + return err + } + defer os.RemoveAll(tmpContainerMntPoint) + + err = os.Chmod(tmpContainerMntPoint, 0700) + if err != nil { + return err + } + + targetVolume := fmt.Sprintf("%s/.backup", tmpContainerMntPoint) + err = BtrfsPoolVolumesSnapshot(sourceVolume, targetVolume, true, true) + if err != nil { + return err + } + defer BtrfsSubVolumesDelete(targetVolume) + + // Dump the container to a file + fsDump := fmt.Sprintf("%s/container.bin", path) + + err = btrfsBackup(targetVolume, finalParent, fsDump) + if err != nil { + return err + } + + return nil +} + +func (s *Btrfs) doVolumeBackupCreate(path string, project string, source string, + snapshots []string) error { + // Handle snapshots + if len(snapshots) > 0 { + snapshotsPath := fmt.Sprintf("%s/snapshots", path) + + // Create the snapshot path + err := os.MkdirAll(snapshotsPath, 0711) + if err != nil { + return err + } + + for _, snap := range snapshots { + fullSnapshotName := fmt.Sprintf("%s/%s", source, snap) + + // Mount the snapshot to a usable path + _, err := s.VolumeMount(project, fullSnapshotName, VolumeTypeContainerSnapshot) + if err != nil { + return err + } + + snapshotMntPoint := getSnapshotMountPoint(project, s.pool.Name, fullSnapshotName) + target := fmt.Sprintf("%s/%s", snapshotsPath, snap) + + // Copy the snapshot + err = s.rsync(snapshotMntPoint, target) + s.VolumeUmount(project, fullSnapshotName, VolumeTypeContainerSnapshot) + if err != nil { + return err + } + } + } + + // Make a temporary copy of the container + sourceVolume := getContainerMountPoint(project, s.pool.Name, source) + containersPath := getContainerMountPoint("default", s.pool.Name, "") + + tmpContainerMntPoint, err := ioutil.TempDir(containersPath, source) + if err != nil { + return err + } + defer os.RemoveAll(tmpContainerMntPoint) + + err = os.Chmod(tmpContainerMntPoint, 0700) + if err != nil { + return err + } + + targetVolume := fmt.Sprintf("%s/.backup", tmpContainerMntPoint) + + err = BtrfsPoolVolumesSnapshot(sourceVolume, targetVolume, true, true) + if err != nil { + return err + } + defer BtrfsSubVolumesDelete(targetVolume) + + // Copy the container + containerPath := fmt.Sprintf("%s/container", path) + + err = s.rsync(targetVolume, containerPath) + if err != nil { + return err + } + + return nil +} + +func (s *Btrfs) VolumeBackupCreate(path string, project string, source string, + snapshots []string, optimized bool) error { + if optimized { + return s.doVolumeBackupCreateOptimized(path, project, source, snapshots) + } + + return s.doVolumeBackupCreate(path, project, source, snapshots) +} + +func (s *Btrfs) doVolumeBackupLoadOptimized(backupDir string, project string, + containerName string, snapshots []string, privileged bool) error { + unpackDir := backupDir + unpackPath := filepath.Join(unpackDir, ".backup_unpack") + + for _, snapshotOnlyName := range snapshots { + snapshotBackup := fmt.Sprintf("%s/snapshots/%s.bin", unpackPath, snapshotOnlyName) + + feeder, err := os.Open(snapshotBackup) + if err != nil { + return err + } + + // create mountpoint + snapshotMntPoint := getSnapshotMountPoint(project, s.pool.Name, containerName) + snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots", projectPrefix(project, containerName)) + snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(project, containerName)) + + err = createSnapshotMountpoint(snapshotMntPoint, 
snapshotMntPointSymlinkTarget, snapshotMntPointSymlink)
+		if err != nil {
+			feeder.Close()
+			return err
+		}
+
+		// /var/lib/lxd/storage-pools/<pool_name>/snapshots/<container_name>/<snapshot_name>
+		btrfsRecvCmd := exec.Command("btrfs", "receive", "-e", snapshotMntPoint)
+		btrfsRecvCmd.Stdin = feeder
+
+		msg, err := btrfsRecvCmd.CombinedOutput()
+		feeder.Close()
+		if err != nil {
+			logger.Errorf("Failed to receive contents of btrfs backup \"%s\": %s", snapshotBackup, string(msg))
+			return err
+		}
+	}
+	containerBackupFile := fmt.Sprintf("%s/container.bin", unpackPath)
+
+	feeder, err := os.Open(containerBackupFile)
+	if err != nil {
+		return err
+	}
+	defer feeder.Close()
+
+	btrfsRecvCmd := exec.Command("btrfs", "receive", "-vv", "-e", unpackDir)
+	btrfsRecvCmd.Stdin = feeder
+
+	msg, err := btrfsRecvCmd.CombinedOutput()
+	if err != nil {
+		logger.Errorf("Failed to receive contents of btrfs backup \"%s\": %s", containerBackupFile, string(msg))
+		return err
+	}
+
+	tmpContainerMntPoint := fmt.Sprintf("%s/.backup", unpackDir)
+	defer BtrfsSubVolumesDelete(tmpContainerMntPoint)
+
+	containerMntPoint := getContainerMountPoint(project, s.pool.Name, containerName)
+	err = BtrfsPoolVolumesSnapshot(tmpContainerMntPoint, containerMntPoint, false, true)
+	if err != nil {
+		logger.Errorf("Failed to create btrfs snapshot \"%s\" of \"%s\": %s", tmpContainerMntPoint, containerMntPoint, err)
+		return err
+	}
+
+	// Create mountpoints
+	err = createContainerMountpoint(containerMntPoint, shared.VarPath("containers", projectPrefix(project, containerName)), privileged)
+	if err != nil {
+		return err
+	}
+
+	return nil
+}
+
+func (s *Btrfs) doVolumeBackupLoad(backupDir string, project string,
+	containerName string, snapshots []string, privileged bool) error {
+	return nil
+}
+
+func (s *Btrfs) VolumeBackupLoad(backupDir string, project string,
+	containerName string, snapshots []string, privileged bool, optimized bool) error {
+	logger.Debug("Loading volume backup", log.Ctx{"project": project, "name": containerName, "snapshots": len(snapshots)})
+
+	if optimized {
+		return s.doVolumeBackupLoadOptimized(backupDir, project, containerName, snapshots, privileged)
+	}
+
+	return s.doVolumeBackupLoad(backupDir, project, containerName, snapshots, privileged)
+}
+
+func (s *Btrfs) VolumePrepareRestore(sourceName string, targetName string, targetSnapshots []string, f func() error) error {
+	// Nothing to do
+	return nil
+}
+
+func btrfsBackup(cur string, prev string, target string) error {
+	args := []string{"send"}
+	if prev != "" {
+		args = append(args, "-p", prev)
+	}
+	args = append(args, cur)
+
+	eater, err := os.OpenFile(target, os.O_RDWR|os.O_CREATE, 0644)
+	if err != nil {
+		return err
+	}
+	defer eater.Close()
+
+	btrfsSendCmd := exec.Command("btrfs", args...)
+	btrfsSendCmd.Stdout = eater
+
+	err = btrfsSendCmd.Run()
+	if err != nil {
+		return err
+	}
+
+	return err
+}
+
+func getBtrfsMountOptions(pool *api.StoragePool) string {
+	if pool.Config["btrfs.mount_options"] != "" {
+		return pool.Config["btrfs.mount_options"]
+	}
+
+	return "user_subvol_rm_allowed"
+}
+
+func setBtrfsMountOptions(pool *api.StoragePool, mountOptions string) {
+	pool.Config["btrfs.mount_options"] = mountOptions
+}
+
+// BtrfsSubVolumesDelete is the recursive variant of BtrfsSubVolumeDelete:
+// it first deletes subvolumes of the subvolume and then the
+// subvolume itself.
+func BtrfsSubVolumesDelete(subvol string) error {
+	// Delete subsubvols.
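+	// Children are deleted deepest-first (hence the reverse sort below),
+	// since btrfs will not delete a subvolume that still contains other
+	// subvolumes.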
+ subsubvols, err := btrfsSubVolumesGet(subvol) + if err != nil { + return err + } + sort.Sort(sort.Reverse(sort.StringSlice(subsubvols))) + + for _, subsubvol := range subsubvols { + err := BtrfsSubVolumeDelete(path.Join(subvol, subsubvol)) + if err != nil { + return err + } + } + + // Delete the subvol itself + err = BtrfsSubVolumeDelete(subvol) + if err != nil { + return err + } + + return nil +} + +func BtrfsSubVolumeCreate(subvol string) error { + parentDestPath := filepath.Dir(subvol) + + // TODO: remove this, and create parent directory in the Storage functions. + // Then, we will also be able to use *DirMode for the permissions. + if !shared.PathExists(parentDestPath) { + err := os.MkdirAll(parentDestPath, 0711) + if err != nil { + return err + } + } + + output, err := shared.RunCommand("btrfs", "subvolume", "create", subvol) + if err != nil { + logger.Errorf("Failed to create BTRFS subvolume \"%s\": %s", subvol, output) + return err + } + + return nil +} + +func BtrfsSubVolumeDelete(subvol string) error { + // Attempt (but don't fail on) to delete any qgroup on the subvolume + qgroup, err := btrfsSubVolumeQGroup(subvol) + if err == nil { + shared.RunCommand( + "btrfs", + "qgroup", + "destroy", + qgroup, + subvol) + } + + // Attempt to make the subvolume writable + shared.RunCommand("btrfs", "property", "set", subvol, "ro", "false") + + // Delete the subvolume itself + _, err = shared.RunCommand( + "btrfs", + "subvolume", + "delete", + subvol) + + return err +} + +func BtrfsPoolVolumesSnapshot(source string, dest string, readonly bool, recursive bool) error { + // Now snapshot all subvolumes of the root. + if recursive { + // Get a list of subvolumes of the root + subsubvols, err := btrfsSubVolumesGet(source) + if err != nil { + return err + } + sort.Strings(subsubvols) + + if len(subsubvols) > 0 && readonly { + // A root with subvolumes can never be readonly, + // also don't make subvolumes readonly. + readonly = false + + logger.Warnf("Subvolumes detected, ignoring ro flag") + } + + // First snapshot the root + err = BtrfsSnapshot(source, dest, readonly) + if err != nil { + return err + } + + for _, subsubvol := range subsubvols { + // Clear the target for the subvol to use + os.Remove(path.Join(dest, subsubvol)) + + err := BtrfsSnapshot(path.Join(source, subsubvol), path.Join(dest, subsubvol), readonly) + if err != nil { + return err + } + } + } else { + err := BtrfsSnapshot(source, dest, readonly) + if err != nil { + return err + } + } + + return nil +} + +func btrfsSubVolumesGet(path string) ([]string, error) { + result := []string{} + + if !strings.HasSuffix(path, "/") { + path = path + "/" + } + + // Unprivileged users can't get to fs internals + filepath.Walk(path, func(fpath string, fi os.FileInfo, err error) error { + // Skip walk errors + if err != nil { + return nil + } + + // Ignore the base path + if strings.TrimRight(fpath, "/") == strings.TrimRight(path, "/") { + return nil + } + + // Subvolumes can only be directories + if !fi.IsDir() { + return nil + } + + // Check if a btrfs subvolume + if IsBtrfsSubVolume(fpath) { + result = append(result, strings.TrimPrefix(fpath, path)) + } + + return nil + }) + + return result, nil +} + +/* + * BtrfsSnapshot creates a snapshot of "source" to "dest" + * the result will be readonly if "readonly" is True. 
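+ *
+ * A hypothetical call such as
+ *     BtrfsSnapshot("/var/lib/lxd/storage-pools/p/containers/c1", "/tmp/c1-copy", false)
+ * wraps "btrfs subvolume snapshot <source> <dest>"; the paths above are
+ * illustrative only.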
+ */ +func BtrfsSnapshot(source string, dest string, readonly bool) error { + logger.Debug("Creating btrfs snapshot", log.Ctx{"source": source, "destination": dest, "readonly": readonly}) + var output string + var err error + if readonly { + output, err = shared.RunCommand( + "btrfs", + "subvolume", + "snapshot", + "-r", + source, + dest) + } else { + output, err = shared.RunCommand( + "btrfs", + "subvolume", + "snapshot", + source, + dest) + } + if err != nil { + return fmt.Errorf( + "subvolume snapshot failed, source=%s, dest=%s, output=%s", + source, + dest, + output, + ) + } + + return err +} + +// IsBtrfsSubVolume returns true if the given Path is a btrfs subvolume else +// false. +func IsBtrfsSubVolume(subvolPath string) bool { + fs := syscall.Stat_t{} + err := syscall.Lstat(subvolPath, &fs) + if err != nil { + return false + } + + // Check if BTRFS_FIRST_FREE_OBJECTID + if fs.Ino != 256 { + return false + } + + return true +} + +func btrfsSubVolumeQGroup(subvol string) (string, error) { + output, err := shared.RunCommand( + "btrfs", + "qgroup", + "show", + subvol, + "-e", + "-f") + + if err != nil { + return "", db.ErrNoSuchObject + } + + var qgroup string + for _, line := range strings.Split(output, "\n") { + if line == "" || strings.HasPrefix(line, "qgroupid") || strings.HasPrefix(line, "---") { + continue + } + + fields := strings.Fields(line) + if len(fields) != 4 { + continue + } + + qgroup = fields[0] + } + + if qgroup == "" { + return "", fmt.Errorf("Unable to find quota group") + } + + return qgroup, nil +} + +func btrfsPoolVolumeQGroupUsage(subvol string) (int64, error) { + output, err := shared.RunCommand( + "btrfs", + "qgroup", + "show", + subvol, + "-e", + "-f") + + if err != nil { + return -1, fmt.Errorf("BTRFS quotas not supported. 
Try enabling them with \"btrfs quota enable\"") + } + + for _, line := range strings.Split(output, "\n") { + if line == "" || strings.HasPrefix(line, "qgroupid") || strings.HasPrefix(line, "---") { + continue + } + + fields := strings.Fields(line) + if len(fields) != 4 { + continue + } + + usage, err := strconv.ParseInt(fields[2], 10, 64) + if err != nil { + continue + } + + return usage, nil + } + + return -1, fmt.Errorf("Unable to find current qgroup usage") +} + +func isOnBtrfs(path string) bool { + fs := syscall.Statfs_t{} + + err := syscall.Statfs(path, &fs) + if err != nil { + return false + } + + if fs.Type != util.FilesystemSuperMagicBtrfs { + return false + } + + return true +} + +func BtrfsLookupFsUUID(fs string) (string, error) { + output, err := shared.RunCommand( + "btrfs", + "filesystem", + "show", + "--raw", + fs) + if err != nil { + return "", fmt.Errorf("failed to detect UUID") + } + + outputString := output + idx := strings.Index(outputString, "uuid: ") + outputString = outputString[idx+6:] + outputString = strings.TrimSpace(outputString) + idx = strings.Index(outputString, "\t") + outputString = outputString[:idx] + outputString = strings.Trim(outputString, "\n") + + return outputString, nil +} + +func isBtrfsFilesystem(path string) bool { + _, err := shared.RunCommand("btrfs", "filesystem", "show", path) + if err != nil { + return false + } + + return true +} + +func BtrfsSnapshotDeleteInternal(project, poolName string, snapshotName string) error { + snapshotSubvolumeName := getSnapshotMountPoint(project, poolName, snapshotName) + if shared.PathExists(snapshotSubvolumeName) && IsBtrfsSubVolume(snapshotSubvolumeName) { + err := BtrfsSubVolumesDelete(snapshotSubvolumeName) + if err != nil { + return err + } + } + + sourceSnapshotMntPoint := shared.VarPath("snapshots", projectPrefix(project, snapshotName)) + os.Remove(sourceSnapshotMntPoint) + os.Remove(snapshotSubvolumeName) + + sourceName, _, _ := containerGetParentAndSnapshotName(snapshotName) + snapshotSubvolumePath := getSnapshotMountPoint(project, poolName, sourceName) + os.Remove(snapshotSubvolumePath) + if !shared.PathExists(snapshotSubvolumePath) { + snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(project, sourceName)) + os.Remove(snapshotMntPointSymlink) + } + + return nil +} diff --git a/lxd/storage_migration_btrfs.go b/lxd/storage_migration_btrfs.go new file mode 100644 index 0000000000..9b8a9064ca --- /dev/null +++ b/lxd/storage_migration_btrfs.go @@ -0,0 +1,403 @@ +package main + +import ( + "fmt" + "io" + "io/ioutil" + "os" + "os/exec" + + "github.com/gorilla/websocket" + driver "github.com/lxc/lxd/lxd/storage" + "github.com/lxc/lxd/shared" + "github.com/lxc/lxd/shared/api" + "github.com/lxc/lxd/shared/logger" +) + +type btrfsMigrationSourceDriver2 struct { + container container + snapshots []container + btrfsSnapshotNames []string + pool *api.StoragePool + runningSnapName string + stoppedSnapName string +} + +func btrfsMigrationSource(args MigrationSourceArgs, pool *api.StoragePool) (MigrationStorageSourceDriver, error) { + /* List all the snapshots in order of reverse creation. The idea here + * is that we send the oldest to newest snapshot, hopefully saving on + * xfer costs. Then, after all that, we send the container itself. 
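+	 *
+	 * Because each snapshot is sent with its predecessor as its "-p"
+	 * parent (see send() below), only the delta between consecutive
+	 * snapshots actually has to cross the wire.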
+ */ + var err error + var snapshots = []container{} + if !args.ContainerOnly { + snapshots, err = args.Container.Snapshots() + if err != nil { + return nil, err + } + } + + driver := &btrfsMigrationSourceDriver2{ + container: args.Container, + snapshots: snapshots, + btrfsSnapshotNames: []string{}, + pool: pool, + } + + if !args.ContainerOnly { + for _, snap := range snapshots { + btrfsPath := getSnapshotMountPoint(snap.Project(), pool.Name, snap.Name()) + driver.btrfsSnapshotNames = append(driver.btrfsSnapshotNames, btrfsPath) + } + } + + return driver, nil +} + +func (s *btrfsMigrationSourceDriver2) send(conn *websocket.Conn, btrfsPath string, btrfsParent string, readWrapper func(io.ReadCloser) io.ReadCloser) error { + args := []string{"send"} + if btrfsParent != "" { + args = append(args, "-p", btrfsParent) + } + args = append(args, btrfsPath) + + cmd := exec.Command("btrfs", args...) + + stdout, err := cmd.StdoutPipe() + if err != nil { + return err + } + + readPipe := io.ReadCloser(stdout) + if readWrapper != nil { + readPipe = readWrapper(stdout) + } + + stderr, err := cmd.StderrPipe() + if err != nil { + return err + } + + err = cmd.Start() + if err != nil { + return err + } + + <-shared.WebsocketSendStream(conn, readPipe, 4*1024*1024) + + output, err := ioutil.ReadAll(stderr) + if err != nil { + logger.Errorf("Problem reading btrfs send stderr: %s", err) + } + + err = cmd.Wait() + if err != nil { + logger.Errorf("Problem with btrfs send: %s", string(output)) + } + + return err +} + +func (s *btrfsMigrationSourceDriver2) SendWhileRunning(conn *websocket.Conn, op *operation, bwlimit string, containerOnly bool) error { + _, containerPool, _ := s.container.Storage().GetContainerPoolInfo() + containerName := s.container.Name() + containersPath := getContainerMountPoint("default", containerPool, "") + sourceName := containerName + + // Deal with sending a snapshot to create a container on another LXD + // instance. 
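+	// (This covers copying a single snapshot to another host as a new
+	// container: the snapshot is sent in full, with no "-p" parent.)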
+ if s.container.IsSnapshot() { + sourceName, _, _ := containerGetParentAndSnapshotName(containerName) + snapshotsPath := getSnapshotMountPoint(s.container.Project(), containerPool, sourceName) + tmpContainerMntPoint, err := ioutil.TempDir(snapshotsPath, sourceName) + if err != nil { + return err + } + defer os.RemoveAll(tmpContainerMntPoint) + + err = os.Chmod(tmpContainerMntPoint, 0700) + if err != nil { + return err + } + + migrationSendSnapshot := fmt.Sprintf("%s/.migration-send", tmpContainerMntPoint) + snapshotMntPoint := getSnapshotMountPoint(s.container.Project(), containerPool, containerName) + err = driver.BtrfsPoolVolumesSnapshot(snapshotMntPoint, migrationSendSnapshot, true, true) + if err != nil { + return err + } + defer driver.BtrfsSubVolumesDelete(migrationSendSnapshot) + + wrapper := StorageProgressReader(op, "fs_progress", containerName) + return s.send(conn, migrationSendSnapshot, "", wrapper) + } + + if !containerOnly { + for i, snap := range s.snapshots { + prev := "" + if i > 0 { + prev = getSnapshotMountPoint(snap.Project(), containerPool, s.snapshots[i-1].Name()) + } + + snapMntPoint := getSnapshotMountPoint(snap.Project(), containerPool, snap.Name()) + wrapper := StorageProgressReader(op, "fs_progress", snap.Name()) + if err := s.send(conn, snapMntPoint, prev, wrapper); err != nil { + return err + } + } + } + + tmpContainerMntPoint, err := ioutil.TempDir(containersPath, containerName) + if err != nil { + return err + } + defer os.RemoveAll(tmpContainerMntPoint) + + err = os.Chmod(tmpContainerMntPoint, 0700) + if err != nil { + return err + } + + migrationSendSnapshot := fmt.Sprintf("%s/.migration-send", tmpContainerMntPoint) + containerMntPoint := getContainerMountPoint(s.container.Project(), containerPool, sourceName) + err = driver.BtrfsPoolVolumesSnapshot(containerMntPoint, migrationSendSnapshot, true, true) + if err != nil { + return err + } + defer driver.BtrfsSubVolumesDelete(migrationSendSnapshot) + + btrfsParent := "" + if len(s.btrfsSnapshotNames) > 0 { + btrfsParent = s.btrfsSnapshotNames[len(s.btrfsSnapshotNames)-1] + } + + wrapper := StorageProgressReader(op, "fs_progress", containerName) + return s.send(conn, migrationSendSnapshot, btrfsParent, wrapper) +} + +func (s *btrfsMigrationSourceDriver2) SendAfterCheckpoint(conn *websocket.Conn, bwlimit string) error { + tmpPath := getSnapshotMountPoint(s.container.Project(), s.pool.Name, + fmt.Sprintf("%s/.migration-send", s.container.Name())) + err := os.MkdirAll(tmpPath, 0711) + if err != nil { + return err + } + + err = os.Chmod(tmpPath, 0700) + if err != nil { + return err + } + + s.stoppedSnapName = fmt.Sprintf("%s/.root", tmpPath) + parentName, _, _ := containerGetParentAndSnapshotName(s.container.Name()) + containerMntPt := getContainerMountPoint(s.container.Project(), s.pool.Name, parentName) + err = driver.BtrfsPoolVolumesSnapshot(containerMntPt, s.stoppedSnapName, true, true) + if err != nil { + return err + } + + return s.send(conn, s.stoppedSnapName, s.runningSnapName, nil) +} + +func (s *btrfsMigrationSourceDriver2) Cleanup() { + if s.stoppedSnapName != "" { + driver.BtrfsSubVolumesDelete(s.stoppedSnapName) + } + + if s.runningSnapName != "" { + driver.BtrfsSubVolumesDelete(s.runningSnapName) + } +} + +func (s *btrfsMigrationSourceDriver2) SendStorageVolume(conn *websocket.Conn, op *operation, bwlimit string, storage storage, volumeOnly bool) error { + msg := fmt.Sprintf("Function not implemented") + logger.Errorf(msg) + return fmt.Errorf(msg) +} + +func btrfsMigrationSink(pool *api.StoragePool, 
conn *websocket.Conn, op *operation, args MigrationSinkArgs) error { + btrfsRecv := func(snapName string, btrfsPath string, targetPath string, isSnapshot bool, writeWrapper func(io.WriteCloser) io.WriteCloser) error { + args := []string{"receive", "-e", btrfsPath} + cmd := exec.Command("btrfs", args...) + + // Remove the existing pre-created subvolume + err := driver.BtrfsSubVolumesDelete(targetPath) + if err != nil { + logger.Errorf("Failed to delete pre-created BTRFS subvolume: %s: %v", btrfsPath, err) + return err + } + + stdin, err := cmd.StdinPipe() + if err != nil { + return err + } + + stderr, err := cmd.StderrPipe() + if err != nil { + return err + } + + err = cmd.Start() + if err != nil { + return err + } + + writePipe := io.WriteCloser(stdin) + if writeWrapper != nil { + writePipe = writeWrapper(stdin) + } + + <-shared.WebsocketRecvStream(writePipe, conn) + + output, err := ioutil.ReadAll(stderr) + if err != nil { + logger.Debugf("Problem reading btrfs receive stderr %s", err) + } + + err = cmd.Wait() + if err != nil { + logger.Errorf("Problem with btrfs receive: %s", string(output)) + return err + } + + receivedSnapshot := fmt.Sprintf("%s/.migration-send", btrfsPath) + // handle older lxd versions + if !shared.PathExists(receivedSnapshot) { + receivedSnapshot = fmt.Sprintf("%s/.root", btrfsPath) + } + if isSnapshot { + receivedSnapshot = fmt.Sprintf("%s/%s", btrfsPath, snapName) + err = driver.BtrfsPoolVolumesSnapshot(receivedSnapshot, targetPath, true, true) + } else { + err = driver.BtrfsPoolVolumesSnapshot(receivedSnapshot, targetPath, false, true) + } + if err != nil { + logger.Errorf("Problem with btrfs snapshot: %s", err) + return err + } + + err = driver.BtrfsSubVolumesDelete(receivedSnapshot) + if err != nil { + logger.Errorf("Failed to delete BTRFS subvolume \"%s\": %s", btrfsPath, err) + return err + } + + return nil + } + + containerName := args.Container.Name() + _, containerPool, _ := args.Container.Storage().GetContainerPoolInfo() + containersPath := getSnapshotMountPoint(args.Container.Project(), containerPool, containerName) + if !args.ContainerOnly && len(args.Snapshots) > 0 { + err := os.MkdirAll(containersPath, containersDirMode) + if err != nil { + return err + } + + snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", containerPool, "containers-snapshots", projectPrefix(args.Container.Project(), containerName)) + snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(args.Container.Project(), containerName)) + if !shared.PathExists(snapshotMntPointSymlink) { + err := os.Symlink(snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) + if err != nil { + return err + } + } + } + + // At this point we have already figured out the parent + // container's root disk device so we can simply + // retrieve it from the expanded devices. + parentStoragePool := "" + parentExpandedDevices := args.Container.ExpandedDevices() + parentLocalRootDiskDeviceKey, parentLocalRootDiskDevice, _ := shared.GetRootDiskDevice(parentExpandedDevices) + if parentLocalRootDiskDeviceKey != "" { + parentStoragePool = parentLocalRootDiskDevice["pool"] + } + + // A little neuroticism. 
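+	// (Every container is expected to carry a root disk device with a
+	// "pool" property in its expanded devices by this point, so an empty
+	// value indicates a bug rather than user error.)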
+ if parentStoragePool == "" { + return fmt.Errorf("Detected that the container's root device is missing the pool property during BTRFS migration") + } + + if !args.ContainerOnly { + for _, snap := range args.Snapshots { + ctArgs := snapshotProtobufToContainerArgs(args.Container.Project(), containerName, snap) + + // Ensure that snapshot and parent container have the + // same storage pool in their local root disk device. + // If the root disk device for the snapshot comes from a + // profile on the new instance as well we don't need to + // do anything. + if ctArgs.Devices != nil { + snapLocalRootDiskDeviceKey, _, _ := shared.GetRootDiskDevice(ctArgs.Devices) + if snapLocalRootDiskDeviceKey != "" { + ctArgs.Devices[snapLocalRootDiskDeviceKey]["pool"] = parentStoragePool + } + } + + snapshotMntPoint := getSnapshotMountPoint(args.Container.Project(), containerPool, ctArgs.Name) + _, err := containerCreateEmptySnapshot(args.Container.DaemonState(), ctArgs) + if err != nil { + return err + } + + snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", pool.Name, "containers-snapshots", projectPrefix(args.Container.Project(), containerName)) + snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(args.Container.Project(), containerName)) + err = createSnapshotMountpoint(snapshotMntPoint, snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) + if err != nil { + return err + } + + tmpSnapshotMntPoint, err := ioutil.TempDir(containersPath, projectPrefix(args.Container.Project(), containerName)) + if err != nil { + return err + } + defer os.RemoveAll(tmpSnapshotMntPoint) + + err = os.Chmod(tmpSnapshotMntPoint, 0700) + if err != nil { + return err + } + + wrapper := StorageProgressWriter(op, "fs_progress", *snap.Name) + err = btrfsRecv(*(snap.Name), tmpSnapshotMntPoint, snapshotMntPoint, true, wrapper) + if err != nil { + return err + } + } + } + + containersMntPoint := getContainerMountPoint(args.Container.Project(), pool.Name, "") + err := createContainerMountpoint(containersMntPoint, args.Container.Path(), args.Container.IsPrivileged()) + if err != nil { + return err + } + + /* finally, do the real container */ + wrapper := StorageProgressWriter(op, "fs_progress", containerName) + tmpContainerMntPoint, err := ioutil.TempDir(containersMntPoint, projectPrefix(args.Container.Project(), containerName)) + if err != nil { + return err + } + defer os.RemoveAll(tmpContainerMntPoint) + + err = os.Chmod(tmpContainerMntPoint, 0700) + if err != nil { + return err + } + + containerMntPoint := getContainerMountPoint(args.Container.Project(), pool.Name, containerName) + err = btrfsRecv("", tmpContainerMntPoint, containerMntPoint, false, wrapper) + if err != nil { + return err + } + + if args.Live { + err = btrfsRecv("", tmpContainerMntPoint, containerMntPoint, false, wrapper) + if err != nil { + return err + } + } + + return nil +} From 4bb7790ff2626d691e36be3f66fb4746c6491fec Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 15:31:12 +0200 Subject: [PATCH 10/15] storage: Add dir Signed-off-by: Thomas Hipp --- lxd/storage/dir.go | 546 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 546 insertions(+) create mode 100644 lxd/storage/dir.go diff --git a/lxd/storage/dir.go b/lxd/storage/dir.go new file mode 100644 index 0000000000..1235f6e094 --- /dev/null +++ b/lxd/storage/dir.go @@ -0,0 +1,546 @@ +package storage + +import ( + "fmt" + "os" + "path/filepath" + "strings" + "syscall" + + "github.com/lxc/lxd/shared" + "github.com/lxc/lxd/shared/api" + 
"github.com/lxc/lxd/shared/logger" +) + +type Dir struct { + driverShared +} + +func (s *Dir) Init() error { + s.sTypeVersion = "1" + + return nil +} + +func (s *Dir) StoragePoolCheck() error { + // Nothing to do + return nil +} + +func (s *Dir) StoragePoolCreate() error { + poolMntPoint := getStoragePoolMountPoint(s.pool.Name) + + source := shared.HostPath(s.pool.Config["source"]) + if source == "" { + source = filepath.Join(shared.VarPath("storage-pools"), s.pool.Name) + s.pool.Config["source"] = source + } else { + cleanSource := filepath.Clean(source) + lxdDir := shared.VarPath() + + if strings.HasPrefix(cleanSource, lxdDir) && + cleanSource != poolMntPoint { + return fmt.Errorf(`DIR storage pool requests in LXD `+ + `directory "%s" are only valid under `+ + `"%s"\n(e.g. source=%s)`, shared.VarPath(), + shared.VarPath("storage-pools"), poolMntPoint) + } + + source = filepath.Clean(source) + } + + revert := true + + if !shared.PathExists(source) { + err := os.MkdirAll(source, 0711) + if err != nil { + return err + } + + defer func() { + if !revert { + return + } + os.Remove(source) + }() + } else { + empty, err := shared.PathIsEmpty(source) + if err != nil { + return err + } + + if !empty { + return fmt.Errorf("The provided directory is not empty") + } + } + + if !shared.PathExists(poolMntPoint) { + err := os.MkdirAll(poolMntPoint, 0711) + if err != nil { + return err + } + defer func() { + if !revert { + return + } + os.Remove(poolMntPoint) + }() + } + + revert = false + + return nil +} + +func (s *Dir) StoragePoolDelete() error { + source := shared.HostPath(s.pool.Config["source"]) + if source == "" { + return fmt.Errorf("no \"source\" property found for the storage pool") + } + + _, err := s.StoragePoolUmount() + if err != nil { + return err + } + + if shared.PathExists(source) { + err := os.RemoveAll(source) + if err != nil { + return err + } + } + + prefix := shared.VarPath("storage-pools") + if !strings.HasPrefix(source, prefix) { + storagePoolSymlink := getStoragePoolMountPoint(s.pool.Name) + if !shared.PathExists(storagePoolSymlink) { + return nil + } + + err := os.Remove(storagePoolSymlink) + if err != nil { + return err + } + } + + return nil +} + +func (s *Dir) StoragePoolMount() (bool, error) { + cleanupFunc := LockPoolMount(s.pool.Name) + if cleanupFunc == nil { + return false, nil + } + defer cleanupFunc() + + source := shared.HostPath(s.pool.Config["source"]) + if source == "" { + return false, fmt.Errorf("no \"source\" property found for the storage pool") + } + + cleanSource := filepath.Clean(source) + poolMntPoint := getStoragePoolMountPoint(s.pool.Name) + + if cleanSource == poolMntPoint { + return true, nil + } + + mountSource := cleanSource + mountFlags := syscall.MS_BIND + + if shared.IsMountPoint(poolMntPoint) { + return false, nil + } + + err := syscall.Mount(mountSource, poolMntPoint, "", uintptr(mountFlags), "") + if err != nil { + logger.Errorf(`Failed to mount DIR storage pool "%s" onto "%s": %s`, mountSource, poolMntPoint, err) + return false, err + } + + return true, nil +} + +func (s *Dir) StoragePoolUmount() (bool, error) { + cleanupFunc := LockPoolUmount(s.pool.Name) + if cleanupFunc == nil { + return false, nil + } + defer cleanupFunc() + + source := s.pool.Config["source"] + if source == "" { + return false, fmt.Errorf("no \"source\" property found for the storage pool") + } + + cleanSource := filepath.Clean(source) + poolMntPoint := getStoragePoolMountPoint(s.pool.Name) + + if cleanSource == poolMntPoint { + return true, nil + } + + if 
!shared.IsMountPoint(poolMntPoint) { + return false, nil + } + + err := syscall.Unmount(poolMntPoint, 0) + if err != nil { + return false, err + } + + return true, nil +} + +func (s *Dir) StoragePoolResources() (*api.ResourcesStoragePool, error) { + ourMount, err := s.StoragePoolMount() + if err != nil { + return nil, err + } + + if ourMount { + defer s.StoragePoolUmount() + } + + return storageResource(getStoragePoolMountPoint(s.pool.Name)) +} + +func (s *Dir) StoragePoolUpdate(writable *api.StoragePoolPut, + changedConfig []string) error { + // Nothing to do + return nil +} + +func (s *Dir) VolumeUpdate(writable *api.StorageVolumePut, changedConfig []string) error { + // Nothing to do + return nil +} + +func (s *Dir) VolumeRestore(project string, sourceName string, targetName string, volumeType VolumeType) error { + return s.VolumeSnapshotRestore(project, sourceName, targetName, volumeType) +} + +func (s *Dir) VolumeCreate(project string, volumeName string, volumeType VolumeType) error { + var volumePath string + + switch volumeType { + case VolumeTypeContainer: + volumePath = getContainerMountPoint(project, s.pool.Name, volumeName) + case VolumeTypeImage: + volumePath = getImageMountPoint(s.pool.Name, volumeName) + case VolumeTypeCustom: + volumePath = getStoragePoolVolumeMountPoint(s.pool.Name, volumeName) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + err := os.MkdirAll(volumePath, 0711) + if err != nil { + return err + } + + return nil +} + +func (s *Dir) VolumeCopy(project string, source string, target string, snapshots []string, volumeType VolumeType) error { + var sourceMountPoint string + var targetMountPoint string + + switch volumeType { + case VolumeTypeContainer: + for _, snap := range snapshots { + sourceMountPoint = getSnapshotMountPoint(project, s.pool.Name, fmt.Sprintf("%s/%s", source, snap)) + targetMountPoint = getSnapshotMountPoint(project, s.pool.Name, fmt.Sprintf("%s/%s", target, snap)) + + err := s.rsync(sourceMountPoint, targetMountPoint) + if err != nil { + return err + } + } + + sourceMountPoint = getContainerMountPoint(project, s.pool.Name, source) + targetMountPoint = getContainerMountPoint(project, s.pool.Name, target) + case VolumeTypeCustom: + for _, snap := range snapshots { + sourceMountPoint = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, fmt.Sprintf("%s/%s", source, snap)) + targetMountPoint = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, fmt.Sprintf("%s/%s", target, snap)) + + err := s.rsync(sourceMountPoint, targetMountPoint) + if err != nil { + return err + } + } + + sourceMountPoint = getStoragePoolVolumeMountPoint(s.pool.Name, source) + targetMountPoint = getStoragePoolVolumeMountPoint(s.pool.Name, target) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + return s.rsync(sourceMountPoint, targetMountPoint) +} + +func (s *Dir) doCopy(source string, target string) error { + err := s.rsync(source, target) + if err != nil { + return err + } + + return nil +} + +func (s *Dir) VolumeDelete(project string, volumeName string, recursive bool, volumeType VolumeType) error { + var path string + + switch volumeType { + case VolumeTypeContainer: + path = getContainerMountPoint(project, s.pool.Name, volumeName) + case VolumeTypeCustom: + path = getStoragePoolVolumeMountPoint(s.pool.Name, volumeName) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + err := os.RemoveAll(path) + if err != nil { + return err + } + + return nil +} + +func (s *Dir) 
VolumeRename(project string, oldName string, newName string, snapshots []string, + volumeType VolumeType) error { + var oldPath string + var newPath string + + switch volumeType { + case VolumeTypeContainer: + oldPath = getContainerMountPoint(project, s.pool.Name, oldName) + newPath = getContainerMountPoint(project, s.pool.Name, newName) + case VolumeTypeCustom: + oldPath = getStoragePoolVolumeMountPoint(s.pool.Name, oldName) + newPath = getStoragePoolVolumeMountPoint(s.pool.Name, newName) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + if shared.PathExists(newPath) { + // Nothing to do + return nil + } + + return os.Rename(oldPath, newPath) +} + +func (s *Dir) VolumeMount(project string, name string, volumeType VolumeType) (bool, error) { + // Nothing to do + return true, nil +} + +func (s *Dir) VolumeUmount(project string, name string, volumeType VolumeType) (bool, error) { + // Nothing to do + return true, nil +} + +func (s *Dir) VolumeGetUsage(project, name string, + path string) (int64, error) { + return -1, fmt.Errorf("The directory container backend doesn't support quotas") +} + +func (s *Dir) VolumeSetQuota(project, name string, + size int64, userns bool, volumeType VolumeType) error { + return nil +} + +func (s *Dir) VolumeSnapshotCreate(project string, source string, target string, volumeType VolumeType) error { + var sourceMountPoint string + var targetMountPoint string + + switch volumeType { + case VolumeTypeContainerSnapshot: + sourceMountPoint = getContainerMountPoint(project, s.pool.Name, source) + targetMountPoint = getSnapshotMountPoint(project, s.pool.Name, target) + case VolumeTypeCustomSnapshot: + sourceMountPoint = getStoragePoolVolumeMountPoint(s.pool.Name, source) + targetMountPoint = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, target) + + err := os.MkdirAll(targetMountPoint, 0711) + if err != nil { + return err + } + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + if source == "" { + // Nothing to do + return nil + } + + return s.doCopy(sourceMountPoint, targetMountPoint) +} + +func (s *Dir) VolumeSnapshotCopy(project string, source string, target string, volumeType VolumeType) error { + var sourceMountPoint string + var targetMountPoint string + + switch volumeType { + case VolumeTypeContainerSnapshot: + sourceMountPoint = getSnapshotMountPoint(project, s.pool.Name, source) + + if shared.IsSnapshot(target) { + targetMountPoint = getSnapshotMountPoint(project, s.pool.Name, target) + } else { + targetMountPoint = getContainerMountPoint(project, s.pool.Name, target) + } + case VolumeTypeCustomSnapshot: + sourceMountPoint = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, source) + + if shared.IsSnapshot(target) { + targetMountPoint = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, target) + } else { + targetMountPoint = getStoragePoolVolumeMountPoint(s.pool.Name, target) + } + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + return s.doCopy(sourceMountPoint, targetMountPoint) +} + +func (s *Dir) VolumeSnapshotDelete(project string, volumeName string, recursive bool, volumeType VolumeType) error { + var path string + + switch volumeType { + case VolumeTypeCustomSnapshot: + path = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, volumeName) + case VolumeTypeContainerSnapshot: + path = getSnapshotMountPoint(project, s.pool.Name, volumeName) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + err := os.RemoveAll(path) + if err 
!= nil { + return err + } + + return nil +} + +func (s *Dir) VolumeSnapshotRestore(project string, sourceName string, targetName string, volumeType VolumeType) error { + var sourceMountPoint string + var targetMountPoint string + + switch volumeType { + case VolumeTypeContainerSnapshot: + sourceMountPoint = getSnapshotMountPoint(project, s.pool.Name, sourceName) + targetMountPoint = getContainerMountPoint(project, s.pool.Name, targetName) + case VolumeTypeCustomSnapshot: + sourceMountPoint = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, sourceName) + targetMountPoint = getStoragePoolVolumeMountPoint(s.pool.Name, targetName) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + return s.doCopy(sourceMountPoint, targetMountPoint) +} + +func (s *Dir) VolumeSnapshotRename(project string, oldName string, newName string, volumeType VolumeType) error { + switch volumeType { + case VolumeTypeContainerSnapshot: + // Nothing to do as this is handled by the storage itself + case VolumeTypeCustomSnapshot: + // Nothing to do as this is handled by the storage itself + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + + } + + return nil +} + +func (s *Dir) VolumeBackupCreate(path string, project string, source string, + snapshots []string, optimized bool) error { + snapshotsPath := fmt.Sprintf("%s/snapshots", path) + + if len(snapshots) > 0 { + err := os.MkdirAll(snapshotsPath, 0711) + if err != nil { + return err + } + } + for _, snap := range snapshots { + fullSnapshotName := fmt.Sprintf("%s/%s", s.volume.Name, snap) + snapshotMntPoint := getSnapshotMountPoint(project, s.pool.Name, fullSnapshotName) + target := fmt.Sprintf("%s/%s", snapshotsPath, snap) + + // Copy the snapshot + err := s.rsync(snapshotMntPoint, target) + if err != nil { + return err + } + } + + // Copy the container + sourcePath := containerPath(project, source, false) + targetPath := fmt.Sprintf("%s/container", path) + err := s.rsync(sourcePath, targetPath) + if err != nil { + return err + } + + return nil +} + +func (s *Dir) VolumeBackupLoad(backupDir string, project string, + containerName string, snapshots []string, privileged bool, optimized bool) error { + if optimized { + return fmt.Errorf("Dir storage doesn't support binary backups") + } + + // Nothing to do + return nil +} + +func (s *Dir) VolumePrepareRestore(sourceName string, targetName string, targetSnapshots []string, f func() error) error { + // Nothing to do + return nil +} + +func (s *Dir) VolumeReady(project string, name string) bool { + containerMntPoint := getContainerMountPoint(project, s.pool.Name, name) + ok, _ := shared.PathIsEmpty(containerMntPoint) + return !ok +} + +func DirSnapshotDeleteInternal(project, poolName string, snapshotName string) error { + snapshotContainerMntPoint := getSnapshotMountPoint(project, poolName, snapshotName) + if shared.PathExists(snapshotContainerMntPoint) { + err := os.RemoveAll(snapshotContainerMntPoint) + if err != nil { + return err + } + } + + sourceContainerName, _, _ := containerGetParentAndSnapshotName(snapshotName) + snapshotContainerPath := getSnapshotMountPoint(project, poolName, sourceContainerName) + empty, _ := shared.PathIsEmpty(snapshotContainerPath) + if empty == true { + err := os.Remove(snapshotContainerPath) + if err != nil { + return err + } + + snapshotSymlink := shared.VarPath("snapshots", projectPrefix(project, sourceContainerName)) + if shared.PathExists(snapshotSymlink) { + err := os.Remove(snapshotSymlink) + if err != nil { + return err + } + } + } 
+ + return nil +} From 42555cfec2b3687de241a5ac55160a07032c81ee Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 15:31:25 +0200 Subject: [PATCH 11/15] storage: Add zfs Signed-off-by: Thomas Hipp --- lxd/storage/zfs.go | 2759 ++++++++++++++++++++++++++++++++++ lxd/storage_migration_zfs.go | 372 +++++ 2 files changed, 3131 insertions(+) create mode 100644 lxd/storage/zfs.go create mode 100644 lxd/storage_migration_zfs.go diff --git a/lxd/storage/zfs.go b/lxd/storage/zfs.go new file mode 100644 index 0000000000..27a5718945 --- /dev/null +++ b/lxd/storage/zfs.go @@ -0,0 +1,2759 @@ +package storage + +import ( + "fmt" + "io/ioutil" + "os" + "os/exec" + "path/filepath" + "strconv" + "strings" + "syscall" + "time" + + "github.com/lxc/lxd/lxd/util" + "github.com/lxc/lxd/shared" + "github.com/lxc/lxd/shared/api" + "github.com/lxc/lxd/shared/logger" + "github.com/pborman/uuid" + "github.com/pkg/errors" +) + +type Zfs struct { + driverShared + + dataset string +} + +// Global defaults +var zfsUseRefquota = "false" + +// Cache +var ZfsVersion = "" + +func (s *Zfs) Init() error { + if s.pool != nil && s.pool.Config["zfs.pool_name"] != "" { + s.dataset = s.pool.Config["zfs.pool_name"] + } + + if ZfsVersion != "" { + s.sTypeVersion = ZfsVersion + return nil + } + + util.LoadModule("zfs") + + if !zfsIsEnabled() { + return fmt.Errorf("The \"zfs\" tool is not enabled") + } + + var err error + + s.sTypeVersion, err = zfsToolVersionGet() + if err != nil { + s.sTypeVersion, err = zfsModuleVersionGet() + if err != nil { + return err + } + } + + ZfsVersion = s.sTypeVersion + + return nil +} + +func (s *Zfs) StoragePoolCheck() error { + source := s.pool.Config["source"] + if source == "" { + return fmt.Errorf("no \"source\" property found for the storage pool") + } + + poolName := s.getOnDiskPoolName() + purePoolName := strings.Split(poolName, "/")[0] + + exists := ZfsFilesystemEntityExists(purePoolName, "") + if exists { + return nil + } + + var err error + var msg string + + if filepath.IsAbs(source) { + disksPath := shared.VarPath("disks") + msg, err = shared.RunCommand("zpool", "import", "-d", disksPath, poolName) + } else { + msg, err = shared.RunCommand("zpool", "import", purePoolName) + } + + if err != nil { + return fmt.Errorf("ZFS storage pool \"%s\" could not be imported: %s", poolName, msg) + } + + return nil +} + +func (s *Zfs) StoragePoolCreate() error { + err := s.zfsPoolCreate() + if err != nil { + return err + } + revert := true + defer func() { + if !revert { + return + } + s.StoragePoolDelete() + }() + + storagePoolMntPoint := getStoragePoolMountPoint(s.pool.Name) + err = os.MkdirAll(storagePoolMntPoint, 0711) + if err != nil { + return err + } + + err = s.StoragePoolCheck() + if err != nil { + return err + } + + revert = false + + return nil +} + +func (s *Zfs) StoragePoolDelete() error { + poolName := s.getOnDiskPoolName() + + if ZfsFilesystemEntityExists(poolName, "") { + err := zfsFilesystemEntityDelete(s.pool.Config["source"], poolName) + if err != nil { + return err + } + } + + storagePoolMntPoint := getStoragePoolMountPoint(s.pool.Name) + if shared.PathExists(storagePoolMntPoint) { + err := os.RemoveAll(storagePoolMntPoint) + if err != nil { + return err + } + } + + return nil +} + +func (s *Zfs) StoragePoolMount() (bool, error) { + return true, nil +} + +func (s *Zfs) StoragePoolUmount() (bool, error) { + return true, nil +} + +func (s *Zfs) StoragePoolResources() (*api.ResourcesStoragePool, error) { + poolName := s.getOnDiskPoolName() + + totalBuf, err := 
zfsFilesystemEntityPropertyGet(poolName, "", "available") + if err != nil { + return nil, err + } + + totalStr := string(totalBuf) + totalStr = strings.TrimSpace(totalStr) + + total, err := strconv.ParseUint(totalStr, 10, 64) + if err != nil { + return nil, err + } + + usedBuf, err := zfsFilesystemEntityPropertyGet(poolName, "", "used") + if err != nil { + return nil, err + } + + usedStr := string(usedBuf) + usedStr = strings.TrimSpace(usedStr) + + used, err := strconv.ParseUint(usedStr, 10, 64) + if err != nil { + return nil, err + } + + res := api.ResourcesStoragePool{} + res.Space.Total = total + res.Space.Used = used + + // Inode allocation is dynamic so no use in reporting them. + return &res, nil +} + +func (s *Zfs) StoragePoolUpdate(writable *api.StoragePoolPut, changedConfig []string) error { + // Nothing to do + return nil +} + +func (s *Zfs) doImageCreate(fingerprint string) error { + poolName := s.getOnDiskPoolName() + imageMntPoint := getImageMountPoint(s.pool.Name, fingerprint) + fs := fmt.Sprintf("images/%s", fingerprint) + revert := true + subrevert := true + + if ZfsFilesystemEntityExists(poolName, fmt.Sprintf("deleted/%s", fs)) { + err := zfsPoolVolumeRename(poolName, fmt.Sprintf("deleted/%s", fs), fs, true) + if err != nil { + return err + } + + defer func() { + if !revert { + return + } + s.VolumeDelete("default", fingerprint, true, VolumeTypeImage) + }() + + // In case this is an image from an older lxd instance, wipe the + // mountpoint. + err = zfsPoolVolumeSet(poolName, fs, "mountpoint", "none") + if err != nil { + return err + } + + revert = false + subrevert = false + + return nil + } + + err := os.MkdirAll(imageMntPoint, 0700) + if err != nil { + return err + } + defer func() { + if !subrevert { + return + } + os.RemoveAll(imageMntPoint) + }() + + // Create temporary mountpoint directory. + tmp := getImageMountPoint(s.pool.Name, "") + + tmpImageDir, err := ioutil.TempDir(tmp, "") + if err != nil { + return err + } + defer os.RemoveAll(tmpImageDir) + + imagePath := shared.VarPath("images", fingerprint) + + // Create a new storage volume on the storage pool for the image. + dataset := fmt.Sprintf("%s/%s", poolName, fs) + + _, err = zfsPoolVolumeCreate(dataset, "mountpoint=none") + if err != nil { + return err + } + + subrevert = false + defer func() { + if !revert { + return + } + s.VolumeDelete("default", fingerprint, true, VolumeTypeImage) + }() + + // Set a temporary mountpoint for the image. + err = zfsPoolVolumeSet(poolName, fs, "mountpoint", tmpImageDir) + if err != nil { + return err + } + + // Make sure that the image actually got mounted. + if !shared.IsMountPoint(tmpImageDir) { + ZfsMount(poolName, fs) + } + + // Unpack the image into the temporary mountpoint. + err = unpackImage(imagePath, tmpImageDir, false, s.s.OS.RunningInUserNS, nil) + if err != nil { + return err + } + + // Mark the new storage volume for the image as readonly. + err = zfsPoolVolumeSet(poolName, fs, "readonly", "on") + if err != nil { + return err + } + + // Remove the temporary mountpoint from the image storage volume. + err = zfsPoolVolumeSet(poolName, fs, "mountpoint", "none") + if err != nil { + return err + } + + // Make sure that the image actually got unmounted. + if shared.IsMountPoint(tmpImageDir) { + ZfsUmount(poolName, fs, tmpImageDir) + } + + // Create a snapshot of that image on the storage pool which we clone for + // container creation. 
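+ // The snapshot is deliberately named "readonly"; VolumeCopy below clones new containers from images/<fingerprint>@readonly.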
+ err = ZfsPoolVolumeSnapshotCreate(poolName, fs, "readonly") + if err != nil { + return err + } + + revert = false + + return nil +} + +func (s *Zfs) VolumeCreate(project string, volumeName string, volumeType VolumeType) error { + var volumeMntPoint string + var dataset string + var fs string + + if shared.IsSnapshot(volumeName) { + return fmt.Errorf("Volume is a snapshot") + } + + poolName := s.getOnDiskPoolName() + + switch volumeType { + case VolumeTypeContainer: + dataset = fmt.Sprintf("%s/containers/%s", poolName, projectPrefix(project, volumeName)) + volumeMntPoint = getContainerMountPoint(project, s.pool.Name, volumeName) + fs = s.getDataset(project, volumeName, "containers") + case VolumeTypeCustom: + dataset = fmt.Sprintf("%s/custom/%s", poolName, volumeName) + volumeMntPoint = getStoragePoolVolumeMountPoint(s.pool.Name, volumeName) + fs = s.getDataset("default", volumeName, "custom") + case VolumeTypeImage: + return s.doImageCreate(volumeName) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + _, err := zfsPoolVolumeCreate(dataset, fmt.Sprintf("mountpoint=%s", volumeMntPoint), + "canmount=noauto") + if err != nil { + return err + } + + if !shared.IsMountPoint(volumeMntPoint) { + err := ZfsMount(poolName, fs) + if err != nil { + return err + } + defer ZfsUmount(poolName, fs, volumeMntPoint) + } + + /* + // apply quota + if s.volume.Config["size"] != "" { + size, err := shared.ParseByteSizeString(s.volume.Config["size"]) + if err != nil { + return err + } + + err = s.StorageEntitySetQuota(storagePoolVolumeTypeCustom, size, nil) + if err != nil { + return err + } + } + + revert = false + */ + + return nil +} + +func (s *Zfs) VolumeCopy(project, source string, target string, snapshots []string, volumeType VolumeType) error { + switch volumeType { + case VolumeTypeContainer: + if len(snapshots) == 0 { + return s.doContainerCopy(project, source, target) + } + + return s.doContainerCopyWithSnapshots(project, source, target, snapshots) + case VolumeTypeCustom: + if len(snapshots) == 0 { + return s.doVolumeCopy(source, target) + } + + return s.doVolumeCopyWithSnapshots(source, target, snapshots) + case VolumeTypeImage: + poolName := s.getOnDiskPoolName() + sourceFs := fmt.Sprintf("images/%s", source) + targetFs := fmt.Sprintf("containers/%s", projectPrefix(project, target)) + volumeMountPoint := getContainerMountPoint(project, s.pool.Name, target) + + return zfsPoolVolumeClone(project, poolName, sourceFs, "readonly", targetFs, volumeMountPoint) + } + + return fmt.Errorf("Unsupported volume type: %v", volumeType) +} + +func (s *Zfs) doContainerCopy(project string, source string, target string) error { + if s.pool.Config["zfs.clone_copy"] != "" && !shared.IsTrue(s.pool.Config["zfs.clone_copy"]) { + return s.doCopyWithoutSnapshotFull(project, source, target) + } + + return s.doCopyWithoutSnapshotsSparse(project, source, target) +} + +func (s *Zfs) doCopyWithoutSnapshotsSparse(project string, source string, target string) error { + poolName := s.getOnDiskPoolName() + + sourceContainerName := source + sourceContainerPath := containerPath(project, source, false) + + targetContainerName := target + targetContainerPath := containerPath(project, target, false) + targetContainerMountPoint := getContainerMountPoint(project, s.pool.Name, targetContainerName) + + sourceZfsDataset := "" + sourceZfsDatasetSnapshot := "" + sourceName, sourceSnapOnlyName, isSnapshotName := containerGetParentAndSnapshotName(sourceContainerName) + + targetZfsDataset := s.getDataset(project, 
target, "containers") + + if isSnapshotName { + sourceZfsDatasetSnapshot = sourceSnapOnlyName + } + + revert := true + if sourceZfsDatasetSnapshot == "" { + if ZfsFilesystemEntityExists(poolName, fmt.Sprintf("containers/%s", projectPrefix(project, sourceName))) { + sourceZfsDatasetSnapshot = fmt.Sprintf("copy-%s", uuid.NewRandom().String()) + sourceZfsDataset = fmt.Sprintf("containers/%s", projectPrefix(project, sourceName)) + + err := ZfsPoolVolumeSnapshotCreate(poolName, sourceZfsDataset, sourceZfsDatasetSnapshot) + if err != nil { + return err + } + defer func() { + if !revert { + return + } + ZfsPoolVolumeSnapshotDestroy(poolName, sourceZfsDataset, sourceZfsDatasetSnapshot) + }() + } + } else { + if ZfsFilesystemEntityExists(poolName, fmt.Sprintf("containers/%s at snapshot-%s", projectPrefix(project, sourceName), sourceZfsDatasetSnapshot)) { + sourceZfsDataset = fmt.Sprintf("containers/%s", projectPrefix(project, sourceName)) + sourceZfsDatasetSnapshot = fmt.Sprintf("snapshot-%s", sourceZfsDatasetSnapshot) + } + } + + if sourceZfsDataset != "" { + err := zfsPoolVolumeClone(project, poolName, sourceZfsDataset, sourceZfsDatasetSnapshot, targetZfsDataset, targetContainerMountPoint) + if err != nil { + return err + } + defer func() { + if !revert { + return + } + zfsPoolVolumeDestroy(poolName, targetZfsDataset) + }() + + // TODO: This should be called in the storage not the driver + /* + ourMount, err := s.ContainerMount(target) + if err != nil { + return err + } + if ourMount { + defer s.ContainerUmount(target, targetContainerPath) + } + + err = createContainerMountpoint(targetContainerMountPoint, targetContainerPath, target.IsPrivileged()) + if err != nil { + return err + } + defer func() { + if !revert { + return + } + deleteContainerMountpoint(targetContainerMountPoint, targetContainerPath, s.GetStorageTypeName()) + }() + */ + } else { + err := s.VolumeCreate(project, target, VolumeTypeContainer) + if err != nil { + return err + } + defer func() { + if !revert { + return + } + s.VolumeDelete(project, target, false, VolumeTypeContainer) + }() + + err = s.rsync(sourceContainerPath, targetContainerPath) + if err != nil { + return err + } + } + + revert = false + + return nil +} + +func (s *Zfs) doCopyWithoutSnapshotFull(project string, source string, target string) error { + sourceIsSnapshot := shared.IsSnapshot(source) + poolName := s.getOnDiskPoolName() + + sourceDataset := "" + snapshotSuffix := "" + + targetName := target + targetDataset := fmt.Sprintf("%s/containers/%s", poolName, projectPrefix(project, targetName)) + targetSnapshotDataset := "" + + if sourceIsSnapshot { + sourceParentName, sourceSnapOnlyName, _ := containerGetParentAndSnapshotName(source) + snapshotSuffix = fmt.Sprintf("snapshot-%s", sourceSnapOnlyName) + sourceDataset = fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(project, sourceParentName), snapshotSuffix) + targetSnapshotDataset = fmt.Sprintf("%s/containers/%s at snapshot-%s", poolName, projectPrefix(project, targetName), sourceSnapOnlyName) + } else { + snapshotSuffix = uuid.NewRandom().String() + sourceDataset = fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(project, source), snapshotSuffix) + targetSnapshotDataset = fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(project, targetName), snapshotSuffix) + + fs := fmt.Sprintf("containers/%s", projectPrefix(project, source)) + err := ZfsPoolVolumeSnapshotCreate(poolName, fs, snapshotSuffix) + if err != nil { + return err + } + defer func() { + err := 
ZfsPoolVolumeSnapshotDestroy(poolName, fs, snapshotSuffix) + if err != nil { + logger.Warnf("Failed to delete temporary ZFS snapshot \"%s\", manual cleanup needed", sourceDataset) + } + }() + } + + zfsSendCmd := exec.Command("zfs", "send", sourceDataset) + + zfsRecvCmd := exec.Command("zfs", "receive", targetDataset) + + zfsRecvCmd.Stdin, _ = zfsSendCmd.StdoutPipe() + zfsRecvCmd.Stdout = os.Stdout + zfsRecvCmd.Stderr = os.Stderr + + err := zfsRecvCmd.Start() + if err != nil { + return err + } + + err = zfsSendCmd.Run() + if err != nil { + return err + } + + err = zfsRecvCmd.Wait() + if err != nil { + return err + } + + msg, err := shared.RunCommand("zfs", "rollback", "-r", "-R", targetSnapshotDataset) + if err != nil { + logger.Errorf("Failed to rollback ZFS dataset: %s", msg) + return err + } + + targetContainerMountPoint := getContainerMountPoint(project, s.pool.Name, targetName) + targetfs := fmt.Sprintf("containers/%s", projectPrefix(project, targetName)) + + err = zfsPoolVolumeSet(poolName, targetfs, "canmount", "noauto") + if err != nil { + return err + } + + err = zfsPoolVolumeSet(poolName, targetfs, "mountpoint", targetContainerMountPoint) + if err != nil { + return err + } + + err = ZfsPoolVolumeSnapshotDestroy(poolName, targetfs, snapshotSuffix) + if err != nil { + return err + } + + return nil +} + +func (s *Zfs) getFullDataset(project string, name string, datasetType string) string { + return fmt.Sprintf("%s/%s", s.getOnDiskPoolName(), s.getDataset(project, name, datasetType)) +} + +func (s *Zfs) getDataset(project string, name string, datasetType string) string { + var out string + + if shared.IsSnapshot(name) { + parentName, snapshotName, _ := containerGetParentAndSnapshotName(name) + + if project != "" { + parentName = projectPrefix(project, parentName) + } + + out = fmt.Sprintf("%s/%s@snapshot-%s", datasetType, parentName, snapshotName) + } else { + if project != "" { + name = projectPrefix(project, name) + } + + out = fmt.Sprintf("%s/%s", datasetType, name) + } + + return out +} + +func (s *Zfs) doCopyWithSnapshots(project string, source string, target string, parentSnapshot string) error { + currentSnapshotDataset := s.getFullDataset(project, source, "containers") + targetSnapshotDataset := s.getFullDataset(project, target, "containers") + + args := []string{"send", currentSnapshotDataset} + + if parentSnapshot != "" { + parentSnapshotDataset := s.getFullDataset(project, parentSnapshot, "containers") + args = append(args, "-i", parentSnapshotDataset) + } + + logger.Debugf("zfs args: %+v", args) + logger.Debugf("zfs targetSnapshotDataset: %v", targetSnapshotDataset) + + zfsSendCmd := exec.Command("zfs", args...) 
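+ // Local copy via "zfs send | zfs receive": the send stream (incremental when "-i <parent>" was appended above) is piped straight into the receiving dataset. Shell equivalent: zfs send [-i <parent>] <src> | zfs receive <dst>.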
+ + zfsRecvCmd := exec.Command("zfs", "receive", "-F", targetSnapshotDataset) + zfsRecvCmd.Stdin, _ = zfsSendCmd.StdoutPipe() + zfsRecvCmd.Stdout = os.Stdout + zfsRecvCmd.Stderr = os.Stderr + + err := zfsRecvCmd.Start() + if err != nil { + return err + } + + err = zfsSendCmd.Run() + if err != nil { + return err + } + + err = zfsRecvCmd.Wait() + if err != nil { + return err + } + + return nil +} + +func (s *Zfs) doContainerCopyWithSnapshots(project string, source string, target string, snapshots []string) error { + targetContainerName := target + //targetContainerPath := containerPath(project, target, false) + + targetContainerMountPoint := getContainerMountPoint(project, s.pool.Name, targetContainerName) + + prev := "" + prevSnapOnlyName := "" + + for i, snap := range snapshots { + if i > 0 { + prev = fmt.Sprintf("%s/%s", source, snapshots[i-1]) + } + + oldSnapName := fmt.Sprintf("%s/%s", source, snap) + newSnapName := fmt.Sprintf("%s/%s", target, snap) + prevSnapOnlyName = snap + + err := s.doCopyWithSnapshots(project, oldSnapName, newSnapName, prev) + if err != nil { + logger.Error("Failed to copy snapshots") + return err + } + } + + poolName := s.getOnDiskPoolName() + + // send actual container + tmpSnapshotName := fmt.Sprintf("copy-send-%s", uuid.NewRandom().String()) + + err := ZfsPoolVolumeSnapshotCreate(poolName, fmt.Sprintf("containers/%s", projectPrefix(project, source)), tmpSnapshotName) + if err != nil { + return err + } + + currentSnapshotDataset := fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(project, source), tmpSnapshotName) + args := []string{"send", currentSnapshotDataset} + if prevSnapOnlyName != "" { + parentSnapshotDataset := fmt.Sprintf("%s/containers/%s@snapshot-%s", poolName, projectPrefix(project, source), prevSnapOnlyName) + args = append(args, "-i", parentSnapshotDataset) + } + + zfsSendCmd := exec.Command("zfs", args...) 
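+ // The last snapshot transferred in the loop above doubles as the incremental parent ("-i") for the container dataset itself, so only the delta since that snapshot is sent.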
+ targetSnapshotDataset := fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(project, target), tmpSnapshotName) + zfsRecvCmd := exec.Command("zfs", "receive", "-F", targetSnapshotDataset) + + zfsRecvCmd.Stdin, _ = zfsSendCmd.StdoutPipe() + zfsRecvCmd.Stdout = os.Stdout + zfsRecvCmd.Stderr = os.Stderr + + err = zfsRecvCmd.Start() + if err != nil { + return err + } + + err = zfsSendCmd.Run() + if err != nil { + return err + } + + err = zfsRecvCmd.Wait() + if err != nil { + return err + } + + ZfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("containers/%s", projectPrefix(project, source)), tmpSnapshotName) + ZfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("containers/%s", projectPrefix(project, target)), tmpSnapshotName) + + fs := fmt.Sprintf("containers/%s", projectPrefix(project, target)) + err = zfsPoolVolumeSet(poolName, fs, "canmount", "noauto") + if err != nil { + return err + } + + err = zfsPoolVolumeSet(poolName, fs, "mountpoint", targetContainerMountPoint) + if err != nil { + return err + } + + return nil +} + +func (s *Zfs) doVolumeCopy(source string, target string) error { + var err error + + //fs := fmt.Sprintf("custom/%s", target) + + if s.pool.Config["zfs.clone_copy"] != "" && !shared.IsTrue(s.pool.Config["zfs.clone_copy"]) { + err = s.doCopyVolumeWithoutSnapshotsFull(source) + } else { + err = s.doCopyVolumeWithoutSnapshotsSparse(source) + } + if err != nil { + return err + } + + return nil +} + +func (s *Zfs) doCopyVolumeWithSnapshots(source string, target string, parentSnapshot string) error { + poolName := s.getOnDiskPoolName() + + currentSnapshotDataset := s.getFullDataset("default", source, "custom") + targetSnapshotDataset := s.getFullDataset("default", target, "custom") + + args := []string{"send", currentSnapshotDataset} + + if parentSnapshot != "" { + parentName, parentSnaponlyName, _ := containerGetParentAndSnapshotName(parentSnapshot) + parentSnapshotDataset := fmt.Sprintf("%s/custom/%s@snapshot-%s", poolName, parentName, parentSnaponlyName) + args = append(args, "-i", parentSnapshotDataset) + } + + zfsSendCmd := exec.Command("zfs", args...) 
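+ // "-F" lets "zfs receive" roll the target dataset back before applying the incoming stream.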
+ + zfsRecvCmd := exec.Command("zfs", "receive", "-F", targetSnapshotDataset) + zfsRecvCmd.Stdin, _ = zfsSendCmd.StdoutPipe() + zfsRecvCmd.Stdout = os.Stdout + zfsRecvCmd.Stderr = os.Stderr + + err := zfsRecvCmd.Start() + if err != nil { + return err + } + + err = zfsSendCmd.Run() + if err != nil { + return err + } + + err = zfsRecvCmd.Wait() + if err != nil { + return err + } + + return nil +} + +func (s *Zfs) doVolumeCopyWithSnapshots(source string, target string, snapshots []string) error { + targetVolumeMountPoint := getStoragePoolVolumeMountPoint(s.pool.Name, target) + + snapshots, err := ZfsPoolListSnapshots(s.getOnDiskPoolName(), fmt.Sprintf("custom/%s", source)) + if err != nil { + return err + } + + prev := "" + prevSnapOnlyName := "" + + for i, snap := range snapshots { + if i > 0 { + prev = snapshots[i-1] + } + + oldSnapName := fmt.Sprintf("%s/%s", source, snap) + newSnapName := fmt.Sprintf("%s/%s", target, snap) + + err = s.doCopyVolumeWithSnapshots(oldSnapName, newSnapName, prev) + if err != nil { + return err + } + } + + poolName := s.getOnDiskPoolName() + + // send actual container + tmpSnapshotName := fmt.Sprintf("copy-send-%s", uuid.NewRandom().String()) + err = ZfsPoolVolumeSnapshotCreate(poolName, fmt.Sprintf("custom/%s", source), tmpSnapshotName) + if err != nil { + return err + } + + currentSnapshotDataset := fmt.Sprintf("%s/custom/%s@%s", poolName, source, tmpSnapshotName) + args := []string{"send", currentSnapshotDataset} + if prevSnapOnlyName != "" { + parentSnapshotDataset := fmt.Sprintf("%s/custom/%s@snapshot-%s", poolName, source, prevSnapOnlyName) + args = append(args, "-i", parentSnapshotDataset) + } + + zfsSendCmd := exec.Command("zfs", args...) + targetSnapshotDataset := fmt.Sprintf("%s/custom/%s@%s", poolName, target, tmpSnapshotName) + zfsRecvCmd := exec.Command("zfs", "receive", "-F", targetSnapshotDataset) + + zfsRecvCmd.Stdin, _ = zfsSendCmd.StdoutPipe() + zfsRecvCmd.Stdout = os.Stdout + zfsRecvCmd.Stderr = os.Stderr + + err = zfsRecvCmd.Start() + if err != nil { + return err + } + + err = zfsSendCmd.Run() + if err != nil { + return err + } + + err = zfsRecvCmd.Wait() + if err != nil { + return err + } + + ZfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("custom/%s", source), tmpSnapshotName) + ZfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("custom/%s", target), tmpSnapshotName) + + fs := fmt.Sprintf("custom/%s", target) + err = zfsPoolVolumeSet(poolName, fs, "canmount", "noauto") + if err != nil { + return err + } + + err = zfsPoolVolumeSet(poolName, fs, "mountpoint", targetVolumeMountPoint) + if err != nil { + return err + } + + return nil +} + +func (s *Zfs) VolumeDelete(project, volumeName string, recursive bool, volumeType VolumeType) error { + var fs string + + switch volumeType { + case VolumeTypeContainer: + fs = fmt.Sprintf("containers/%s", projectPrefix(project, volumeName)) + case VolumeTypeCustom: + fs = fmt.Sprintf("custom/%s", volumeName) + case VolumeTypeImage: + fs = fmt.Sprintf("images/%s", volumeName) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + poolName := s.getOnDiskPoolName() + + if ZfsFilesystemEntityExists(poolName, fs) { + removable := true + snaps, err := ZfsPoolListSnapshots(poolName, fs) + if err != nil { + return err + } + + for _, snap := range snaps { + var err error + removable, err = zfsPoolVolumeSnapshotRemovable(poolName, fs, snap) + if err != nil { + return err + } + + if !removable { + break + } + } + + if removable { + origin, err := 
zfsFilesystemEntityPropertyGet(poolName, fs, "origin") + if err != nil { + return err + } + + origin = strings.TrimPrefix(origin, fmt.Sprintf("%s/", poolName)) + + err = zfsPoolVolumeDestroy(poolName, fs) + if err != nil { + return err + } + + err = zfsPoolVolumeCleanup(poolName, origin) + if err != nil { + return err + } + } else { + err := zfsPoolVolumeSet(poolName, fs, "mountpoint", "none") + if err != nil { + return err + } + + var target string + + switch volumeType { + case VolumeTypeContainer: + target = fmt.Sprintf("deleted/containers/%s", uuid.NewRandom().String()) + case VolumeTypeCustom: + target = fmt.Sprintf("deleted/custom/%s", uuid.NewRandom().String()) + case VolumeTypeImage: + target = fmt.Sprintf("deleted/images/%s", uuid.NewRandom().String()) + } + + err = zfsPoolVolumeRename(poolName, fs, target, true) + if err != nil { + return err + } + } + } + + return nil +} + +func (s *Zfs) VolumeRename(project, oldName string, newName string, snapshots []string, volumeType VolumeType) error { + var oldDataset string + var newDataset string + var newMountPoint string + + switch volumeType { + case VolumeTypeContainer: + oldDataset = fmt.Sprintf("containers/%s", projectPrefix(project, oldName)) + newDataset = fmt.Sprintf("containers/%s", projectPrefix(project, newName)) + newMountPoint = getContainerMountPoint(project, s.pool.Name, newName) + case VolumeTypeCustom: + oldDataset = fmt.Sprintf("custom/%s", oldName) + newDataset = fmt.Sprintf("custom/%s", newName) + newMountPoint = getStoragePoolVolumeMountPoint(s.pool.Name, newName) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + poolName := s.getOnDiskPoolName() + + err := zfsPoolVolumeRename(poolName, oldDataset, newDataset, false) + if err != nil { + return err + } + + err = zfsPoolVolumeSet(poolName, newDataset, "mountpoint", newMountPoint) + if err != nil { + return err + } + + _, err = s.VolumeUmount(project, newName, VolumeTypeContainer) + if err != nil { + return err + } + + return nil +} + +func (s *Zfs) VolumeMount(project string, name string, volumeType VolumeType) (bool, error) { + var cleanupFunc func() + var fs string + var mountPoint string + poolName := s.getOnDiskPoolName() + + switch volumeType { + case VolumeTypeContainer: + cleanupFunc = LockContainerMount(poolName, name) + fs = fmt.Sprintf("containers/%s", projectPrefix(project, name)) + mountPoint = getContainerMountPoint(project, s.pool.Name, name) + case VolumeTypeCustom: + cleanupFunc = LockCustomMount(poolName, name) + fs = fmt.Sprintf("custom/%s", name) + mountPoint = getStoragePoolVolumeMountPoint(s.pool.Name, name) + case VolumeTypeContainerSnapshot: + cName, sName, _ := containerGetParentAndSnapshotName(name) + sourceSnap := fmt.Sprintf("snapshot-%s", sName) + sourceFs := s.getDataset(project, cName, "containers") + destFs := fmt.Sprintf("snapshots/%s/%s", projectPrefix(project, cName), sName) + poolName := s.getOnDiskPoolName() + snapshotMntPoint := getSnapshotMountPoint(project, s.pool.Name, name) + + err := zfsPoolVolumeClone(project, poolName, sourceFs, sourceSnap, destFs, snapshotMntPoint) + if err != nil { + return false, err + } + + err = ZfsMount(poolName, destFs) + if err != nil { + return false, err + } + + return true, nil + default: + return false, fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + if cleanupFunc == nil { + return false, nil + } + defer cleanupFunc() + + ourMount := false + + if !shared.IsMountPoint(mountPoint) { + ourMount = true + + err := ZfsMount(s.getOnDiskPoolName(), fs) + if 
err != nil { + return false, err + } + } + + return ourMount, nil +} + +func (s *Zfs) VolumeUmount(project, name string, volumeType VolumeType) (bool, error) { + var cleanupFunc func() + var mountPoint string + var fs string + + poolName := s.getOnDiskPoolName() + + switch volumeType { + case VolumeTypeContainer: + cleanupFunc = LockContainerUmount(poolName, name) + mountPoint = getContainerMountPoint(project, s.pool.Name, name) + fs = fmt.Sprintf("containers/%s", projectPrefix(project, name)) + case VolumeTypeCustom: + cleanupFunc = LockCustomUmount(poolName, name) + mountPoint = getStoragePoolVolumeMountPoint(s.pool.Name, name) + fs = fmt.Sprintf("custom/%s", name) + case VolumeTypeContainerSnapshot: + cleanupFunc = LockContainerUmount(poolName, name) + if cleanupFunc == nil { + return false, nil + } + defer cleanupFunc() + + cName, sName, _ := containerGetParentAndSnapshotName(name) + destFs := fmt.Sprintf("snapshots/%s/%s", projectPrefix(project, cName), sName) + + err := zfsPoolVolumeDestroy(poolName, destFs) + if err != nil { + return false, err + } + + return true, nil + default: + return false, fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + if cleanupFunc == nil { + return false, nil + } + defer cleanupFunc() + + ourUmount := false + + if shared.IsMountPoint(mountPoint) { + ourUmount = true + + err := ZfsUmount(poolName, fs, mountPoint) + if err != nil { + return false, err + } + } + + return ourUmount, nil +} + +func (s *Zfs) VolumeGetUsage(project, name, path string) (int64, error) { + fs := fmt.Sprintf("containers/%s", projectPrefix(project, name)) + property := "used" + + if s.pool.Config["volume.zfs.use_refquota"] != "" { + zfsUseRefquota = s.pool.Config["volume.zfs.use_refquota"] + } + + if s.volume.Config["zfs.use_refquota"] != "" { + zfsUseRefquota = s.volume.Config["zfs.use_refquota"] + } + + if shared.IsTrue(zfsUseRefquota) { + property = "referenced" + } + + // Shortcut for refquota + if property == "referenced" && shared.IsMountPoint(path) { + var stat syscall.Statfs_t + + err := syscall.Statfs(path, &stat) + if err != nil { + return -1, err + } + + return int64(stat.Blocks-stat.Bfree) * int64(stat.Bsize), nil + } + + value, err := zfsFilesystemEntityPropertyGet(s.getOnDiskPoolName(), fs, property) + if err != nil { + return -1, err + } + + valueInt, err := strconv.ParseInt(value, 10, 64) + if err != nil { + return -1, err + } + + return valueInt, nil +} + +func (s *Zfs) VolumeSetQuota(project, name string, size int64, userns bool, volumeType VolumeType) error { + var fs string + + switch volumeType { + case VolumeTypeContainer: + fs = fmt.Sprintf("containers/%s", projectPrefix(project, name)) + case VolumeTypeCustom: + fs = fmt.Sprintf("custom/%s", name) + } + + property := "quota" + + if s.pool.Config["volume.zfs.use_refquota"] != "" { + zfsUseRefquota = s.pool.Config["volume.zfs.use_refquota"] + } + if s.volume.Config["zfs.use_refquota"] != "" { + zfsUseRefquota = s.volume.Config["zfs.use_refquota"] + } + + if shared.IsTrue(zfsUseRefquota) { + property = "refquota" + } + + poolName := s.getOnDiskPoolName() + var err error + if size > 0 { + err = zfsPoolVolumeSet(poolName, fs, property, fmt.Sprintf("%d", size)) + } else { + err = zfsPoolVolumeSet(poolName, fs, property, "none") + } + + if err != nil { + return err + } + + return nil +} + +func (s *Zfs) VolumeUpdate(writable *api.StorageVolumePut, changedConfig []string) error { + if !shared.StringInSlice("size", changedConfig) { + return nil + } + + if s.volume.Type != storagePoolVolumeTypeNameCustom { 
+ return updateStoragePoolVolumeError([]string{"size"}, "zfs") + } + + if s.volume.Config["size"] != writable.Config["size"] { + size, err := shared.ParseByteSizeString(writable.Config["size"]) + if err != nil { + return err + } + + err = s.VolumeSetQuota("default", s.volume.Name, size, false, VolumeTypeCustom) + if err != nil { + return err + } + } + + return nil +} + +func (s *Zfs) VolumePrepareRestore(sourceName string, targetName string, targetSnapshots []string, f func() error) error { + zfsRemoveSnapshots := "false" + + if targetSnapshots[len(targetSnapshots)-1] != sourceName { + if s.pool.Config["volume.zfs.remove_snapshots"] != "" { + zfsRemoveSnapshots = s.pool.Config["volume.zfs.remove_snapshots"] + } + + if s.volume.Config["zfs.remove_snapshots"] != "" { + zfsRemoveSnapshots = s.volume.Config["zfs.remove_snapshots"] + } + + if !shared.IsTrue(zfsRemoveSnapshots) { + return fmt.Errorf("ZFS can only restore from the latest snapshot. Delete newer snapshots or copy the snapshot into a new container instead") + } + } + + return f() +} + +// TODO: Get the snapshots using zfs tools. +// Return list of removed snapshots since drivers cannot actually delete +// containers. Storage should be able to remove the container even if the +// zfs volumes/snapshots have been deleted beforehand. +// TODO: (stgraber) Pass function instead of having separate PrepareVolumeRestore function. +// XXX: Deprecated. Use VolumeSnapshotRestore instead. +func (s *Zfs) VolumeRestore(project string, sourceName string, targetName string, volumeType VolumeType) error { + return s.VolumeSnapshotRestore(project, sourceName, targetName, volumeType) +} + +func (s *Zfs) VolumeSnapshotCreate(project string, source string, target string, volumeType VolumeType) error { + var dataset string + var mountpoint string + + isSnapshot := shared.IsSnapshot(target) + + if !isSnapshot { + return fmt.Errorf("Target volume is not a snapshot") + } + + volumeName, snapshotOnlyName, _ := containerGetParentAndSnapshotName(target) + snapName := fmt.Sprintf("snapshot-%s", snapshotOnlyName) + + switch volumeType { + case VolumeTypeContainerSnapshot: + dataset = fmt.Sprintf("containers/%s", projectPrefix(project, volumeName)) + case VolumeTypeCustomSnapshot: + dataset = fmt.Sprintf("custom/%s", volumeName) + mountpoint = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, target) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + if source == "" { + // Nothing to do + return nil + } + + err := ZfsPoolVolumeSnapshotCreate(s.getOnDiskPoolName(), dataset, snapName) + if err != nil { + return err + } + + if mountpoint != "" && !shared.PathExists(mountpoint) { + err := os.MkdirAll(mountpoint, 0700) + if err != nil { + return err + } + } + + return nil +} + +func (s *Zfs) VolumeSnapshotCopy(project, source string, target string, volumeType VolumeType) error { + switch volumeType { + case VolumeTypeContainerSnapshot: + return s.doContainerCopy(project, source, target) + case VolumeTypeCustomSnapshot: + return s.doVolumeCopy(source, target) + } + + return fmt.Errorf("Unsupported volume type: %v", volumeType) +} + +func (s *Zfs) VolumeSnapshotDelete(project string, volumeName string, recursive bool, volumeType VolumeType) error { + switch volumeType { + case VolumeTypeContainerSnapshot: + case VolumeTypeCustomSnapshot: + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + return ZfsSnapshotDeleteInternal(project, s.pool.Name, volumeName, s.getOnDiskPoolName()) +} + 
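+// Illustrative sketch (assumed usage, not part of this patch): restoring a
+// container snapshot through this driver interface would look roughly like
+// the following, with "c1" and "snap0" as placeholder names:
+//
+//	var d Zfs
+//	err := d.VolumeSnapshotRestore("default", "c1/snap0", "c1", VolumeTypeContainerSnapshot)
+//	if err != nil {
+//		// handle the failed restore
+//	}
+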
+func (s *Zfs) VolumeSnapshotRestore(project string, sourceName string, targetName string, volumeType VolumeType) error { + var path string + + parentName, snapOnlyName, _ := containerGetParentAndSnapshotName(sourceName) + + switch volumeType { + case VolumeTypeContainerSnapshot: + path = fmt.Sprintf("containers/%s", projectPrefix(project, parentName)) + case VolumeTypeCustomSnapshot: + path = fmt.Sprintf("custom/%s", parentName) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + // Restore the snapshot + snapName := fmt.Sprintf("snapshot-%s", snapOnlyName) + + return zfsPoolVolumeSnapshotRestore(s.getOnDiskPoolName(), path, snapName) +} + +func (s *Zfs) VolumeSnapshotRename(project string, oldName string, newName string, volumeType VolumeType) error { + var oldSnapshotMntPoint string + var newSnapshotMntPoint string + + switch volumeType { + case VolumeTypeContainerSnapshot: + oldSnapshotMntPoint = getSnapshotMountPoint(project, s.pool.Name, oldName) + newSnapshotMntPoint = getSnapshotMountPoint(project, s.pool.Name, newName) + case VolumeTypeCustomSnapshot: + oldSnapshotMntPoint = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, oldName) + newSnapshotMntPoint = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, newName) + default: + return fmt.Errorf("Unsupported volume type: %v", volumeType) + } + + parentName, oldSnapOnlyName, _ := containerGetParentAndSnapshotName(oldName) + newSnapOnlyName := shared.ExtractSnapshotName(newName) + + oldZfsDatasetName := fmt.Sprintf("snapshot-%s", oldSnapOnlyName) + newZfsDatasetName := fmt.Sprintf("snapshot-%s", newSnapOnlyName) + + if oldZfsDatasetName == newZfsDatasetName { + // Nothing to do + return nil + } + + err := zfsPoolVolumeSnapshotRename( + s.getOnDiskPoolName(), fmt.Sprintf("containers/%s", projectPrefix(project, parentName)), oldZfsDatasetName, newZfsDatasetName) + if err != nil { + return err + } + + revert := true + + defer func() { + if !revert { + return + } + + zfsPoolVolumeSnapshotRename(s.getOnDiskPoolName(), fmt.Sprintf("containers/%s", projectPrefix(project, parentName)), newZfsDatasetName, oldZfsDatasetName) + }() + + err = os.Rename(oldSnapshotMntPoint, newSnapshotMntPoint) + if err != nil { + return err + } + + if volumeType == VolumeTypeCustomSnapshot { + revert = false + return nil + } + + snapshotMntPointSymlinkTarget := getSnapshotMountPoint(project, s.pool.Name, parentName) + snapshotMntPointSymlink := containerPath(project, parentName, true) + + if !shared.PathExists(snapshotMntPointSymlink) { + err := os.Symlink(snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) + if err != nil { + return err + } + } + + revert = false + return nil +} + +func (s *Zfs) doVolumeBackupCreate(path string, project string, source string, snapshots []string) error { + var snapshotsPath string + + // Handle snapshots + if len(snapshots) > 0 { + snapshotsPath = fmt.Sprintf("%s/snapshots", path) + + // Create the snapshot path + err := os.MkdirAll(snapshotsPath, 0711) + if err != nil { + return errors.Wrap(err, "Create snapshot path") + } + } + + for _, snap := range snapshots { + // Mount the snapshot to a usable path + fullSnapshotName := fmt.Sprintf("%s/%s", source, snap) + _, err := s.VolumeMount(project, fullSnapshotName, VolumeTypeContainerSnapshot) + if err != nil { + return errors.Wrap(err, "Mount snapshot") + } + + snapshotMntPoint := getSnapshotMountPoint(project, s.pool.Name, fullSnapshotName) + target := fmt.Sprintf("%s/%s", snapshotsPath, snap) + + // Copy the snapshot + err = 
s.rsync(snapshotMntPoint, target) + s.VolumeUmount(project, fullSnapshotName, VolumeTypeContainerSnapshot) + if err != nil { + return errors.Wrap(err, "Copy snapshot") + } + } + + // Make a temporary copy of the container + containersPath := getContainerMountPoint("default", s.pool.Name, "") + + tmpContainerMntPoint, err := ioutil.TempDir(containersPath, source) + if err != nil { + return errors.Wrap(err, "Create temporary copy dir") + } + defer os.RemoveAll(tmpContainerMntPoint) + + err = os.Chmod(tmpContainerMntPoint, 0700) + if err != nil { + return errors.Wrap(err, "Change temporary mount point permissions") + } + + snapshotSuffix := uuid.NewRandom().String() + fs := fmt.Sprintf("containers/%s", projectPrefix(project, source)) + sourceZfsDatasetSnapshot := fmt.Sprintf("snapshot-%s", snapshotSuffix) + poolName := s.getOnDiskPoolName() + + err = ZfsPoolVolumeSnapshotCreate(poolName, fs, sourceZfsDatasetSnapshot) + if err != nil { + return err + } + defer ZfsPoolVolumeSnapshotDestroy(poolName, fs, sourceZfsDatasetSnapshot) + + targetZfsDataset := fmt.Sprintf("containers/%s", snapshotSuffix) + + err = zfsPoolVolumeClone(project, poolName, fs, sourceZfsDatasetSnapshot, targetZfsDataset, tmpContainerMntPoint) + if err != nil { + return errors.Wrap(err, "Clone volume") + } + defer zfsPoolVolumeDestroy(poolName, targetZfsDataset) + + // Mount the temporary copy + if !shared.IsMountPoint(tmpContainerMntPoint) { + err = ZfsMount(poolName, targetZfsDataset) + if err != nil { + return errors.Wrap(err, "Mount temporary copy") + } + defer ZfsUmount(poolName, targetZfsDataset, tmpContainerMntPoint) + } + + // Copy the container + containerPath := fmt.Sprintf("%s/container", path) + + err = s.rsync(tmpContainerMntPoint, containerPath) + if err != nil { + return errors.Wrap(err, "Copy container") + } + + return nil +} + +func (s *Zfs) doVolumeBackupCreateOptimized(path string, project string, source string) error { + poolName := s.getOnDiskPoolName() + + snapshotSuffix := uuid.NewRandom().String() + sourceDataset := fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(project, source), snapshotSuffix) + fs := fmt.Sprintf("containers/%s", projectPrefix(project, source)) + + err := ZfsPoolVolumeSnapshotCreate(poolName, fs, snapshotSuffix) + if err != nil { + return err + } + defer ZfsPoolVolumeSnapshotDestroy(poolName, fs, snapshotSuffix) + + // Dump the container to a file + backupFile := fmt.Sprintf("%s/%s", path, "container.bin") + + f, err := os.OpenFile(backupFile, os.O_RDWR|os.O_CREATE, 0644) + if err != nil { + return err + } + defer f.Close() + + zfsSendCmd := exec.Command("zfs", "send", sourceDataset) + zfsSendCmd.Stdout = f + + return zfsSendCmd.Run() +} + +func (s *Zfs) doSnapshotBackup(path string, project string, source string, parentSnapshot string) error { + snapshotsPath := fmt.Sprintf("%s/snapshots", path) + + // Create backup path for snapshots + err := os.MkdirAll(snapshotsPath, 0711) + if err != nil { + return err + } + + currentSnapshotDataset := s.getFullDataset(project, source, "containers") + args := []string{"send", currentSnapshotDataset} + if parentSnapshot != "" { + parentSnapshotDataset := s.getFullDataset(project, parentSnapshot, "containers") + args = append(args, "-i", parentSnapshotDataset) + } + + backupFile := fmt.Sprintf("%s/%s.bin", snapshotsPath, shared.ExtractSnapshotName(source)) + f, err := os.OpenFile(backupFile, os.O_RDWR|os.O_CREATE, 0644) + if err != nil { + return err + } + defer f.Close() + + zfsSendCmd := exec.Command("zfs", args...) 
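+ // Unlike the copy paths above, nothing is received here: the (possibly incremental) stream is written straight into the per-snapshot backup file.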
+	zfsSendCmd.Stdout = f
+	return zfsSendCmd.Run()
+}
+
+func (s *Zfs) doVolumeBackupCreateOptimizedWithSnapshots(path string, project string, source string, snapshots []string) error {
+	prev := ""
+	prevSnapOnlyName := ""
+
+	for i, snap := range snapshots {
+		if i > 0 {
+			prev = fmt.Sprintf("%s/%s", source, snapshots[i-1])
+		}
+
+		prevSnapOnlyName = snap
+
+		err := s.doSnapshotBackup(path, project, fmt.Sprintf("%s/%s", source, snap), prev)
+		if err != nil {
+			return err
+		}
+	}
+
+	// Dump the container to a file
+	poolName := s.getOnDiskPoolName()
+	tmpSnapshotName := fmt.Sprintf("backup-%s", uuid.NewRandom().String())
+
+	err := ZfsPoolVolumeSnapshotCreate(poolName, fmt.Sprintf("containers/%s", projectPrefix(project, source)), tmpSnapshotName)
+	if err != nil {
+		return err
+	}
+
+	currentSnapshotDataset := fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(project, source), tmpSnapshotName)
+	parentSnapshotDataset := fmt.Sprintf("%s/containers/%s@snapshot-%s", poolName, projectPrefix(project, source), prevSnapOnlyName)
+	args := []string{"send", currentSnapshotDataset, "-i", parentSnapshotDataset}
+
+	backupFile := fmt.Sprintf("%s/container.bin", path)
+
+	f, err := os.OpenFile(backupFile, os.O_RDWR|os.O_CREATE, 0644)
+	if err != nil {
+		return err
+	}
+	defer f.Close()
+
+	zfsSendCmd := exec.Command("zfs", args...)
+	zfsSendCmd.Stdout = f
+
+	err = zfsSendCmd.Run()
+	if err != nil {
+		return err
+	}
+
+	return ZfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("containers/%s", projectPrefix(project, source)), tmpSnapshotName)
+}
+
+func (s *Zfs) VolumeBackupCreate(path string, project string, source string, snapshots []string, optimized bool) error {
+	if optimized {
+		if len(snapshots) == 0 {
+			return s.doVolumeBackupCreateOptimized(path, project, source)
+		}
+
+		return s.doVolumeBackupCreateOptimizedWithSnapshots(path, project, source, snapshots)
+	}
+
+	return s.doVolumeBackupCreate(path, project, source, snapshots)
+}
+
+func (s *Zfs) doVolumeBackupLoadOptimized(backupDir string, project string, containerName string, snapshots []string) error {
+	poolName := s.getOnDiskPoolName()
+	unpackPath := filepath.Join(backupDir, ".backup_unpack")
+
+	for _, snapshotOnlyName := range snapshots {
+		snapshotBackup := fmt.Sprintf("%s/snapshots/%s.bin", unpackPath, snapshotOnlyName)
+
+		feeder, err := os.Open(snapshotBackup)
+		if err != nil {
+			// can't use defer because it needs to run before the mount
+			os.RemoveAll(unpackPath)
+			return err
+		}
+
+		fullSnapshotName := fmt.Sprintf("%s/%s", containerName, snapshotOnlyName)
+		snapshotDataset := s.getFullDataset(project, fullSnapshotName, "containers")
+
+		zfsRecvCmd := exec.Command("zfs", "receive", "-F", snapshotDataset)
+		zfsRecvCmd.Stdin = feeder
+
+		err = zfsRecvCmd.Run()
+		feeder.Close()
+		if err != nil {
+			// can't use defer because it needs to run before the mount
+			os.RemoveAll(unpackPath)
+			return err
+		}
+
+		// create mountpoint
+		snapshotMntPoint := getSnapshotMountPoint(project, s.pool.Name, fullSnapshotName)
+		snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(project, containerName))
+
+		err = createSnapshotMountpoint(snapshotMntPoint, snapshotMntPoint, snapshotMntPointSymlink)
+		if err != nil {
+			// can't use defer because it needs to run before the mount
+			os.RemoveAll(unpackPath)
+			return err
+		}
+	}
+
+	containerBackup := fmt.Sprintf("%s/container.bin", unpackPath)
+
+	feeder, err := os.Open(containerBackup)
+	if err != nil {
+		// can't use defer because it needs to run before the mount
+		os.RemoveAll(unpackPath)
+		return err
+	}
+	defer feeder.Close()
+
+	containerSnapshotDataset := fmt.Sprintf("%s/containers/%s@backup", poolName, projectPrefix(project, containerName))
+	zfsRecvCmd := exec.Command("zfs", "receive", "-F", containerSnapshotDataset)
+	zfsRecvCmd.Stdin = feeder
+
+	err = zfsRecvCmd.Run()
+	os.RemoveAll(backupDir)
+	ZfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("containers/%s", projectPrefix(project, containerName)), "backup")
+	if err != nil {
+		return err
+	}
+
+	fs := fmt.Sprintf("containers/%s", projectPrefix(project, containerName))
+	err = zfsPoolVolumeSet(poolName, fs, "canmount", "noauto")
+	if err != nil {
+		return err
+	}
+
+	containerMntPoint := getContainerMountPoint(project, s.pool.Name, containerName)
+
+	err = zfsPoolVolumeSet(poolName, fs, "mountpoint", containerMntPoint)
+	if err != nil {
+		return err
+	}
+
+	return nil
+}
+
+func (s *Zfs) VolumeBackupLoad(backupDir string, project string, containerName string, snapshots []string, privileged bool, optimized bool) error {
+	if optimized {
+		return s.doVolumeBackupLoadOptimized(backupDir, project, containerName, snapshots)
+	}
+
+	// Non-optimized container backup is handled in the storage code.
+	return nil
+}
+
+func (s *Zfs) VolumeReady(project string, name string) bool {
+	volumeName := projectPrefix(project, name)
+	fs := fmt.Sprintf("containers/%s", volumeName)
+	return ZfsFilesystemEntityExists(s.getOnDiskPoolName(), fs)
+}
+
+func (s *Zfs) getOnDiskPoolName() string {
+	if s.dataset != "" {
+		return s.dataset
+	}
+
+	return s.pool.Name
+}
+
+// zfsPoolVolumeCreate creates a ZFS dataset with a set of given properties.
+func zfsPoolVolumeCreate(dataset string, properties ...string) (string, error) {
+	cmd := []string{"zfs", "create"}
+
+	for _, prop := range properties {
+		cmd = append(cmd, []string{"-o", prop}...)
+	}
+
+	cmd = append(cmd, []string{"-p", dataset}...)
+
+	return shared.RunCommand(cmd[0], cmd[1:]...)
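zfsPoolVolumeCreate (closed just below) expands each property into its own -o flag and passes -p so that any missing parent datasets are created along the way. A small self-contained sketch of that argument assembly; the pool and dataset names are examples only:

package main

import "fmt"

// buildCreateArgs mirrors how zfsPoolVolumeCreate assembles its command
// line: every property becomes its own -o flag, and -p asks zfs to
// create any missing parent datasets.
func buildCreateArgs(dataset string, properties ...string) []string {
	cmd := []string{"zfs", "create"}
	for _, prop := range properties {
		cmd = append(cmd, "-o", prop)
	}

	return append(cmd, "-p", dataset)
}

func main() {
	// "tank" is an example pool name.
	fmt.Println(buildCreateArgs("tank/custom/vol1", "mountpoint=none", "canmount=noauto"))
	// Prints: [zfs create -o mountpoint=none -o canmount=noauto -p tank/custom/vol1]
}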
+} + +func zfsPoolVolumeRename(pool string, source string, dest string, ignoreMounts bool) error { + var err error + var output string + + for i := 0; i < 20; i++ { + if ignoreMounts { + output, err = shared.RunCommand( + "/proc/self/exe", + "forkzfs", + "--", + "rename", + "-p", + fmt.Sprintf("%s/%s", pool, source), + fmt.Sprintf("%s/%s", pool, dest)) + } else { + output, err = shared.RunCommand( + "zfs", + "rename", + "-p", + fmt.Sprintf("%s/%s", pool, source), + fmt.Sprintf("%s/%s", pool, dest)) + } + + // Success + if err == nil { + return nil + } + + // zfs rename can fail because of descendants, yet still manage the rename + if !ZfsFilesystemEntityExists(pool, source) && ZfsFilesystemEntityExists(pool, dest) { + return nil + } + + time.Sleep(500 * time.Millisecond) + } + + // Timeout + logger.Errorf("zfs rename failed: %s", output) + return fmt.Errorf("Failed to rename ZFS filesystem: %s", output) +} + +func zfsPoolVolumeSet(pool string, path string, key string, value string) error { + vdev := pool + if path != "" { + vdev = fmt.Sprintf("%s/%s", pool, path) + } + output, err := shared.RunCommand( + "zfs", + "set", + fmt.Sprintf("%s=%s", key, value), + vdev) + if err != nil { + logger.Errorf("zfs set failed: %s", output) + return fmt.Errorf("Failed to set ZFS config: %s", output) + } + + return nil +} + +func ZfsFilesystemEntityExists(pool string, path string) bool { + vdev := pool + if path != "" { + vdev = fmt.Sprintf("%s/%s", pool, path) + } + output, err := shared.RunCommand( + "zfs", + "get", + "-H", + "-o", + "name", + "type", + vdev) + if err != nil { + return false + } + + detectedName := strings.TrimSpace(output) + return detectedName == vdev +} + +func (s *Zfs) zfsPoolCreate() error { + s.pool.Config["volatile.initial_source"] = s.pool.Config["source"] + + zpoolName := s.getOnDiskPoolName() + vdev := s.pool.Config["source"] + logger.Debugf("vdev=%s", vdev) + defaultVdev := filepath.Join(shared.VarPath("disks"), fmt.Sprintf("%s.img", s.pool.Name)) + + if vdev == "" || vdev == defaultVdev { + vdev = defaultVdev + s.pool.Config["source"] = vdev + + if s.pool.Config["zfs.pool_name"] == "" { + s.pool.Config["zfs.pool_name"] = zpoolName + } + + f, err := os.Create(vdev) + if err != nil { + return fmt.Errorf("Failed to open %s: %s", vdev, err) + } + defer f.Close() + + err = f.Chmod(0600) + if err != nil { + return fmt.Errorf("Failed to chmod %s: %s", vdev, err) + } + + size, err := shared.ParseByteSizeString(s.pool.Config["size"]) + if err != nil { + return err + } + err = f.Truncate(size) + if err != nil { + return fmt.Errorf("Failed to create sparse file %s: %s", vdev, err) + } + + err = zfsPoolCreate(zpoolName, vdev) + if err != nil { + return err + } + } else { + // Unset size property since it doesn't make sense. + s.pool.Config["size"] = "" + + if filepath.IsAbs(vdev) { + if !shared.IsBlockdevPath(vdev) { + return fmt.Errorf("Custom loop file locations are not supported") + } + + if s.pool.Config["zfs.pool_name"] == "" { + s.pool.Config["zfs.pool_name"] = zpoolName + } + + // This is a block device. Note, that we do not store the + // block device path or UUID or PARTUUID or similar in + // the database. All of those might change or might be + // used in a special way (For example, zfs uses a single + // UUID in a multi-device pool for all devices.). The + // safest way is to just store the name of the zfs pool + // we create. 
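Earlier in this function, the loop-backed branch creates the backing image with Truncate, which allocates a sparse file: the file reports its full size but only consumes disk space as ZFS writes to it. A standalone sketch of that technique; the path and size here are examples:

package main

import (
	"fmt"
	"os"
)

// createSparseFile makes a file report the given size without writing
// any data blocks, so it initially consumes almost no disk space.
func createSparseFile(path string, size int64) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// Tighten permissions before growing the file, as the patch does.
	err = f.Chmod(0600)
	if err != nil {
		return err
	}

	return f.Truncate(size)
}

func main() {
	// Example path and size (10GiB).
	err := createSparseFile("/tmp/pool.img", 10*1024*1024*1024)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}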
+ s.pool.Config["source"] = zpoolName + err := zfsPoolCreate(zpoolName, vdev) + if err != nil { + return err + } + } else { + if s.pool.Config["zfs.pool_name"] != "" && s.pool.Config["zfs.pool_name"] != vdev { + return fmt.Errorf("Invalid combination of \"source\" and \"zfs.pool_name\" property") + } + + s.pool.Config["zfs.pool_name"] = vdev + s.dataset = vdev + + if strings.Contains(vdev, "/") { + if !ZfsFilesystemEntityExists(vdev, "") { + err := zfsPoolCreate("", vdev) + if err != nil { + return err + } + } + } else { + err := zfsPoolCheck(vdev) + if err != nil { + return err + } + } + + subvols, err := zfsPoolListSubvolumes(zpoolName, vdev) + if err != nil { + return err + } + + if len(subvols) > 0 { + return fmt.Errorf("Provided ZFS pool (or dataset) isn't empty") + } + + err = zfsPoolApplyDefaults(vdev) + if err != nil { + return err + } + } + } + + // Create default dummy datasets to avoid zfs races during container + // creation. + poolName := s.getOnDiskPoolName() + dataset := fmt.Sprintf("%s/containers", poolName) + msg, err := zfsPoolVolumeCreate(dataset, "mountpoint=none") + if err != nil { + logger.Errorf("Failed to create containers dataset: %s", msg) + return err + } + + fixperms := shared.VarPath("storage-pools", s.pool.Name, "containers") + err = os.MkdirAll(fixperms, containersDirMode) + if err != nil && !os.IsNotExist(err) { + return err + } + + err = os.Chmod(fixperms, containersDirMode) + if err != nil { + logger.Warnf("Failed to chmod \"%s\" to \"0%s\": %s", fixperms, strconv.FormatInt(int64(containersDirMode), 8), err) + } + + dataset = fmt.Sprintf("%s/images", poolName) + msg, err = zfsPoolVolumeCreate(dataset, "mountpoint=none") + if err != nil { + logger.Errorf("Failed to create images dataset: %s", msg) + return err + } + + fixperms = shared.VarPath("storage-pools", s.pool.Name, "images") + err = os.MkdirAll(fixperms, imagesDirMode) + if err != nil && !os.IsNotExist(err) { + return err + } + err = os.Chmod(fixperms, imagesDirMode) + if err != nil { + logger.Warnf("Failed to chmod \"%s\" to \"0%s\": %s", fixperms, strconv.FormatInt(int64(imagesDirMode), 8), err) + } + + dataset = fmt.Sprintf("%s/custom", poolName) + msg, err = zfsPoolVolumeCreate(dataset, "mountpoint=none") + if err != nil { + logger.Errorf("Failed to create custom dataset: %s", msg) + return err + } + + fixperms = shared.VarPath("storage-pools", s.pool.Name, "custom") + err = os.MkdirAll(fixperms, customDirMode) + if err != nil && !os.IsNotExist(err) { + return err + } + err = os.Chmod(fixperms, customDirMode) + if err != nil { + logger.Warnf("Failed to chmod \"%s\" to \"0%s\": %s", fixperms, strconv.FormatInt(int64(customDirMode), 8), err) + } + + dataset = fmt.Sprintf("%s/deleted", poolName) + msg, err = zfsPoolVolumeCreate(dataset, "mountpoint=none") + if err != nil { + logger.Errorf("Failed to create deleted dataset: %s", msg) + return err + } + + dataset = fmt.Sprintf("%s/snapshots", poolName) + msg, err = zfsPoolVolumeCreate(dataset, "mountpoint=none") + if err != nil { + logger.Errorf("Failed to create snapshots dataset: %s", msg) + return err + } + + fixperms = shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots") + err = os.MkdirAll(fixperms, snapshotsDirMode) + if err != nil && !os.IsNotExist(err) { + return err + } + err = os.Chmod(fixperms, snapshotsDirMode) + if err != nil { + logger.Warnf("Failed to chmod \"%s\" to \"0%s\": %s", fixperms, strconv.FormatInt(int64(snapshotsDirMode), 8), err) + } + + dataset = fmt.Sprintf("%s/custom-snapshots", poolName) + msg, err = 
zfsPoolVolumeCreate(dataset, "mountpoint=none")
+	if err != nil {
+		logger.Errorf("Failed to create custom-snapshots dataset: %s", msg)
+		return err
+	}
+
+	fixperms = shared.VarPath("storage-pools", s.pool.Name, "custom-snapshots")
+	err = os.MkdirAll(fixperms, snapshotsDirMode)
+	if err != nil && !os.IsNotExist(err) {
+		return err
+	}
+	err = os.Chmod(fixperms, snapshotsDirMode)
+	if err != nil {
+		logger.Warnf("Failed to chmod \"%s\" to \"0%s\": %s", fixperms, strconv.FormatInt(int64(snapshotsDirMode), 8), err)
+	}
+
+	return nil
+}
+
+func zfsPoolCreate(pool string, vdev string) error {
+	var output string
+	var err error
+
+	dataset := ""
+
+	if pool == "" {
+		output, err := shared.RunCommand(
+			"zfs", "create", "-p", "-o", "mountpoint=none", vdev)
+		if err != nil {
+			logger.Errorf("zfs create failed: %s", output)
+			return fmt.Errorf("Failed to create ZFS filesystem: %s", output)
+		}
+		dataset = vdev
+	} else {
+		output, err = shared.RunCommand(
+			"zpool", "create", "-f", "-m", "none", "-O", "compression=on", pool, vdev)
+		if err != nil {
+			logger.Errorf("zfs create failed: %s", output)
+			return fmt.Errorf("Failed to create the ZFS pool: %s", output)
+		}
+
+		dataset = pool
+	}
+
+	err = zfsPoolApplyDefaults(dataset)
+	if err != nil {
+		return err
+	}
+
+	return nil
+}
+
+func (s *Zfs) doCopyVolumeWithoutSnapshotsFull(source string) error {
+	sourceIsSnapshot := strings.Contains(source, "/")
+
+	var snapshotSuffix string
+	var sourceDataset string
+	var targetDataset string
+	var targetSnapshotDataset string
+
+	poolName := s.getOnDiskPoolName()
+	// Receive into the target volume's dataset.
+	targetDataset = fmt.Sprintf("%s/custom/%s", poolName, s.volume.Name)
+
+	if sourceIsSnapshot {
+		sourceVolumeName, sourceSnapOnlyName, _ := containerGetParentAndSnapshotName(source)
+		snapshotSuffix = fmt.Sprintf("snapshot-%s", sourceSnapOnlyName)
+		sourceDataset = fmt.Sprintf("%s/custom/%s@%s", poolName, sourceVolumeName, snapshotSuffix)
+		targetSnapshotDataset = fmt.Sprintf("%s/custom/%s@snapshot-%s", poolName, s.volume.Name, sourceSnapOnlyName)
+	} else {
+		snapshotSuffix = uuid.NewRandom().String()
+		sourceDataset = fmt.Sprintf("%s/custom/%s@%s", poolName, source, snapshotSuffix)
+		targetSnapshotDataset = fmt.Sprintf("%s/custom/%s@%s", poolName, s.volume.Name, snapshotSuffix)
+
+		fs := fmt.Sprintf("custom/%s", source)
+		err := ZfsPoolVolumeSnapshotCreate(poolName, fs, snapshotSuffix)
+		if err != nil {
+			return err
+		}
+		defer func() {
+			err := ZfsPoolVolumeSnapshotDestroy(poolName, fs, snapshotSuffix)
+			if err != nil {
+				logger.Warnf("Failed to delete temporary ZFS snapshot \"%s\", manual cleanup needed", sourceDataset)
+			}
+		}()
+	}
+
+	zfsSendCmd := exec.Command("zfs", "send", sourceDataset)
+
+	zfsRecvCmd := exec.Command("zfs", "receive", targetDataset)
+
+	zfsRecvCmd.Stdin, _ = zfsSendCmd.StdoutPipe()
+	zfsRecvCmd.Stdout = os.Stdout
+	zfsRecvCmd.Stderr = os.Stderr
+
+	err := zfsRecvCmd.Start()
+	if err != nil {
+		return err
+	}
+
+	err = zfsSendCmd.Run()
+	if err != nil {
+		return err
+	}
+
+	err = zfsRecvCmd.Wait()
+	if err != nil {
+		return err
+	}
+
+	msg, err := shared.RunCommand("zfs", "rollback", "-r", "-R", targetSnapshotDataset)
+	if err != nil {
+		logger.Errorf("Failed to rollback ZFS dataset: %s", msg)
+		return err
+	}
+
+	targetContainerMountPoint := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name)
+	targetfs := fmt.Sprintf("custom/%s", s.volume.Name)
+
+	err = zfsPoolVolumeSet(poolName, targetfs, "canmount", "noauto")
+	if err != nil {
+		return err
+	}
+
+	err = zfsPoolVolumeSet(poolName, targetfs, "mountpoint", targetContainerMountPoint)
+	if err != nil {
+		return err
+	}
+
+	err = ZfsPoolVolumeSnapshotDestroy(poolName, targetfs, snapshotSuffix)
+	if err != nil {
+		return err
+	}
+
+	return nil
+}
+
+// TODO:
+func (s *Zfs) doCopyVolumeWithoutSnapshotsSparse(source string) error {
+	poolName := s.getOnDiskPoolName()
+
+	sourceVolumeName := source
+	sourceVolumePath := getStoragePoolVolumeMountPoint(s.pool.Name, source)
+
+	targetVolumeName := s.volume.Name
+	targetVolumePath := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name)
+
+	sourceZfsDataset := ""
+	sourceZfsDatasetSnapshot := ""
+	sourceName, sourceSnapOnlyName, isSnapshotName := containerGetParentAndSnapshotName(sourceVolumeName)
+
+	targetZfsDataset := fmt.Sprintf("custom/%s", targetVolumeName)
+
+	if isSnapshotName {
+		sourceZfsDatasetSnapshot = sourceSnapOnlyName
+	}
+
+	revert := true
+	if sourceZfsDatasetSnapshot == "" {
+		if ZfsFilesystemEntityExists(poolName, fmt.Sprintf("custom/%s", sourceName)) {
+			sourceZfsDatasetSnapshot = fmt.Sprintf("copy-%s", uuid.NewRandom().String())
+			sourceZfsDataset = fmt.Sprintf("custom/%s", sourceName)
+
+			err := ZfsPoolVolumeSnapshotCreate(poolName, sourceZfsDataset, sourceZfsDatasetSnapshot)
+			if err != nil {
+				return err
+			}
+
+			defer func() {
+				if !revert {
+					return
+				}
+				ZfsPoolVolumeSnapshotDestroy(poolName, sourceZfsDataset, sourceZfsDatasetSnapshot)
+			}()
+		}
+	} else {
+		if ZfsFilesystemEntityExists(poolName, fmt.Sprintf("custom/%s@snapshot-%s", sourceName, sourceZfsDatasetSnapshot)) {
+			sourceZfsDataset = fmt.Sprintf("custom/%s", sourceName)
+			sourceZfsDatasetSnapshot = fmt.Sprintf("snapshot-%s", sourceZfsDatasetSnapshot)
+		}
+	}
+
+	if sourceZfsDataset != "" {
+		err := zfsPoolVolumeClone("default", poolName, sourceZfsDataset, sourceZfsDatasetSnapshot, targetZfsDataset, targetVolumePath)
+		if err != nil {
+			return err
+		}
+
+		defer func() {
+			if !revert {
+				return
+			}
+			zfsPoolVolumeDestroy(poolName, targetZfsDataset)
+		}()
+	} else {
+		err := s.rsync(sourceVolumePath, targetVolumePath)
+		if err != nil {
+			return err
+		}
+	}
+
+	revert = false
+
+	return nil
+}
+
+func zfsPoolCheck(pool string) error {
+	output, err := shared.RunCommand(
+		"zfs", "get", "-H", "-o", "value", "type", pool)
+	if err != nil {
+		return fmt.Errorf(strings.Split(output, "\n")[0])
+	}
+
+	poolType := strings.Split(output, "\n")[0]
+	if poolType != "filesystem" {
+		return fmt.Errorf("Unsupported pool type: %s", poolType)
+	}
+
+	return nil
+}
+
+func zfsPoolListSubvolumes(pool string, path string) ([]string, error) {
+	output, err := shared.RunCommand(
+		"zfs",
+		"list",
+		"-t", "filesystem",
+		"-o", "name",
+		"-H",
+		"-r", path)
+	if err != nil {
+		logger.Errorf("zfs list failed: %s", output)
+		return []string{}, fmt.Errorf("Failed to list ZFS filesystems: %s", output)
+	}
+
+	children := []string{}
+	for _, entry := range strings.Split(output, "\n") {
+		if entry == "" {
+			continue
+		}
+
+		if entry == path {
+			continue
+		}
+
+		children = append(children, strings.TrimPrefix(entry, fmt.Sprintf("%s/", pool)))
+	}
+
+	return children, nil
+}
+
+func zfsPoolApplyDefaults(dataset string) error {
+	err := zfsPoolVolumeSet(dataset, "", "mountpoint", "none")
+	if err != nil {
+		return err
+	}
+
+	err = zfsPoolVolumeSet(dataset, "", "setuid", "on")
+	if err != nil {
+		return err
+	}
+
+	err = zfsPoolVolumeSet(dataset, "", "exec", "on")
+	if err != nil {
+		return err
+	}
+
+	err = zfsPoolVolumeSet(dataset, "", "devices", "on")
+	if err != nil {
+		return err
+	}
+
+	err = zfsPoolVolumeSet(dataset, "", "acltype", "posixacl")
+	if err != nil {
+		return err
+	}
+
+	err = 
zfsPoolVolumeSet(dataset, "", "xattr", "sa") + if err != nil { + return err + } + + return nil +} + +func zfsFilesystemEntityDelete(vdev string, pool string) error { + var output string + var err error + if strings.Contains(pool, "/") { + // Command to destroy a zfs dataset. + output, err = shared.RunCommand("zfs", "destroy", "-r", pool) + } else { + // Command to destroy a zfs pool. + output, err = shared.RunCommand("zpool", "destroy", "-f", pool) + } + if err != nil { + return fmt.Errorf("Failed to delete the ZFS pool: %s", output) + } + + // Cleanup storage + if filepath.IsAbs(vdev) && !shared.IsBlockdevPath(vdev) { + os.RemoveAll(vdev) + } + + return nil +} + +func ZfsPoolListSnapshots(pool string, path string) ([]string, error) { + path = strings.TrimRight(path, "/") + fullPath := pool + if path != "" { + fullPath = fmt.Sprintf("%s/%s", pool, path) + } + + output, err := shared.RunCommand( + "zfs", + "list", + "-t", "snapshot", + "-o", "name", + "-H", + "-d", "1", + "-s", "creation", + "-r", fullPath) + if err != nil { + logger.Errorf("zfs list failed: %s", output) + return []string{}, fmt.Errorf("Failed to list ZFS snapshots: %s", output) + } + + children := []string{} + for _, entry := range strings.Split(output, "\n") { + if entry == "" { + continue + } + + if entry == fullPath { + continue + } + + children = append(children, strings.SplitN(entry, "@", 2)[1]) + } + + return children, nil +} + +func zfsPoolVolumeSnapshotRemovable(pool string, path string, name string) (bool, error) { + var snap string + if name == "" { + snap = path + } else { + snap = fmt.Sprintf("%s@%s", path, name) + } + + clones, err := zfsFilesystemEntityPropertyGet(pool, snap, "clones") + if err != nil { + return false, err + } + + if clones == "-" || clones == "" { + return true, nil + } + + return false, nil +} + +func zfsFilesystemEntityPropertyGet(pool string, path string, key string) (string, error) { + entity := pool + if path != "" { + entity = fmt.Sprintf("%s/%s", pool, path) + } + output, err := shared.RunCommand( + "zfs", + "get", + "-H", + "-p", + "-o", "value", + key, + entity) + if err != nil { + return "", fmt.Errorf("Failed to get ZFS config: %s", output) + } + + return strings.TrimRight(output, "\n"), nil +} + +func zfsPoolVolumeDestroy(pool string, path string) error { + mountpoint, err := zfsFilesystemEntityPropertyGet(pool, path, "mountpoint") + if err != nil { + return err + } + + if mountpoint != "none" && shared.IsMountPoint(mountpoint) { + err := syscall.Unmount(mountpoint, syscall.MNT_DETACH) + if err != nil { + logger.Errorf("umount failed: %s", err) + return err + } + } + + // Due to open fds or kernel refs, this may fail for a bit, give it 10s + output, err := shared.TryRunCommand( + "zfs", + "destroy", + "-r", + fmt.Sprintf("%s/%s", pool, path)) + + if err != nil { + logger.Errorf("zfs destroy failed: %s", output) + return fmt.Errorf("Failed to destroy ZFS filesystem: %s", output) + } + + return nil +} + +func zfsPoolVolumeCleanup(pool string, path string) error { + if strings.HasPrefix(path, "deleted/") { + // Cleanup of filesystems kept for refcount reason + removablePath, err := zfsPoolVolumeSnapshotRemovable(pool, path, "") + if err != nil { + return err + } + + // Confirm that there are no more clones + if removablePath { + if strings.Contains(path, "@") { + // Cleanup snapshots + err = zfsPoolVolumeDestroy(pool, path) + if err != nil { + return err + } + + // Check if the parent can now be deleted + subPath := strings.SplitN(path, "@", 2)[0] + snaps, err := 
ZfsPoolListSnapshots(pool, subPath) + if err != nil { + return err + } + + if len(snaps) == 0 { + err := zfsPoolVolumeCleanup(pool, subPath) + if err != nil { + return err + } + } + } else { + // Cleanup filesystems + origin, err := zfsFilesystemEntityPropertyGet(pool, path, "origin") + if err != nil { + return err + } + origin = strings.TrimPrefix(origin, fmt.Sprintf("%s/", pool)) + + err = zfsPoolVolumeDestroy(pool, path) + if err != nil { + return err + } + + // Attempt to remove its parent + if origin != "-" { + err := zfsPoolVolumeCleanup(pool, origin) + if err != nil { + return err + } + } + } + + return nil + } + } else if strings.HasPrefix(path, "containers") && strings.Contains(path, "@copy-") { + // Just remove the copy- snapshot for copies of active containers + err := zfsPoolVolumeDestroy(pool, path) + if err != nil { + return err + } + } + + return nil +} + +func ZfsMount(poolName string, path string) error { + output, err := shared.TryRunCommand( + "zfs", + "mount", + fmt.Sprintf("%s/%s", poolName, path)) + if err != nil { + return fmt.Errorf("Failed to mount ZFS filesystem: %s", output) + } + + return nil +} + +func ZfsUmount(poolName string, path string, mountpoint string) error { + output, err := shared.TryRunCommand( + "zfs", + "unmount", + fmt.Sprintf("%s/%s", poolName, path)) + if err != nil { + logger.Warnf("Failed to unmount ZFS filesystem via zfs unmount: %s. Trying lazy umount (MNT_DETACH)...", output) + err := tryUnmount(mountpoint, syscall.MNT_DETACH) + if err != nil { + logger.Warnf("Failed to unmount ZFS filesystem via lazy umount (MNT_DETACH)...") + return err + } + } + + return nil +} + +func zfsPoolVolumeSnapshotRestore(pool string, path string, name string) error { + output, err := shared.TryRunCommand( + "zfs", + "rollback", + fmt.Sprintf("%s/%s@%s", pool, path, name)) + if err != nil { + logger.Errorf("zfs rollback failed: %s", output) + return fmt.Errorf("Failed to restore ZFS snapshot: %s", output) + } + + subvols, err := zfsPoolListSubvolumes(pool, fmt.Sprintf("%s/%s", pool, path)) + if err != nil { + return err + } + + for _, sub := range subvols { + snaps, err := ZfsPoolListSnapshots(pool, sub) + if err != nil { + return err + } + + if !shared.StringInSlice(name, snaps) { + continue + } + + output, err := shared.TryRunCommand( + "zfs", + "rollback", + fmt.Sprintf("%s/%s@%s", pool, sub, name)) + if err != nil { + logger.Errorf("zfs rollback failed: %s", output) + return fmt.Errorf("Failed to restore ZFS sub-volume snapshot: %s", output) + } + } + + return nil +} + +func ZfsPoolVolumeSnapshotCreate(pool string, path string, name string) error { + output, err := shared.RunCommand( + "zfs", + "snapshot", + "-r", + fmt.Sprintf("%s/%s@%s", pool, path, name)) + if err != nil { + logger.Errorf("zfs snapshot failed: %s", output) + return fmt.Errorf("Failed to create ZFS snapshot: %s", output) + } + + return nil +} + +func ZfsSnapshotDeleteInternal(project, poolName string, ctName string, onDiskPoolName string) error { + sourceContainerName, sourceContainerSnapOnlyName, _ := containerGetParentAndSnapshotName(ctName) + snapName := fmt.Sprintf("snapshot-%s", sourceContainerSnapOnlyName) + + if ZfsFilesystemEntityExists(onDiskPoolName, + fmt.Sprintf("containers/%s@%s", + projectPrefix(project, sourceContainerName), snapName)) { + removable, err := zfsPoolVolumeSnapshotRemovable(onDiskPoolName, + fmt.Sprintf("containers/%s", + projectPrefix(project, sourceContainerName)), + snapName) + if err != nil { + return err + } + + if removable { + err = 
ZfsPoolVolumeSnapshotDestroy(onDiskPoolName, + fmt.Sprintf("containers/%s", + projectPrefix(project, sourceContainerName)), + snapName) + } else { + err = zfsPoolVolumeSnapshotRename(onDiskPoolName, + fmt.Sprintf("containers/%s", + projectPrefix(project, sourceContainerName)), + snapName, + fmt.Sprintf("copy-%s", uuid.NewRandom().String())) + } + if err != nil { + return err + } + } + + // Delete the snapshot on its storage pool: + // ${POOL}/snapshots/ + snapshotContainerMntPoint := getSnapshotMountPoint(project, poolName, ctName) + if shared.PathExists(snapshotContainerMntPoint) { + err := os.RemoveAll(snapshotContainerMntPoint) + if err != nil { + return err + } + } + + // Check if we can remove the snapshot symlink: + // ${LXD_DIR}/snapshots/ to ${POOL}/snapshots/ + // by checking if the directory is empty. + snapshotContainerPath := getSnapshotMountPoint(project, poolName, sourceContainerName) + empty, _ := shared.PathIsEmpty(snapshotContainerPath) + if empty == true { + // Remove the snapshot directory for the container: + // ${POOL}/snapshots/ + err := os.Remove(snapshotContainerPath) + if err != nil { + return err + } + + snapshotSymlink := shared.VarPath("snapshots", projectPrefix(project, sourceContainerName)) + if shared.PathExists(snapshotSymlink) { + err := os.Remove(snapshotSymlink) + if err != nil { + return err + } + } + } + + // Legacy + snapPath := shared.VarPath(fmt.Sprintf("snapshots/%s/%s.zfs", projectPrefix(project, sourceContainerName), sourceContainerSnapOnlyName)) + if shared.PathExists(snapPath) { + err := os.Remove(snapPath) + if err != nil { + return err + } + } + + // Legacy + parent := shared.VarPath(fmt.Sprintf("snapshots/%s", projectPrefix(project, sourceContainerName))) + if ok, _ := shared.PathIsEmpty(parent); ok { + err := os.Remove(parent) + if err != nil { + return err + } + } + + return nil +} + +func ZfsPoolVolumeSnapshotDestroy(pool, path string, name string) error { + output, err := shared.RunCommand( + "zfs", + "destroy", + "-r", + fmt.Sprintf("%s/%s@%s", pool, path, name)) + if err != nil { + logger.Errorf("zfs destroy failed: %s", output) + return fmt.Errorf("Failed to destroy ZFS snapshot: %s", output) + } + + return nil +} + +func zfsPoolVolumeSnapshotRename(pool string, path string, oldName string, newName string) error { + output, err := shared.RunCommand( + "zfs", + "rename", + "-r", + fmt.Sprintf("%s/%s@%s", pool, path, oldName), + fmt.Sprintf("%s/%s@%s", pool, path, newName)) + if err != nil { + logger.Errorf("zfs snapshot rename failed: %s", output) + return fmt.Errorf("Failed to rename ZFS snapshot: %s", output) + } + + return nil +} + +func zfsPoolVolumeClone(project, pool string, source string, name string, dest string, mountpoint string) error { + output, err := shared.RunCommand( + "zfs", + "clone", + "-p", + "-o", fmt.Sprintf("mountpoint=%s", mountpoint), + "-o", "canmount=noauto", + fmt.Sprintf("%s/%s@%s", pool, source, name), + fmt.Sprintf("%s/%s", pool, dest)) + if err != nil { + logger.Errorf("zfs clone failed: %s", output) + return fmt.Errorf("Failed to clone the filesystem: %s", output) + } + + subvols, err := zfsPoolListSubvolumes(pool, fmt.Sprintf("%s/%s", pool, source)) + if err != nil { + return err + } + + for _, sub := range subvols { + snaps, err := ZfsPoolListSnapshots(pool, sub) + if err != nil { + return err + } + + if !shared.StringInSlice(name, snaps) { + continue + } + + destSubvol := dest + strings.TrimPrefix(sub, source) + snapshotMntPoint := getSnapshotMountPoint(project, pool, destSubvol) + + output, err := 
shared.RunCommand( + "zfs", + "clone", + "-p", + "-o", fmt.Sprintf("mountpoint=%s", snapshotMntPoint), + "-o", "canmount=noauto", + fmt.Sprintf("%s/%s@%s", pool, sub, name), + fmt.Sprintf("%s/%s", pool, destSubvol)) + if err != nil { + logger.Errorf("zfs clone failed: %s", output) + return fmt.Errorf("Failed to clone the sub-volume: %s", output) + } + } + + return nil +} + +// zfsIsEnabled returns whether zfs backend is supported. +func zfsIsEnabled() bool { + out, err := exec.LookPath("zfs") + if err != nil || len(out) == 0 { + return false + } + + return true +} + +// zfsToolVersionGet returns the ZFS tools version +func zfsToolVersionGet() (string, error) { + // This function is only really ever relevant on Ubuntu as the only + // distro that ships out of sync tools and kernel modules + out, err := shared.RunCommand("dpkg-query", "--showformat=${Version}", "--show", "zfsutils-linux") + if err != nil { + return "", err + } + + return strings.TrimSpace(string(out)), nil +} + +// zfsModuleVersionGet returns the ZFS module version +func zfsModuleVersionGet() (string, error) { + var zfsVersion string + + if shared.PathExists("/sys/module/zfs/version") { + out, err := ioutil.ReadFile("/sys/module/zfs/version") + if err != nil { + return "", fmt.Errorf("Could not determine ZFS module version") + } + + zfsVersion = string(out) + } else { + out, err := shared.RunCommand("modinfo", "-F", "version", "zfs") + if err != nil { + return "", fmt.Errorf("Could not determine ZFS module version") + } + + zfsVersion = out + } + + return strings.TrimSpace(zfsVersion), nil +} + +// ZfsPoolVolumeExists verifies if a specific ZFS pool or volume exists. +func ZfsPoolVolumeExists(dataset string) (bool, error) { + output, err := shared.RunCommand( + "zfs", "list", "-Ho", "name") + + if err != nil { + return false, err + } + + for _, name := range strings.Split(output, "\n") { + if name == dataset { + return true, nil + } + } + return false, nil +} + +func ZfsIdmapSetSkipper(dir string, absPath string, fi os.FileInfo) bool { + strippedPath := absPath + if dir != "" { + strippedPath = absPath[len(dir):] + } + + if fi.IsDir() && strippedPath == "/.zfs/snapshot" { + return true + } + + return false +} diff --git a/lxd/storage_migration_zfs.go b/lxd/storage_migration_zfs.go new file mode 100644 index 0000000000..19d024cb5b --- /dev/null +++ b/lxd/storage_migration_zfs.go @@ -0,0 +1,372 @@ +package main + +import ( + "fmt" + "io" + "io/ioutil" + "os" + "os/exec" + "strings" + + "github.com/gorilla/websocket" + "github.com/pborman/uuid" + + "github.com/lxc/lxd/lxd/state" + driver "github.com/lxc/lxd/lxd/storage" + "github.com/lxc/lxd/shared" + "github.com/lxc/lxd/shared/api" + "github.com/lxc/lxd/shared/logger" +) + +type zfsMigrationSourceDriver2 struct { + container container + snapshots []container + zfsSnapshotNames []string + runningSnapName string + stoppedSnapName string + zfsFeatures []string + onDiskPoolName string + pool *api.StoragePool + state *state.State +} + +func zfsMigrationSource(s *state.State, pool *api.StoragePool, args MigrationSourceArgs) (MigrationStorageSourceDriver, error) { + onDiskPoolName := pool.Name + + if pool.Config["zfs.pool_name"] != "" { + onDiskPoolName = pool.Config["zfs.pool_name"] + } + + /* If the container is a snapshot, let's just send that; we don't need + * to send anything else, because that's all the user asked for. 
+ */ + if args.Container.IsSnapshot() { + return &zfsMigrationSourceDriver2{container: args.Container, zfsFeatures: args.ZfsFeatures, pool: pool, onDiskPoolName: onDiskPoolName}, nil + } + + migrationDriver := zfsMigrationSourceDriver2{ + container: args.Container, + snapshots: []container{}, + zfsSnapshotNames: []string{}, + zfsFeatures: args.ZfsFeatures, + pool: pool, + onDiskPoolName: onDiskPoolName, + } + + if args.ContainerOnly { + return &migrationDriver, nil + } + + /* List all the snapshots in order of reverse creation. The idea here + * is that we send the oldest to newest snapshot, hopefully saving on + * xfer costs. Then, after all that, we send the container itself. + */ + snapshots, err := driver.ZfsPoolListSnapshots(migrationDriver.onDiskPoolName, fmt.Sprintf("containers/%s", projectPrefix(args.Container.Project(), args.Container.Name()))) + if err != nil { + return nil, err + } + + for _, snap := range snapshots { + /* In the case of e.g. multiple copies running at the same + * time, we will have potentially multiple migration-send + * snapshots. (Or in the case of the test suite, sometimes one + * will take too long to delete.) + */ + if !strings.HasPrefix(snap, "snapshot-") { + continue + } + + lxdName := fmt.Sprintf("%s%s%s", args.Container.Name(), shared.SnapshotDelimiter, snap[len("snapshot-"):]) + snapshot, err := containerLoadByProjectAndName(s, args.Container.Project(), lxdName) + if err != nil { + return nil, err + } + + migrationDriver.snapshots = append(migrationDriver.snapshots, snapshot) + migrationDriver.zfsSnapshotNames = append(migrationDriver.zfsSnapshotNames, snap) + } + + return &migrationDriver, nil +} + +func (s *zfsMigrationSourceDriver2) send(conn *websocket.Conn, zfsName string, zfsParent string, readWrapper func(io.ReadCloser) io.ReadCloser) error { + sourceParentName, _, _ := containerGetParentAndSnapshotName(s.container.Name()) + args := []string{"send"} + + // Negotiated options + if s.zfsFeatures != nil && len(s.zfsFeatures) > 0 { + if shared.StringInSlice("compress", s.zfsFeatures) { + args = append(args, "-c") + args = append(args, "-L") + } + } + + args = append(args, []string{fmt.Sprintf("%s/containers/%s@%s", s.onDiskPoolName, projectPrefix(s.container.Project(), sourceParentName), zfsName)}...) + if zfsParent != "" { + args = append(args, "-i", fmt.Sprintf("%s/containers/%s@%s", s.onDiskPoolName, projectPrefix(s.container.Project(), s.container.Name()), zfsParent)) + } + + cmd := exec.Command("zfs", args...) 
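The send() helper whose body continues below pipes the command's stdout through an optional read wrapper (LXD uses this for progress reporting) before streaming it to the websocket. Stripped of LXD's types, that plumbing looks roughly like this; a sketch only, with a stand-in command in place of zfs send:

package main

import (
	"io"
	"os"
	"os/exec"
)

// streamCommand runs a command and copies its stdout into dst,
// optionally routing it through a wrapper such as a progress-counting
// reader, mirroring the readWrapper plumbing in send().
func streamCommand(dst io.Writer, wrapper func(io.ReadCloser) io.ReadCloser, name string, args ...string) error {
	cmd := exec.Command(name, args...)

	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}

	var r io.ReadCloser = stdout
	if wrapper != nil {
		r = wrapper(stdout)
	}

	err = cmd.Start()
	if err != nil {
		return err
	}

	_, err = io.Copy(dst, r)
	if err != nil {
		return err
	}

	return cmd.Wait()
}

func main() {
	// "echo" stands in for "zfs send"; a nil wrapper disables wrapping.
	streamCommand(os.Stdout, nil, "echo", "stream contents")
}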
+ + stdout, err := cmd.StdoutPipe() + if err != nil { + return err + } + + readPipe := io.ReadCloser(stdout) + if readWrapper != nil { + readPipe = readWrapper(stdout) + } + + stderr, err := cmd.StderrPipe() + if err != nil { + return err + } + + if err := cmd.Start(); err != nil { + return err + } + + <-shared.WebsocketSendStream(conn, readPipe, 4*1024*1024) + + output, err := ioutil.ReadAll(stderr) + if err != nil { + logger.Errorf("Problem reading zfs send stderr: %s", err) + } + + err = cmd.Wait() + if err != nil { + logger.Errorf("Problem with zfs send: %s", string(output)) + } + + return err +} + +func (s *zfsMigrationSourceDriver2) SendWhileRunning(conn *websocket.Conn, op *operation, bwlimit string, containerOnly bool) error { + if s.container.IsSnapshot() { + _, snapOnlyName, _ := containerGetParentAndSnapshotName(s.container.Name()) + snapshotName := fmt.Sprintf("snapshot-%s", snapOnlyName) + wrapper := StorageProgressReader(op, "fs_progress", s.container.Name()) + return s.send(conn, snapshotName, "", wrapper) + } + + lastSnap := "" + if !containerOnly { + for i, snap := range s.zfsSnapshotNames { + prev := "" + if i > 0 { + prev = s.zfsSnapshotNames[i-1] + } + + lastSnap = snap + + wrapper := StorageProgressReader(op, "fs_progress", snap) + if err := s.send(conn, snap, prev, wrapper); err != nil { + return err + } + } + } + + s.runningSnapName = fmt.Sprintf("migration-send-%s", uuid.NewRandom().String()) + if err := driver.ZfsPoolVolumeSnapshotCreate(s.onDiskPoolName, fmt.Sprintf("containers/%s", projectPrefix(s.container.Project(), s.container.Name())), s.runningSnapName); err != nil { + return err + } + + wrapper := StorageProgressReader(op, "fs_progress", s.container.Name()) + if err := s.send(conn, s.runningSnapName, lastSnap, wrapper); err != nil { + return err + } + + return nil +} + +func (s *zfsMigrationSourceDriver2) SendAfterCheckpoint(conn *websocket.Conn, bwlimit string) error { + s.stoppedSnapName = fmt.Sprintf("migration-send-%s", uuid.NewRandom().String()) + if err := driver.ZfsPoolVolumeSnapshotCreate(s.onDiskPoolName, fmt.Sprintf("containers/%s", projectPrefix(s.container.Project(), s.container.Name())), s.stoppedSnapName); err != nil { + return err + } + + if err := s.send(conn, s.stoppedSnapName, s.runningSnapName, nil); err != nil { + return err + } + + return nil +} + +func (s *zfsMigrationSourceDriver2) Cleanup() { + if s.stoppedSnapName != "" { + driver.ZfsPoolVolumeSnapshotDestroy(s.onDiskPoolName, fmt.Sprintf("containers/%s", projectPrefix(s.container.Project(), s.container.Name())), s.stoppedSnapName) + } + if s.runningSnapName != "" { + driver.ZfsPoolVolumeSnapshotDestroy(s.onDiskPoolName, fmt.Sprintf("containers/%s", projectPrefix(s.container.Project(), s.container.Name())), s.runningSnapName) + } +} + +func (s *zfsMigrationSourceDriver2) SendStorageVolume(conn *websocket.Conn, op *operation, bwlimit string, storage storage, volumeOnly bool) error { + msg := fmt.Sprintf("Function not implemented") + logger.Errorf(msg) + return fmt.Errorf(msg) +} + +func zfsMigrationSink(pool *api.StoragePool, volume *api.StorageVolume, conn *websocket.Conn, op *operation, args MigrationSinkArgs) error { + var poolName string + + if pool.Config["zfs.pool_name"] != "" { + poolName = pool.Config["zfs.pool_name"] + } else { + poolName = pool.Name + } + + zfsRecv := func(zfsName string, writeWrapper func(io.WriteCloser) io.WriteCloser) error { + zfsFsName := fmt.Sprintf("%s/%s", poolName, zfsName) + args := []string{"receive", "-F", "-u", zfsFsName} + cmd := 
exec.Command("zfs", args...)
+
+		stdin, err := cmd.StdinPipe()
+		if err != nil {
+			return err
+		}
+
+		stderr, err := cmd.StderrPipe()
+		if err != nil {
+			return err
+		}
+
+		if err := cmd.Start(); err != nil {
+			return err
+		}
+
+		writePipe := io.WriteCloser(stdin)
+		if writeWrapper != nil {
+			writePipe = writeWrapper(stdin)
+		}
+
+		<-shared.WebsocketRecvStream(writePipe, conn)
+
+		output, err := ioutil.ReadAll(stderr)
+		if err != nil {
+			logger.Debugf("Problem reading zfs recv stderr: %s", err)
+		}
+
+		err = cmd.Wait()
+		if err != nil {
+			logger.Errorf("Problem with zfs recv: %s", string(output))
+		}
+		return err
+	}
+
+	/* In some versions of zfs we can write `zfs recv -F` to mounted
+	 * filesystems, and in some versions we can't. So, let's always unmount
+	 * this fs (it's empty anyway) before we zfs recv. N.B. that `zfs recv`
+	 * of a snapshot also needs the actual fs that it has snapshotted
+	 * unmounted, so we do this before receiving anything.
+	 */
+	zfsName := fmt.Sprintf("containers/%s", projectPrefix(args.Container.Project(), args.Container.Name()))
+	containerMntPoint := getContainerMountPoint(args.Container.Project(), pool.Name, args.Container.Name())
+	if shared.IsMountPoint(containerMntPoint) {
+		err := driver.ZfsUmount(poolName, zfsName, containerMntPoint)
+		if err != nil {
+			return err
+		}
+	}
+
+	if len(args.Snapshots) > 0 {
+		snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", pool.Name, "containers-snapshots", projectPrefix(args.Container.Project(), volume.Name))
+		snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(args.Container.Project(), args.Container.Name()))
+		if !shared.PathExists(snapshotMntPointSymlink) {
+			err := os.Symlink(snapshotMntPointSymlinkTarget, snapshotMntPointSymlink)
+			if err != nil {
+				return err
+			}
+		}
+	}
+
+	// At this point we have already figured out the parent
+	// container's root disk device so we can simply
+	// retrieve it from the expanded devices.
+	parentStoragePool := ""
+	parentExpandedDevices := args.Container.ExpandedDevices()
+	parentLocalRootDiskDeviceKey, parentLocalRootDiskDevice, _ := shared.GetRootDiskDevice(parentExpandedDevices)
+	if parentLocalRootDiskDeviceKey != "" {
+		parentStoragePool = parentLocalRootDiskDevice["pool"]
+	}
+
+	// A little neuroticism.
+	if parentStoragePool == "" {
+		return fmt.Errorf("detected that the container's root device is missing the pool property during ZFS migration")
+	}
+
+	for _, snap := range args.Snapshots {
+		ctArgs := snapshotProtobufToContainerArgs(args.Container.Project(), args.Container.Name(), snap)
+
+		// Ensure that snapshot and parent container have the
+		// same storage pool in their local root disk device.
+		// If the root disk device for the snapshot comes from a
+		// profile on the new instance as well we don't need to
+		// do anything.
+		if ctArgs.Devices != nil {
+			snapLocalRootDiskDeviceKey, _, _ := shared.GetRootDiskDevice(ctArgs.Devices)
+			if snapLocalRootDiskDeviceKey != "" {
+				ctArgs.Devices[snapLocalRootDiskDeviceKey]["pool"] = parentStoragePool
+			}
+		}
+		_, err := containerCreateEmptySnapshot(args.Container.DaemonState(), ctArgs)
+		if err != nil {
+			return err
+		}
+
+		wrapper := StorageProgressWriter(op, "fs_progress", snap.GetName())
+		name := fmt.Sprintf("containers/%s@snapshot-%s", projectPrefix(args.Container.Project(), args.Container.Name()), snap.GetName())
+		if err := zfsRecv(name, wrapper); err != nil {
+			return err
+		}
+
+		snapshotMntPoint := getSnapshotMountPoint(args.Container.Project(), poolName, fmt.Sprintf("%s/%s", args.Container.Name(), *snap.Name))
+		if !shared.PathExists(snapshotMntPoint) {
+			err := os.MkdirAll(snapshotMntPoint, 0700)
+			if err != nil {
+				return err
+			}
+		}
+	}
+
+	defer func() {
+		/* clean up our migration-send snapshots that we got from recv. */
+		zfsSnapshots, err := driver.ZfsPoolListSnapshots(poolName, fmt.Sprintf("containers/%s", projectPrefix(args.Container.Project(), args.Container.Name())))
+		if err != nil {
+			logger.Errorf("Failed listing snapshots post migration: %s", err)
+			return
+		}
+
+		for _, snap := range zfsSnapshots {
+			// If we received a bunch of snapshots, remove the migration-send-* ones, if not, wipe any snapshot we got
+			if args.Snapshots != nil && len(args.Snapshots) > 0 && !strings.HasPrefix(snap, "migration-send") {
+				continue
+			}
+
+			driver.ZfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("containers/%s", projectPrefix(args.Container.Project(), args.Container.Name())), snap)
+		}
+	}()
+
+	/* finally, do the real container */
+	wrapper := StorageProgressWriter(op, "fs_progress", args.Container.Name())
+	if err := zfsRecv(zfsName, wrapper); err != nil {
+		return err
+	}
+
+	if args.Live {
+		/* and again for the post-running snapshot if this was a live migration */
+		wrapper := StorageProgressWriter(op, "fs_progress", args.Container.Name())
+		if err := zfsRecv(zfsName, wrapper); err != nil {
+			return err
+		}
+	}
+
+	/* Sometimes, zfs recv mounts this anyway, even if we pass -u
+	 * (https://forums.freebsd.org/threads/zfs-receive-u-shouldnt-mount-received-filesystem-right.36844/)
+	 * but sometimes it doesn't. Let's try to mount, but not complain about
+	 * failure.
+ */ + driver.ZfsMount(poolName, zfsName) + return nil +} From 730ae1c18d5c78de3258a7b4c87567ddd9c4f521 Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 16:02:11 +0200 Subject: [PATCH 12/15] lxd: Use new storage code Signed-off-by: Thomas Hipp --- lxd/api_internal.go | 15 +- lxd/container_lxc.go | 10 +- lxd/main_init_interactive.go | 7 +- lxd/migrate_container.go | 5 +- lxd/migrate_storage_volumes.go | 5 +- lxd/patches.go | 21 +- lxd/storage.go | 2515 +++++++++++++++++++++++++++++++- 7 files changed, 2498 insertions(+), 80 deletions(-) diff --git a/lxd/api_internal.go b/lxd/api_internal.go index fb47c4ef58..b8857001e4 100644 --- a/lxd/api_internal.go +++ b/lxd/api_internal.go @@ -20,12 +20,13 @@ import ( "github.com/lxc/lxd/lxd/db/cluster" "github.com/lxc/lxd/lxd/db/node" "github.com/lxc/lxd/lxd/db/query" + driver "github.com/lxc/lxd/lxd/storage" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" + log "github.com/lxc/lxd/shared/log15" "github.com/lxc/lxd/shared/logger" "github.com/lxc/lxd/shared/osarch" - log "github.com/lxc/lxd/shared/log15" runtimeDebug "runtime/debug" ) @@ -607,7 +608,7 @@ func internalImport(d *Daemon, r *http.Request) Response { } case "zfs": onDiskPoolName := backup.Pool.Config["zfs.pool_name"] - snaps, err := zfsPoolListSnapshots(onDiskPoolName, + snaps, err := driver.ZfsPoolListSnapshots(onDiskPoolName, fmt.Sprintf("containers/%s", req.Name)) if err != nil { return InternalError(err) @@ -663,10 +664,10 @@ func internalImport(d *Daemon, r *http.Request) Response { switch backup.Pool.Driver { case "btrfs": snapName := fmt.Sprintf("%s/%s", req.Name, od) - err = btrfsSnapshotDeleteInternal(project, poolName, snapName) + err = driver.BtrfsSnapshotDeleteInternal(project, poolName, snapName) case "dir": snapName := fmt.Sprintf("%s/%s", req.Name, od) - err = dirSnapshotDeleteInternal(project, poolName, snapName) + err = driver.DirSnapshotDeleteInternal(project, poolName, snapName) case "lvm": onDiskPoolName := backup.Pool.Config["lvm.vg_name"] if onDiskPoolName == "" { @@ -698,7 +699,7 @@ func internalImport(d *Daemon, r *http.Request) Response { case "zfs": onDiskPoolName := backup.Pool.Config["zfs.pool_name"] snapName := fmt.Sprintf("%s/%s", req.Name, od) - err = zfsSnapshotDeleteInternal(project, poolName, snapName, + err = driver.ZfsSnapshotDeleteInternal(project, poolName, snapName, onDiskPoolName) } if err != nil { @@ -710,7 +711,7 @@ func internalImport(d *Daemon, r *http.Request) Response { switch backup.Pool.Driver { case "btrfs": snpMntPt := getSnapshotMountPoint(project, backup.Pool.Name, snap.Name) - if !shared.PathExists(snpMntPt) || !isBtrfsSubVolume(snpMntPt) { + if !shared.PathExists(snpMntPt) || !driver.IsBtrfsSubVolume(snpMntPt) { if req.Force { continue } @@ -771,7 +772,7 @@ func internalImport(d *Daemon, r *http.Request) Response { ctName, csName, _ := containerGetParentAndSnapshotName(snap.Name) snapshotName := fmt.Sprintf("snapshot-%s", csName) - exists := zfsFilesystemEntityExists(poolName, + exists := driver.ZfsFilesystemEntityExists(poolName, fmt.Sprintf("containers/%s@%s", ctName, snapshotName)) if !exists { diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 373e4ea0ba..1afff82b5b 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -2034,7 +2034,7 @@ func (c *containerLXC) startCommon() (string, error) { if diskIdmap != nil { if c.Storage().GetStorageType() == storageTypeZfs { - err = diskIdmap.UnshiftRootfs(c.RootfsPath(), zfsIdmapSetSkipper) + err = diskIdmap.UnshiftRootfs(c.RootfsPath(), 
driver.ZfsIdmapSetSkipper) } else { err = diskIdmap.UnshiftRootfs(c.RootfsPath(), nil) } @@ -2048,7 +2048,7 @@ func (c *containerLXC) startCommon() (string, error) { if nextIdmap != nil && !c.state.OS.Shiftfs { if c.Storage().GetStorageType() == storageTypeZfs { - err = nextIdmap.ShiftRootfs(c.RootfsPath(), zfsIdmapSetSkipper) + err = nextIdmap.ShiftRootfs(c.RootfsPath(), driver.ZfsIdmapSetSkipper) } else { err = nextIdmap.ShiftRootfs(c.RootfsPath(), nil) } @@ -5174,7 +5174,7 @@ func (c *containerLXC) Export(w io.Writer, properties map[string]string) error { var err error if c.Storage().GetStorageType() == storageTypeZfs { - err = idmap.UnshiftRootfs(c.RootfsPath(), zfsIdmapSetSkipper) + err = idmap.UnshiftRootfs(c.RootfsPath(), driver.ZfsIdmapSetSkipper) } else { err = idmap.UnshiftRootfs(c.RootfsPath(), nil) } @@ -5184,7 +5184,7 @@ func (c *containerLXC) Export(w io.Writer, properties map[string]string) error { } if c.Storage().GetStorageType() == storageTypeZfs { - defer idmap.ShiftRootfs(c.RootfsPath(), zfsIdmapSetSkipper) + defer idmap.ShiftRootfs(c.RootfsPath(), driver.ZfsIdmapSetSkipper) } else { defer idmap.ShiftRootfs(c.RootfsPath(), nil) } @@ -5499,7 +5499,7 @@ func (c *containerLXC) Migrate(args *CriuMigrationArgs) error { } if c.Storage().GetStorageType() == storageTypeZfs { - err = idmapset.ShiftRootfs(args.stateDir, zfsIdmapSetSkipper) + err = idmapset.ShiftRootfs(args.stateDir, driver.ZfsIdmapSetSkipper) } else { err = idmapset.ShiftRootfs(args.stateDir, nil) } diff --git a/lxd/main_init_interactive.go b/lxd/main_init_interactive.go index 0c179ef833..4b03bcb7c6 100644 --- a/lxd/main_init_interactive.go +++ b/lxd/main_init_interactive.go @@ -15,8 +15,9 @@ import ( "github.com/spf13/cobra" "gopkg.in/yaml.v2" - "github.com/lxc/lxd/client" + lxd "github.com/lxc/lxd/client" "github.com/lxc/lxd/lxd/cluster" + driver "github.com/lxc/lxd/lxd/storage" "github.com/lxc/lxd/lxd/util" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" @@ -487,7 +488,7 @@ func (c *cmdInit) askStoragePool(config *cmdInitData, d lxd.ContainerServer, poo if cli.AskBool(fmt.Sprintf("Create a new %s pool? 
(yes/no) [default=yes]: ", strings.ToUpper(pool.Driver)), "yes") { if pool.Driver == "zfs" && os.Geteuid() == 0 { - poolVolumeExists, err := zfsPoolVolumeExists(pool.Name) + poolVolumeExists, err := driver.ZfsPoolVolumeExists(pool.Name) if err == nil && poolVolumeExists { return fmt.Errorf("'%s' ZFS pool already exists", pool.Name) } @@ -564,7 +565,7 @@ func (c *cmdInit) askStoragePool(config *cmdInitData, d lxd.ContainerServer, poo } if pool.Driver == "zfs" && os.Geteuid() == 0 { - poolVolumeExists, err := zfsPoolVolumeExists(pool.Config["source"]) + poolVolumeExists, err := driver.ZfsPoolVolumeExists(pool.Config["source"]) if err == nil && !poolVolumeExists { return fmt.Errorf("'%s' ZFS pool or dataset does not exist", pool.Config["source"]) } diff --git a/lxd/migrate_container.go b/lxd/migrate_container.go index c2bda84afb..6390507f56 100644 --- a/lxd/migrate_container.go +++ b/lxd/migrate_container.go @@ -17,6 +17,7 @@ import ( "github.com/lxc/lxd/lxd/db" "github.com/lxc/lxd/lxd/migration" + driver "github.com/lxc/lxd/lxd/storage" "github.com/lxc/lxd/lxd/util" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" @@ -394,7 +395,7 @@ func (s *migrationSourceWs) Do(migrateOp *operation) error { }, } - if len(zfsVersion) >= 3 && zfsVersion[0:3] != "0.6" { + if len(driver.ZfsVersion) >= 3 && driver.ZfsVersion[0:3] != "0.6" { header.ZfsFeatures = &migration.ZfsFeatures{ Compress: &hasFeature, } @@ -864,7 +865,7 @@ func (c *migrationSink) Do(migrateOp *operation) error { } // Return those ZFS features we know about (with the value sent by the remote) - if len(zfsVersion) >= 3 && zfsVersion[0:3] != "0.6" { + if len(driver.ZfsVersion) >= 3 && driver.ZfsVersion[0:3] != "0.6" { if header.ZfsFeatures != nil && header.ZfsFeatures.Compress != nil { resp.ZfsFeatures = &migration.ZfsFeatures{ Compress: header.ZfsFeatures.Compress, diff --git a/lxd/migrate_storage_volumes.go b/lxd/migrate_storage_volumes.go index bb8748e420..6a287f35c6 100644 --- a/lxd/migrate_storage_volumes.go +++ b/lxd/migrate_storage_volumes.go @@ -7,6 +7,7 @@ import ( "github.com/gorilla/websocket" "github.com/lxc/lxd/lxd/migration" + driver "github.com/lxc/lxd/lxd/storage" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" "github.com/lxc/lxd/shared/logger" @@ -91,7 +92,7 @@ func (s *migrationSourceWs) DoStorage(migrateOp *operation) error { }, } - if len(zfsVersion) >= 3 && zfsVersion[0:3] != "0.6" { + if len(driver.ZfsVersion) >= 3 && driver.ZfsVersion[0:3] != "0.6" { header.ZfsFeatures = &migration.ZfsFeatures{ Compress: &hasFeature, } @@ -290,7 +291,7 @@ func (c *migrationSink) DoStorage(migrateOp *operation) error { }, } - if len(zfsVersion) >= 3 && zfsVersion[0:3] != "0.6" { + if len(driver.ZfsVersion) >= 3 && driver.ZfsVersion[0:3] != "0.6" { resp.ZfsFeatures = &migration.ZfsFeatures{ Compress: &hasFeature, } diff --git a/lxd/patches.go b/lxd/patches.go index d6cf113466..eaa7917e1c 100644 --- a/lxd/patches.go +++ b/lxd/patches.go @@ -12,15 +12,16 @@ import ( "github.com/boltdb/bolt" "github.com/hashicorp/raft" - "github.com/hashicorp/raft-boltdb" + raftboltdb "github.com/hashicorp/raft-boltdb" + "github.com/pkg/errors" + "github.com/lxc/lxd/lxd/cluster" "github.com/lxc/lxd/lxd/db" "github.com/lxc/lxd/lxd/db/query" + driver "github.com/lxc/lxd/lxd/storage" "github.com/lxc/lxd/shared" - "github.com/lxc/lxd/shared/logger" - "github.com/pkg/errors" - log "github.com/lxc/lxd/shared/log15" + "github.com/lxc/lxd/shared/logger" ) /* Patches are one-time actions that are sometimes needed to update @@ -599,7 +600,7 
@@ func upgradeFromStorageTypeBtrfs(name string, d *Daemon, defaultPoolName string, if shared.PathExists(oldContainerMntPoint) && !shared.PathExists(newContainerMntPoint) { err = os.Rename(oldContainerMntPoint, newContainerMntPoint) if err != nil { - err := btrfsSubVolumeCreate(newContainerMntPoint) + err := driver.BtrfsSubVolumeCreate(newContainerMntPoint) if err != nil { return err } @@ -610,7 +611,7 @@ func upgradeFromStorageTypeBtrfs(name string, d *Daemon, defaultPoolName string, return err } - btrfsSubVolumesDelete(oldContainerMntPoint) + driver.BtrfsSubVolumesDelete(oldContainerMntPoint) if shared.PathExists(oldContainerMntPoint) { err = os.RemoveAll(oldContainerMntPoint) if err != nil { @@ -684,9 +685,9 @@ func upgradeFromStorageTypeBtrfs(name string, d *Daemon, defaultPoolName string, oldSnapshotMntPoint := shared.VarPath("snapshots", cs) newSnapshotMntPoint := getSnapshotMountPoint("default", defaultPoolName, cs) if shared.PathExists(oldSnapshotMntPoint) && !shared.PathExists(newSnapshotMntPoint) { - err = btrfsSnapshot(oldSnapshotMntPoint, newSnapshotMntPoint, true) + err = driver.BtrfsSnapshot(oldSnapshotMntPoint, newSnapshotMntPoint, true) if err != nil { - err := btrfsSubVolumeCreate(newSnapshotMntPoint) + err := driver.BtrfsSubVolumeCreate(newSnapshotMntPoint) if err != nil { return err } @@ -697,7 +698,7 @@ func upgradeFromStorageTypeBtrfs(name string, d *Daemon, defaultPoolName string, return err } - btrfsSubVolumesDelete(oldSnapshotMntPoint) + driver.BtrfsSubVolumesDelete(oldSnapshotMntPoint) if shared.PathExists(oldSnapshotMntPoint) { err = os.RemoveAll(oldSnapshotMntPoint) if err != nil { @@ -706,7 +707,7 @@ func upgradeFromStorageTypeBtrfs(name string, d *Daemon, defaultPoolName string, } } else { // Delete the old subvolume. - err = btrfsSubVolumesDelete(oldSnapshotMntPoint) + err = driver.BtrfsSubVolumesDelete(oldSnapshotMntPoint) if err != nil { return err } diff --git a/lxd/storage.go b/lxd/storage.go index 9df1e51632..f882bc0419 100644 --- a/lxd/storage.go +++ b/lxd/storage.go @@ -4,6 +4,7 @@ import ( "encoding/json" "fmt" "io" + "io/ioutil" "os" "sync" "sync/atomic" @@ -14,10 +15,12 @@ import ( "github.com/lxc/lxd/lxd/db" "github.com/lxc/lxd/lxd/migration" "github.com/lxc/lxd/lxd/state" + driver "github.com/lxc/lxd/lxd/storage" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" "github.com/lxc/lxd/shared/idmap" "github.com/lxc/lxd/shared/ioprogress" + log "github.com/lxc/lxd/shared/log15" "github.com/lxc/lxd/shared/logger" "github.com/lxc/lxd/shared/version" ) @@ -129,6 +132,2378 @@ func storageStringToType(sName string) (storageType, error) { return -1, fmt.Errorf("invalid storage type name") } +type Storage struct { + sType storageType + sTypeName string + + s *state.State + + poolID int64 + pool *api.StoragePool + + volumeID int64 + volume *api.StorageVolume + + driver StorageDriver +} + +func (s *Storage) GetStorageType() storageType { + return s.sType +} + +func (s *Storage) GetStorageTypeName() string { + return s.sTypeName +} + +func (s *Storage) GetStorageTypeVersion() string { + return s.driver.GetVersion() +} + +func (s *Storage) GetState() *state.State { + return s.s +} + +func (s *Storage) GetStoragePoolWritable() api.StoragePoolPut { + return s.pool.Writable() +} + +func (s *Storage) SetStoragePoolWritable(writable *api.StoragePoolPut) { + s.pool.StoragePoolPut = *writable +} + +func (s *Storage) GetStoragePool() *api.StoragePool { + return s.pool +} + +func (s *Storage) GetStoragePoolVolumeWritable() api.StorageVolumePut { + return 
s.volume.Writable() +} + +func (s *Storage) SetStoragePoolVolumeWritable(writable *api.StorageVolumePut) { + s.volume.StorageVolumePut = *writable +} + +func (s *Storage) GetStoragePoolVolume() *api.StorageVolume { + return s.volume +} + +func (s *Storage) GetContainerPoolInfo() (int64, string, string) { + return s.poolID, s.pool.Name, s.pool.Name +} + +func (s *Storage) StorageCoreInit() error { + return s.driver.Init() +} + +func (s *Storage) StoragePoolInit() error { + s.driver.SharedInit(s.s, s.pool, s.poolID, s.volume) + return s.StorageCoreInit() +} + +func (s *Storage) StoragePoolCheck() error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + } + success := false + + defer logAction( + "Checking storage pool", + "Checked storage pool", + "Failed to check storage pool", + &ctx, &success, &err)() + + err = s.driver.StoragePoolCheck() + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) StoragePoolCreate() error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + } + success := false + + defer logAction( + "Creating storage pool", + "Created storage pool", + "Failed to create storage pool", + &ctx, &success, &err)() + + s.pool.Config["volatile.initial_source"] = s.pool.Config["source"] + + err = s.driver.StoragePoolCreate() + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) StoragePoolDelete() error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + } + success := false + + defer logAction( + "Deleting storage pool", + "Deleted storage pool", + "Failed to delete storage pool", + &ctx, &success, &err)() + + err = s.driver.StoragePoolDelete() + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) StoragePoolMount() (bool, error) { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + } + success := false + + defer logAction( + "Mounting storage pool", + "Mounted storage pool", + "Failed to mount storage pool", + &ctx, &success, &err)() + + ok, err := s.driver.StoragePoolMount() + if err != nil { + return ok, err + } + + success = true + + return ok, nil +} + +func (s *Storage) StoragePoolUmount() (bool, error) { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + } + success := false + + defer logAction( + "Unmounting storage pool", + "Unmounted storage pool", + "Failed to unmount storage pool", + &ctx, &success, &err)() + + ok, err := s.driver.StoragePoolUmount() + if err != nil { + return ok, err + } + + success = true + + return ok, nil +} + +func (s *Storage) StoragePoolResources() (*api.ResourcesStoragePool, error) { + return s.driver.StoragePoolResources() +} + +func (s *Storage) StoragePoolUpdate(writable *api.StoragePoolPut, changedConfig []string) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + } + success := false + + defer logAction( + "Updating storage pool", + "Updated storage pool", + "Failed to update storage pool", + &ctx, &success, &err)() + + changeable := changeableStoragePoolProperties[s.sTypeName] + unchangeable := []string{} + + for _, change := range changedConfig { + if !shared.StringInSlice(change, changeable) { + unchangeable = append(unchangeable, change) + } + } + + if len(unchangeable) > 0 { + err = updateStoragePoolError(unchangeable, s.sTypeName) + return err + } + + err = s.driver.StoragePoolUpdate(writable, changedConfig) + if err 
!= nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) StoragePoolVolumeCreate() error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "volume": s.volume.Name, + } + success := false + + defer logAction( + "Creating storage pool volume", + "Created storage pool volume", + "Failed to create storage pool volume", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + isSnapshot := shared.IsSnapshot(s.volume.Name) + + // Create subvolume path on the storage pool. + var volumePath string + + if isSnapshot { + volumePath = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, "") + } else { + volumePath = getStoragePoolVolumeMountPoint(s.pool.Name, "") + } + + if !shared.PathExists(volumePath) { + err = os.MkdirAll(volumePath, customDirMode) + if err != nil { + return err + } + } + + err = s.driver.VolumeCreate("default", s.volume.Name, driver.VolumeTypeCustom) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) StoragePoolVolumeDelete() error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "volume": s.volume.Name, + } + success := false + + defer logAction( + "Deleting storage pool volume", + "Deleted storage pool volume", + "Failed to delete storage pool volume", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + volumeMntPoint := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) + + err = s.driver.VolumeDelete("default", s.volume.Name, true, driver.VolumeTypeCustom) + if err != nil { + return err + } + + err = os.RemoveAll(volumeMntPoint) + if err != nil { + return err + } + + err = s.s.Cluster.StoragePoolVolumeDelete( + "default", + s.volume.Name, + storagePoolVolumeTypeCustom, + s.poolID) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) StoragePoolVolumeMount() (bool, error) { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "volume": s.volume.Name, + } + success := false + + defer logAction( + "Mounting storage pool volume", + "Mounted storage pool volume", + "Failed to mount storage pool volume", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return ourMount, err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + ok, err := s.driver.VolumeMount("default", s.volume.Name, driver.VolumeTypeCustom) + if err != nil { + return ok, err + } + + success = true + + return ok, nil +} + +func (s *Storage) StoragePoolVolumeUmount() (bool, error) { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "volume": s.volume.Name, + } + success := false + + defer logAction( + "Unmounting storage pool volume", + "Unmounted storage pool volume", + "Failed to unmount storage pool volume", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return ourMount, err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + ok, err := s.driver.VolumeUmount("default", s.volume.Name, driver.VolumeTypeCustom) + if err != nil { + return ok, err + } + + success = true + + return ok, nil +} + +func (s *Storage) StoragePoolVolumeUpdate(writable *api.StorageVolumePut, changedConfig []string) error { + var err error +
ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "volume": s.volume.Name, + } + success := false + + defer logAction( + "Updating storage pool volume", + "Updated storage pool volume", + "Failed to update storage pool volume", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + if writable.Restore != "" { + err = s.driver.VolumeSnapshotRestore("default", fmt.Sprintf("%s/%s", s.volume.Name, writable.Restore), s.volume.Name, driver.VolumeTypeCustomSnapshot) + if err != nil { + return err + } + + success = true + + return nil + } + + changeable := changeableStoragePoolVolumeProperties[s.sTypeName] + unchangeable := []string{} + for _, change := range changedConfig { + if !shared.StringInSlice(change, changeable) { + unchangeable = append(unchangeable, change) + } + } + + if len(unchangeable) > 0 { + err = updateStoragePoolVolumeError(unchangeable, s.sTypeName) + return err + } + + err = s.driver.VolumeUpdate(writable, changedConfig) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) StoragePoolVolumeRename(newName string) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "old_name": s.volume.Name, + "new_name": newName, + } + success := false + + defer logAction( + "Renaming storage pool volume", + "Renamed storage pool volume", + "Failed to rename storage pool volume", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + usedBy, err := storagePoolVolumeUsedByContainersGet(s.s, "default", s.volume.Name, + storagePoolVolumeTypeNameCustom) + if err != nil { + return err + } + + if len(usedBy) > 0 { + err = fmt.Errorf(`storage volume "%s" on storage pool "%s" is attached to containers`, + s.volume.Name, s.pool.Name) + return err + } + + err = s.driver.VolumeRename("default", s.volume.Name, newName, nil, driver.VolumeTypeCustom) + if err != nil { + return err + } + + err = s.s.Cluster.StoragePoolVolumeRename("default", s.volume.Name, newName, + storagePoolVolumeTypeCustom, s.poolID) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) StoragePoolVolumeCopy(source *api.StorageVolumeSource) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "source": source.Name, + "target": s.volume.Name, + } + success := false + + defer logAction( + "Copying storage pool volume", + "Copied storage pool volume", + "Failed to copy storage pool volume", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + if s.pool.Name != source.Pool { + err = s.doCrossPoolVolumeCopy(source) + if err != nil { + return err + } + + success = true + + return nil + } + + isSnapshot := shared.IsSnapshot(source.Name) + volumeMntPoint := getStoragePoolVolumeMountPoint(s.pool.Name, "") + + err = os.MkdirAll(volumeMntPoint, customDirMode) + if err != nil { + return err + } + + if isSnapshot { + return s.driver.VolumeSnapshotCopy("default", source.Name, s.volume.Name, driver.VolumeTypeCustomSnapshot) + } + + snapshots, err := s.s.Cluster.StoragePoolVolumeSnapshotsGetType(source.Name, storagePoolVolumeTypeCustom, s.poolID) + if err != nil { + return err + } + + var snapOnlyNames []string + + for _, snap := 
range snapshots { + snapOnlyNames = append(snapOnlyNames, shared.ExtractSnapshotName(snap)) + } + + err = s.driver.VolumeCopy("default", source.Name, s.volume.Name, snapOnlyNames, driver.VolumeTypeCustom) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) doCrossPoolVolumeCopy(source *api.StorageVolumeSource) error { + // setup storage for the source volume + srcStorage, err := storagePoolVolumeInit(s.s, "default", source.Pool, source.Name, + storagePoolVolumeTypeCustom) + if err != nil { + return err + } + + ourMount, err := srcStorage.StoragePoolMount() + if err != nil { + return err + } + if ourMount { + defer srcStorage.StoragePoolUmount() + } + + err = s.StoragePoolVolumeCreate() + if err != nil { + return err + } + + ourMount, err = s.StoragePoolVolumeMount() + if err != nil { + return err + } + if ourMount { + defer s.StoragePoolVolumeUmount() + } + + dstMountPoint := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) + bwlimit := s.pool.Config["rsync.bwlimit"] + + if !source.VolumeOnly { + snapshots, err := storagePoolVolumeSnapshotsGet(s.s, source.Pool, source.Name, storagePoolVolumeTypeCustom) + if err != nil { + return err + } + + for _, snap := range snapshots { + srcMountPoint := getStoragePoolVolumeSnapshotMountPoint(source.Pool, snap) + + _, err = rsyncLocalCopy(srcMountPoint, dstMountPoint, bwlimit) + if err != nil { + logger.Errorf("Failed to rsync into ZFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) + return err + } + + _, snapOnlyName, _ := containerGetParentAndSnapshotName(source.Name) + + s.StoragePoolVolumeSnapshotCreate(&api.StorageVolumeSnapshotsPost{Name: fmt.Sprintf("%s/%s", s.volume.Name, snapOnlyName)}) + } + } + + var srcMountPoint string + + if shared.IsSnapshot(source.Name) { + srcMountPoint = getStoragePoolVolumeSnapshotMountPoint(source.Pool, source.Name) + } else { + srcMountPoint = getStoragePoolVolumeMountPoint(source.Pool, source.Name) + } + + _, err = rsyncLocalCopy(srcMountPoint, dstMountPoint, bwlimit) + if err != nil { + os.RemoveAll(dstMountPoint) + return err + } + + return nil +} + +func (s *Storage) StoragePoolVolumeSnapshotCreate(target *api.StorageVolumeSnapshotsPost) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "source": s.volume.Name, + "target": target.Name, + } + success := false + + defer logAction( + "Creating storage pool volume snapshot", + "Created storage pool volume snapshot", + "Failed to create storage pool volume snapshot", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + _, _, ok := containerGetParentAndSnapshotName(target.Name) + if !ok { + err = fmt.Errorf("Not a snapshot name") + return err + } + + targetPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) + + err = os.MkdirAll(targetPath, snapshotsDirMode) + if err != nil { + return err + } + + err = s.driver.VolumeSnapshotCreate("default", s.volume.Name, target.Name, + driver.VolumeTypeCustomSnapshot) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) StoragePoolVolumeSnapshotDelete() error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "volume": s.volume.Name, + } + success := false + + defer logAction( + "Deleting storage pool volume snapshot", + "Deleted storage pool volume snapshot", + "Failed to delete storage 
pool volume snapshot", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + err = s.driver.VolumeSnapshotDelete("default", s.volume.Name, true, driver.VolumeTypeCustomSnapshot) + if err != nil { + return err + } + + snapshotMntPoint := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) + + err = os.RemoveAll(snapshotMntPoint) + if err != nil { + return err + } + + sourceVolumeName, _, _ := containerGetParentAndSnapshotName(s.volume.Name) + snapshotVolumePath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, sourceVolumeName) + + empty, _ := shared.PathIsEmpty(snapshotVolumePath) + if empty { + err = os.Remove(snapshotVolumePath) + if err != nil { + return err + } + + snapshotSymlink := shared.VarPath("custom-snapshots", sourceVolumeName) + if shared.PathExists(snapshotSymlink) { + err = os.Remove(snapshotSymlink) + if err != nil { + return err + } + } + } + + err = s.s.Cluster.StoragePoolVolumeDelete( + "default", + s.volume.Name, + storagePoolVolumeTypeCustom, + s.poolID) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) StoragePoolVolumeSnapshotRename(newName string) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "old_name": s.volume.Name, + "new_name": newName, + } + success := false + + defer logAction( + "Renaming storage pool volume snapshot", + "Renamed storage pool volume snapshot", + "Failed to rename storage pool volume snapshot", + &ctx, &success, &err)() + + sourceName, _, _ := containerGetParentAndSnapshotName(s.volume.Name) + fullSnapshotName := fmt.Sprintf("%s%s%s", sourceName, shared.SnapshotDelimiter, newName) + + oldPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) + newPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, fullSnapshotName) + + err = os.MkdirAll(newPath, customDirMode) + if err != nil { + return err + } + + err = s.driver.VolumeSnapshotRename("default", s.volume.Name, fullSnapshotName, driver.VolumeTypeCustomSnapshot) + if err != nil { + return err + } + + // It might be, that the driver already renamed the path. 
+ if shared.PathExists(oldPath) { + err = os.Rename(oldPath, newPath) + } + + err = s.s.Cluster.StoragePoolVolumeRename("default", s.volume.Name, fullSnapshotName, storagePoolVolumeTypeCustom, s.poolID) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) ContainerCreate(container container) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "name": container.Name(), + } + success := false + + defer logAction( + "Creating container", + "Created container", + "Failed to create container", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + err = errors.Wrapf(err, "Mount storage pool '%s'", s.pool.Name) + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + containerPath := getContainerMountPoint("default", s.pool.Name, "") + + err = os.MkdirAll(containerPath, containersDirMode) + if err != nil { + err = errors.Wrapf(err, "Create containers mountpoint '%s'", containerPath) + return err + } + + // Create container volume + err = s.driver.VolumeCreate(container.Project(), container.Name(), + driver.VolumeTypeContainer) + if err != nil { + err = errors.Wrapf(err, "Create container '%s'", container.Name()) + return err + } + + // Create directories + containerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) + + err = createContainerMountpoint(containerMntPoint, container.Path(), container.IsPrivileged()) + if err != nil { + err = errors.Wrapf(err, "Create container mountpoint '%s'", containerMntPoint) + return err + } + + revert := false + + defer func() { + if revert { + deleteContainerMountpoint(containerMntPoint, container.Path(), s.GetStorageTypeName()) + } + }() + + success = true + + return container.TemplateApply("create") +} + +func (s *Storage) ContainerCreateFromImage(container container, fingerprint string, tracker *ioprogress.ProgressTracker) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "name": container.Name(), + "fingerprint": fingerprint, + } + success := false + + defer logAction( + "Creating container from image", + "Created from image", + "Failed to create from image", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + err = errors.Wrapf(err, "Mount storage pool '%s'", s.pool.Name) + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + containerPath := getContainerMountPoint("default", s.pool.Name, "") + + err = os.MkdirAll(containerPath, containersDirMode) + if err != nil { + err = errors.Wrapf(err, "Create containers mountpoint '%s'", containerPath) + return err + } + + // ImageCreate / VolumeCreate + if s.sType == storageTypeBtrfs || s.sType == storageTypeZfs { + err = s.ImageCreate(fingerprint, tracker) + if err != nil { + err = errors.Wrapf(err, "Create image '%s'", fingerprint) + return err + } + } + + // Create directories + containerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) + + imageMntPoint := shared.VarPath("images", fingerprint) + revert := true + + // Create container volume + if s.sType == storageTypeBtrfs || s.sType == storageTypeZfs { + err = s.driver.VolumeCopy(container.Project(), fingerprint, container.Name(), nil, driver.VolumeTypeImage) + if err != nil { + err = errors.Wrapf(err, "Copy volume") + return err + } + + // For btrfs, it is important to create the container mountpoint _after_ + // the subvolume has been 
created. + err = createContainerMountpoint(containerMntPoint, container.Path(), container.IsPrivileged()) + if err != nil { + err = errors.Wrapf(err, "Create container mountpoint '%s'", containerMntPoint) + return err + } + + defer func() { + if revert { + deleteContainerMountpoint(containerMntPoint, container.Path(), s.GetStorageTypeName()) + } + }() + + } else { + err = createContainerMountpoint(containerMntPoint, container.Path(), container.IsPrivileged()) + if err != nil { + err = errors.Wrapf(err, "Create container mountpoint '%s'", containerMntPoint) + return err + } + + defer func() { + if revert { + deleteContainerMountpoint(containerMntPoint, container.Path(), s.GetStorageTypeName()) + } + }() + + err = unpackImage(imageMntPoint, containerMntPoint, s.sType, s.s.OS.RunningInUserNS, + tracker) + if err != nil { + err = errors.Wrap(err, "Unpack image") + return err + } + } + + revert = false + success = true + + return container.TemplateApply("create") +} + +func (s *Storage) ContainerDelete(c container) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "container": c.Name(), + } + success := false + + defer logAction( + "Deleting container", + "Deleted container", + "Failed to delete container", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + containerMntPoint := getContainerMountPoint(c.Project(), s.pool.Name, c.Name()) + snapshotMntPoint := getSnapshotMountPoint(c.Project(), s.pool.Name, c.Name()) + // ${LXD_DIR}/snapshots/ to ${POOL}/snapshots/ + snapshotSymlink := shared.VarPath("snapshots", projectPrefix(c.Project(), c.Name())) + + err = s.driver.VolumeDelete(c.Project(), c.Name(), true, driver.VolumeTypeContainer) + if err != nil { + return err + } + + err = deleteContainerMountpoint(containerMntPoint, c.Path(), s.GetStorageTypeName()) + if err != nil { + return err + } + + snapshots, err := c.Snapshots() + if err != nil { + return err + } + + for _, snap := range snapshots { + err = s.driver.VolumeSnapshotDelete(snap.Project(), snap.Name(), true, driver.VolumeTypeContainerSnapshot) + if err != nil { + return err + } + + err = deleteSnapshotMountpoint(snapshotMntPoint, snapshotMntPoint, snapshotSymlink) + if err != nil { + return err + } + } + + success = true + + return nil +} + +func (s *Storage) ContainerRename(c container, newName string) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "old_name": c.Name(), + "new_name": newName, + } + success := false + + defer logAction( + "Renaming container", + "Renamed container", + "Failed to rename container", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + oldContainerMntPoint := getContainerMountPoint(c.Project(), s.pool.Name, c.Name()) + oldContainerSymlink := containerPath(c.Project(), c.Name(), false) + newContainerMntPoint := getContainerMountPoint(c.Project(), s.pool.Name, newName) + newContainerSymlink := containerPath(c.Project(), newName, false) + + var snapshotNames []string + + snapshots, err := c.Snapshots() + if err != nil { + return err + } + + for _, snap := range snapshots { + snapshotNames = append(snapshotNames, shared.ExtractSnapshotName(snap.Name())) + } + + // Snapshots are renamed here as well, as they're tied to the volume/container.
+ // There's no need to call VolumeSnapshotRename for each snapshot. + err = s.driver.VolumeRename(c.Project(), c.Name(), newName, snapshotNames, + driver.VolumeTypeContainer) + if err != nil { + return err + } + + err = renameContainerMountpoint(oldContainerMntPoint, oldContainerSymlink, + newContainerMntPoint, newContainerSymlink) + if err != nil { + return err + } + + if c.IsSnapshot() { + success = true + return nil + } + + oldSnapshotsMntPoint := getSnapshotMountPoint(c.Project(), s.pool.Name, c.Name()) + newSnapshotsMntPoint := getSnapshotMountPoint(c.Project(), s.pool.Name, newName) + oldSnapshotSymlink := containerPath(c.Project(), c.Name(), true) + newSnapshotSymlink := containerPath(c.Project(), newName, true) + + err = renameContainerMountpoint(oldSnapshotsMntPoint, oldSnapshotSymlink, newSnapshotsMntPoint, newSnapshotSymlink) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) ContainerMount(c container) (bool, error) { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "container": c.Name(), + } + success := false + + defer logAction( + "Mounting container", + "Mounted container", + "Failed to mount container", + &ctx, &success, &err)() + + _, err = s.driver.StoragePoolMount() + if err != nil { + return false, err + } + + ok, err := s.driver.VolumeMount(c.Project(), c.Name(), driver.VolumeTypeContainer) + if err != nil { + return ok, err + } + + success = true + + return ok, nil +} + +func (s *Storage) ContainerUmount(c container, path string) (bool, error) { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "container": c.Name(), + } + success := false + + defer logAction( + "Unmounting container", + "Unmounted container", + "Failed to unmount container", + &ctx, &success, &err)() + + ok, err := s.driver.VolumeUmount(c.Project(), c.Name(), driver.VolumeTypeContainer) + if err != nil { + return ok, err + } + + success = true + + return ok, nil +} + +func (s *Storage) ContainerCopy(target container, source container, containerOnly bool) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "source": source.Name(), + "target": target.Name(), + } + success := false + + defer logAction( + "Copying container", + "Copied container", + "Failed to copy container", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + ourStart, err := source.StorageStart() + if err != nil { + return err + } + + if ourStart { + defer source.StorageStop() + } + + sourcePool, err := source.StoragePool() + if err != nil { + return err + } + + targetPool, err := target.StoragePool() + if err != nil { + return err + } + + var snapshots []container + + if !containerOnly { + snapshots, err = source.Snapshots() + if err != nil { + return err + } + } + + if sourcePool != targetPool { + err = s.doCrossPoolContainerCopy(target, source, containerOnly, false, snapshots) + if err != nil { + return err + } + + success = true + return nil + } + + containerMntPoint := getContainerMountPoint("default", s.pool.Name, "") + + err = os.MkdirAll(containerMntPoint, containersDirMode) + if err != nil { + return err + } + + targetMntPoint := getContainerMountPoint(target.Project(), s.pool.Name, target.Name()) + + var snapshotNames []string + + if !containerOnly { + for _, c := range snapshots { + snapshotNames = append(snapshotNames, 
shared.ExtractSnapshotName(c.Name())) + } + + snapshotParentMntPoint := getSnapshotMountPoint(target.Project(), s.pool.Name, + target.Name()) + snapshotParentMntPointSymlink := shared.VarPath("snapshots", + projectPrefix(target.Project(), target.Name())) + + err = createSnapshotMountpoint(snapshotParentMntPoint, snapshotParentMntPoint, + snapshotParentMntPointSymlink) + if err != nil { + return err + } + } + + if shared.IsSnapshot(source.Name()) { + err = s.driver.VolumeSnapshotCopy(source.Project(), source.Name(), target.Name(), driver.VolumeTypeContainerSnapshot) + } else { + err = s.driver.VolumeCopy(source.Project(), source.Name(), target.Name(), snapshotNames, driver.VolumeTypeContainer) + } + if err != nil { + return err + } + + err = createContainerMountpoint(targetMntPoint, target.Path(), target.IsPrivileged()) + if err != nil { + return err + } + + err = target.TemplateApply("copy") + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) ContainerGetUsage(container container) (int64, error) { + return s.driver.VolumeGetUsage(container.Project(), container.Name(), container.Path()) +} + +func (s *Storage) ContainerRefresh(target container, source container, snapshots []container) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "source": source.Name(), + "target": target.Name(), + } + success := false + + defer logAction( + "Refreshing container", + "Refreshed container", + "Failed to refresh container", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + err = s.doCrossPoolContainerCopy(target, source, len(snapshots) == 0, true, snapshots) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) doCrossPoolContainerCopy(target container, source container, containerOnly bool, + refresh bool, refreshSnapshots []container) error { + sourcePool, err := source.StoragePool() + if err != nil { + return err + } + + targetPool, err := target.StoragePool() + if err != nil { + return err + } + + // setup storage for the source volume + srcStorage, err := storagePoolVolumeInit(s.s, "default", sourcePool, source.Name(), + storagePoolVolumeTypeContainer) + if err != nil { + return err + } + + ourMount, err := srcStorage.StoragePoolMount() + if err != nil { + return err + } + if ourMount { + defer srcStorage.StoragePoolUmount() + } + + var snapshots []container + + if refresh { + snapshots = refreshSnapshots + } else { + snapshots, err = source.Snapshots() + if err != nil { + return err + } + + // create the main container + err = s.ContainerCreate(target) + if err != nil { + return err + } + } + + _, err = s.ContainerMount(target) + if err != nil { + return err + } + defer s.ContainerUmount(target, shared.VarPath("containers", projectPrefix(target.Project(), target.Name()))) + + destContainerMntPoint := getContainerMountPoint(target.Project(), targetPool, target.Name()) + bwlimit := s.pool.Config["rsync.bwlimit"] + + if !containerOnly { + snapshotSubvolumePath := getSnapshotMountPoint(target.Project(), s.pool.Name, target.Name()) + if !shared.PathExists(snapshotSubvolumePath) { + err := os.MkdirAll(snapshotSubvolumePath, containersDirMode) + if err != nil { + return err + } + } + + snapshotMntPoint := getSnapshotMountPoint(target.Project(), s.pool.Name, s.volume.Name) + snapshotMntPointSymlink := containerPath(target.Project(), target.Name(), true) + + err = 
createSnapshotMountpoint(snapshotMntPoint, snapshotMntPoint, snapshotMntPointSymlink) + if err != nil { + return err + } + + for _, snap := range snapshots { + srcSnapshotMntPoint := getSnapshotMountPoint(source.Project(), sourcePool, snap.Name()) + targetParentName, snapOnlyName, _ := containerGetParentAndSnapshotName(snap.Name()) + destSnapshotMntPoint := getSnapshotMountPoint(target.Project(), targetPool, + fmt.Sprintf("%s%s%s", target.Name(), shared.SnapshotDelimiter, snapOnlyName)) + + switch s.sType { + case storageTypeZfs: + fallthrough + case storageTypeBtrfs: + _, err = rsyncLocalCopy(srcSnapshotMntPoint, destContainerMntPoint, bwlimit) + if err != nil { + return err + } + + // create snapshot + err = s.driver.VolumeSnapshotCreate(target.Project(), target.Name(), + fmt.Sprintf("%s%s%s", target.Name(), shared.SnapshotDelimiter, snapOnlyName), + driver.VolumeTypeContainerSnapshot) + case storageTypeDir: + _, err = rsyncLocalCopy(srcSnapshotMntPoint, destSnapshotMntPoint, bwlimit) + default: + return fmt.Errorf("Cross pool copy not implemented for '%s'", s.sTypeName) + } + if err != nil { + return err + } + + err := createSnapshotMountpoint(destSnapshotMntPoint, destSnapshotMntPoint, + shared.VarPath("snapshots", + projectPrefix(target.Project(), targetParentName))) + if err != nil { + return err + } + } + } + + srcContainerMntPoint := getContainerMountPoint(source.Project(), sourcePool, source.Name()) + + _, err = rsyncLocalCopy(srcContainerMntPoint, destContainerMntPoint, bwlimit) + if err != nil { + return err + } + + return nil +} + +func (s *Storage) ContainerRestore(targetContainer container, sourceContainer container) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "source": sourceContainer.Name(), + "target": targetContainer.Name(), + } + success := false + + defer logAction( + "Restoring container", + "Restored container", + "Failed to restore container", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + snapshots, err := targetContainer.Snapshots() + if err != nil { + return err + } + + var snapshotNames []string + + for _, snap := range snapshots { + snapshotNames = append(snapshotNames, snap.Name()) + } + + deleteSnapshots := func() error { + for i := len(snapshots) - 1; i != 0; i-- { + if snapshots[i].Name() == sourceContainer.Name() { + break + } + + err := snapshots[i].Delete() + if err != nil { + return err + } + } + + return nil + } + + err = s.driver.VolumePrepareRestore(sourceContainer.Name(), targetContainer.Name(), snapshotNames, deleteSnapshots) + if err != nil { + return err + } + + err = s.driver.VolumeSnapshotRestore(sourceContainer.Project(), sourceContainer.Name(), + targetContainer.Name(), driver.VolumeTypeContainerSnapshot) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) ContainerStorageReady(c container) bool { + return s.driver.VolumeReady(c.Project(), c.Name()) +} + +func (s *Storage) ContainerSnapshotCreate(target container, source container) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "source": source.Name(), + "target": target.Name(), + } + success := false + + defer logAction( + "Creating container snapshot", + "Created container snapshot", + "Failed to create container snapshot", + &ctx, &success, &err)() + + _, err = s.driver.StoragePoolMount() + if err != nil { + return err + } + + // We 
can only create the btrfs subvolume under the mounted storage + // pool. The on-disk layout for snapshots on a btrfs storage pool will + // thus be + // ${LXD_DIR}/storage-pools//snapshots/. The btrfs tool will + // complain if the intermediate path does not exist, so create it if it + // doesn't already. + snapshotSubvolumePath := getSnapshotMountPoint(source.Project(), s.pool.Name, source.Name()) + err = os.MkdirAll(snapshotSubvolumePath, containersDirMode) + if err != nil { + return err + } + + snapshotMntPoint := getSnapshotMountPoint(source.Project(), s.pool.Name, source.Name()) + snapshotMntPointSymlink := containerPath(source.Project(), source.Name(), target.IsSnapshot()) + + err = createSnapshotMountpoint(snapshotMntPoint, snapshotMntPoint, snapshotMntPointSymlink) + if err != nil { + return err + } + + err = s.driver.VolumeSnapshotCreate(source.Project(), source.Name(), target.Name(), driver.VolumeTypeContainerSnapshot) + if err != nil { + return s.ContainerDelete(target) + } + + // This is used only in Dir + if s.sType == storageTypeDir && source.IsRunning() { + err = source.Freeze() + if err != nil { + // Don't just fail here + success = true + return nil + } + + defer source.Unfreeze() + + err = s.driver.VolumeSnapshotCreate(source.Project(), source.Name(), target.Name(), driver.VolumeTypeContainerSnapshot) + if err != nil { + return err + } + } + + success = true + + return nil +} + +func (s *Storage) ContainerSnapshotCreateEmpty(c container) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "snapshot": c.Name(), + } + success := false + + defer logAction( + "Creating empty container snapshot", + "Created empty container snapshot", + "Failed to create empty container snapshot", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + parentName, _, _ := containerGetParentAndSnapshotName(c.Name()) + snapshotMntPoint := getSnapshotMountPoint(c.Project(), s.pool.Name, parentName) + + err = os.MkdirAll(snapshotMntPoint, containersDirMode) + if err != nil { + return err + } + + err = s.driver.VolumeSnapshotCreate(c.Project(), "", c.Name(), driver.VolumeTypeContainerSnapshot) + if err != nil { + return err + } + + sourceName, _, _ := containerGetParentAndSnapshotName(c.Name()) + snapshotMntPointSymlinkTarget := getSnapshotMountPoint(c.Project(), s.pool.Name, sourceName) + snapshotMntPointSymlink := containerPath(c.Project(), sourceName, true) + + err = createSnapshotMountpoint(snapshotMntPoint, snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) ContainerSnapshotDelete(c container) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "container": c.Name(), + } + success := false + + defer logAction( + "Deleting container snapshot", + "Deleted container snapshot", + "Failed to delete container snapshot", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + sourceContainerName, _, _ := containerGetParentAndSnapshotName(c.Name()) + snapshotMntPoint := getSnapshotMountPoint(c.Project(), s.pool.Name, c.Name()) + snapshotSymlink := shared.VarPath("snapshots", projectPrefix(c.Project(), sourceContainerName)) + + err = s.driver.VolumeSnapshotDelete(c.Project(), c.Name(), true, 
driver.VolumeTypeContainerSnapshot) + if err != nil { + return err + } + + deleteSnapshotMountpoint(snapshotMntPoint, snapshotMntPoint, snapshotSymlink) + + if shared.PathExists(snapshotMntPoint) { + err := os.RemoveAll(snapshotMntPoint) + if err != nil { + return err + } + } + + snapshotContainerPath := getSnapshotMountPoint(c.Project(), s.pool.Name, sourceContainerName) + + empty, _ := shared.PathIsEmpty(snapshotContainerPath) + if empty { + err = os.Remove(snapshotContainerPath) + if err != nil { + return err + } + + snapshotSymlink := shared.VarPath("snapshots", projectPrefix(c.Project(), sourceContainerName)) + if shared.PathExists(snapshotSymlink) { + err = os.Remove(snapshotSymlink) + if err != nil { + return err + } + } + } + + success = true + + return nil +} + +func (s *Storage) ContainerSnapshotRename(c container, newName string) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "old_name": c.Name(), + "new_name": newName, + } + success := false + + defer logAction( + "Renaming container snapshot", + "Renamed container snapshot", + "Failed to rename container snapshot", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + oldSnapshotMntPoint := getSnapshotMountPoint(c.Project(), s.pool.Name, c.Name()) + newSnapshotMntPoint := getSnapshotMountPoint(c.Project(), s.pool.Name, newName) + + err = s.driver.VolumeSnapshotRename(c.Project(), c.Name(), newName, + driver.VolumeTypeContainerSnapshot) + if err != nil { + return err + } + + // It might be, that the driver already renamed the path. + if shared.PathExists(oldSnapshotMntPoint) { + err = os.Rename(oldSnapshotMntPoint, newSnapshotMntPoint) + if err != nil { + return err + } + } + + success = true + + return nil +} + +func (s *Storage) ContainerSnapshotStart(c container) (bool, error) { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "container": c.Name(), + } + success := false + + defer logAction( + "Starting container snapshot", + "Started container snapshot", + "Failed to start container snapshot", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return false, errors.Wrap(err, "Mount storage pool") + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + ok, err := s.driver.VolumeMount(c.Project(), c.Name(), driver.VolumeTypeContainerSnapshot) + if err != nil { + return ok, err + } + + success = true + + return ok, nil +} + +func (s *Storage) ContainerSnapshotStop(c container) (bool, error) { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "container": c.Name(), + } + success := false + + defer logAction( + "Stopping container snapshot", + "Stopped container snapshot", + "Failed to stop container snapshot", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return false, errors.Wrap(err, "Mount storage pool") + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + ok, err := s.driver.VolumeUmount(c.Project(), c.Name(), driver.VolumeTypeContainerSnapshot) + if err != nil { + return ok, err + } + + success = true + + return ok, nil +} + +func (s *Storage) ImageCreate(fingerprint string, tracker *ioprogress.ProgressTracker) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "fingerprint": fingerprint, + } + success := false + + defer logAction( + 
"Creating image", + "Created image", + "Failed to create image", + &ctx, &success, &err)() + + cleanupFunc := driver.LockImageCreate(s.pool.Name, fingerprint) + if cleanupFunc == nil { + return nil + } + defer cleanupFunc() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return errors.Wrap(err, "Mount storage pool") + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + // Don't create image if it already exists + if shared.PathExists(getImageMountPoint(s.pool.Name, fingerprint)) { + return nil + } + + err = s.createImageDbPoolVolume(fingerprint) + if err != nil { + return errors.Wrap(err, "Create image db pool volume") + } + + undo := true + defer func() { + if undo { + s.deleteImageDbPoolVolume(fingerprint) + } + }() + + imageSourcePath := shared.VarPath("images", fingerprint) + imageVolumePath := getImageMountPoint(s.pool.Name, "") + + if !shared.PathExists(imageVolumePath) { + err = os.MkdirAll(imageVolumePath, imagesDirMode) + if err != nil { + return errors.Wrap(err, "Create image mount point") + } + } + + volumeName := fingerprint + + if s.sType == storageTypeBtrfs { + volumeName = fmt.Sprintf("%s_tmp", fingerprint) + } + + imageTargetPath := getImageMountPoint(s.pool.Name, volumeName) + + err = s.driver.VolumeCreate("default", volumeName, driver.VolumeTypeImage) + if err != nil { + return errors.Wrap(err, "Create volume") + } + + if s.sType == storageTypeZfs { + undo = false + success = true + return nil + } + + if s.sType == storageTypeBtrfs { + defer func() { + s.driver.VolumeDelete("default", volumeName, false, driver.VolumeTypeImage) + }() + } + + err = unpackImage(imageSourcePath, imageTargetPath, s.sType, s.s.OS.RunningInUserNS, tracker) + if err != nil { + return errors.Wrap(err, "Unpack image") + } + + if s.sType == storageTypeBtrfs { + // Create read-only snapshot of the image volume + err = s.driver.VolumeSnapshotCreate("default", volumeName, + fingerprint, driver.VolumeTypeImageSnapshot) + if err != nil { + return errors.Wrap(err, "Create volume snapshot") + } + } + + undo = false + success = true + + return nil +} + +func (s *Storage) ImageDelete(fingerprint string) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "fingerprint": fingerprint, + } + success := false + + defer logAction( + "Deleting image", + "Deleted image", + "Failed to delete image", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if err != nil { + return err + } + + if ourMount { + defer s.driver.StoragePoolUmount() + } + + err = s.deleteImageDbPoolVolume(fingerprint) + if err != nil { + return err + } + + imageMntPoint := getImageMountPoint(s.pool.Name, fingerprint) + + err = s.driver.VolumeDelete("default", fingerprint, false, driver.VolumeTypeImage) + if err != nil { + return err + } + + // Now delete the mountpoint for the image: + // ${LXD_DIR}/images/. 
+ if shared.PathExists(imageMntPoint) { + err := os.RemoveAll(imageMntPoint) + if err != nil && !os.IsNotExist(err) { + return err + } + } + + success = true + + return nil +} + +func (s *Storage) StorageEntitySetQuota(volumeType int, size int64, data interface{}) error { + if !shared.IntInSlice(volumeType, supportedVolumeTypes) { + return fmt.Errorf("Invalid storage type") + } + + var c container + var subvol string + var volType driver.VolumeType + + project := "default" + + switch volumeType { + case storagePoolVolumeTypeContainer: + c = data.(container) + subvol = c.Name() + volType = driver.VolumeTypeContainer + project = c.Project() + case storagePoolVolumeTypeCustom: + subvol = s.volume.Name + volType = driver.VolumeTypeCustom + } + + return s.driver.VolumeSetQuota(project, subvol, size, s.s.OS.RunningInUserNS, volType) +} + +func (s *Storage) ContainerBackupCreate(backup backup, source container) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "source": source.Name(), + } + success := false + + defer logAction( + "Creating container backup", + "Created container backup", + "Failed to create container backup", + &ctx, &success, &err)() + + // Start storage + ourStart, err := source.StorageStart() + if err != nil { + return err + } + + if ourStart { + defer source.StorageStop() + } + + // Create a temporary path for the backup + tmpPath, err := ioutil.TempDir(shared.VarPath("backups"), "lxd_backup_") + if err != nil { + return err + } + defer os.RemoveAll(tmpPath) + + var snapshots []string + + if !backup.containerOnly { + var snaps []container + + snaps, err = source.Snapshots() + if err != nil { + return err + } + + for _, snap := range snaps { + snapshots = append(snapshots, shared.ExtractSnapshotName(snap.Name())) + } + } + + err = s.driver.VolumeBackupCreate(tmpPath, source.Project(), source.Name(), snapshots, backup.optimizedStorage) + if err != nil { + return err + } + + // Pack the backup + err = backupCreateTarball(s.s, tmpPath, backup) + if err != nil { + return err + } + + success = true + + return nil +} + +func (s *Storage) ContainerBackupLoad(info backupInfo, data io.ReadSeeker, tarArgs []string) error { + var err error + ctx := log.Ctx{ + "driver": s.sTypeName, + "pool": s.pool.Name, + "backup": info, + } + success := false + + defer logAction( + "Loading container backup", + "Loaded container backup", + "Failed to load container backup", + &ctx, &success, &err)() + + ourMount, err := s.driver.StoragePoolMount() + if ourMount { + defer s.driver.StoragePoolUmount() + } + + if info.HasBinaryFormat { + containerName, _, _ := containerGetParentAndSnapshotName(info.Name) + containerMntPoint := getContainerMountPoint("default", s.pool.Name, "") + + /* + err := createContainerMountpoint(containerMntPoint, containerPath(info.Project, info.Name, false), info.Privileged) + if err != nil { + return err + } + */ + + var unpackDir string + + unpackDir, err = ioutil.TempDir(containerMntPoint, containerName) + if err != nil { + return err + } + // TODO: Check whether this is OK when using ZFS regarding the remove-mount-order. + // Alternatively, have the callee clean up the directory. 
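
The binary-format restore path above stages the incoming tarball in a short-lived directory under the containers mountpoint and leans on defer for cleanup. In isolation, that staging idiom looks roughly like the sketch below (the paths, permissions and tar flags here are illustrative assumptions, not LXD's exact invocation, which streams the backup over stdin):

	package main

	import (
		"io/ioutil"
		"log"
		"os"
		"os/exec"
	)

	func main() {
		// Stage next to the final destination so any later rename stays on
		// the same filesystem; the defer removes the directory on every
		// exit path, success or failure.
		unpackDir, err := ioutil.TempDir("/var/tmp", "lxd_unpack_")
		if err != nil {
			log.Fatal(err)
		}
		defer os.RemoveAll(unpackDir)

		// Restrict access while the backup contents sit on disk.
		err = os.Chmod(unpackDir, 0700)
		if err != nil {
			log.Fatal(err)
		}

		// Illustrative extraction; the real code strips the leading
		// archive components and feeds the data through a file descriptor.
		err = exec.Command("tar", "-xf", "/var/tmp/backup.tar",
			"--strip-components=1", "-C", unpackDir).Run()
		if err != nil {
			log.Fatal(err)
		}
	}
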
+ defer os.RemoveAll(unpackDir) + + err = os.Chmod(unpackDir, 0700) + if err != nil { + return err + } + + // ${LXD_DIR}/storage-pools//containers/.XXX/.backup_unpack + unpackPath := fmt.Sprintf("%s/.backup_unpack", unpackDir) + err = os.MkdirAll(unpackPath, 0711) + if err != nil { + return err + } + + // Prepare tar arguments + args := append(tarArgs, []string{ + "-", + "--strip-components=1", + "-C", unpackPath, "backup", + }...) + + // Extract container + data.Seek(0, 0) + err = shared.RunCommandWithFds(data, nil, "tar", args...) + if err != nil { + logger.Errorf("Failed to untar \"%s\" into \"%s\": %s", "backup", unpackPath, err) + return err + } + + err = s.driver.VolumeBackupLoad(unpackDir, info.Project, info.Name, + info.Snapshots, info.Privileged, info.HasBinaryFormat) + if err != nil { + return err + } + + _, err = s.driver.VolumeMount(info.Project, info.Name, driver.VolumeTypeContainer) + if err != nil { + return err + } + + success = true + + return nil + } + + containersPath := getContainerMountPoint("default", s.pool.Name, "") + + if !shared.PathExists(containersPath) { + err = os.MkdirAll(containersPath, containersDirMode) + if err != nil { + return err + } + } + + // create the main container + err = s.driver.VolumeCreate(info.Project, info.Name, + driver.VolumeTypeContainer) + if err != nil { + return err + } + + _, err = s.driver.VolumeMount(info.Project, info.Name, driver.VolumeTypeContainer) + if err != nil { + return err + } + + containerMntPoint := getContainerMountPoint(info.Project, s.pool.Name, info.Name) + + if s.sType != storageTypeZfs { + // Create the mountpoint for the container at: + // ${LXD_DIR}/containers/ + err = createContainerMountpoint(containerMntPoint, + containerPath(info.Project, info.Name, false), + info.Privileged) + if err != nil { + return err + } + } + + // Extract container + for _, snap := range info.Snapshots { + cur := fmt.Sprintf("backup/snapshots/%s", snap) + + // Prepare tar arguments + args := append(tarArgs, []string{ + "-", + "--recursive-unlink", + "--xattrs-include=*", + "--strip-components=3", + "-C", containerMntPoint, cur, + }...) + + // Extract snapshots + data.Seek(0, 0) + err = shared.RunCommandWithFds(data, nil, "tar", args...) + if err != nil { + logger.Errorf("Failed to untar \"%s\" into \"%s\": %s", cur, containerMntPoint, err) + return err + } + + // create snapshot + fullSnapshotName := fmt.Sprintf("%s/%s", info.Name, snap) + + snapshotPath := getSnapshotMountPoint(info.Project, s.pool.Name, info.Name) + if !shared.PathExists(snapshotPath) { + err = os.MkdirAll(snapshotPath, containersDirMode) + if err != nil { + return err + } + } + + snapshotMntPoint := getSnapshotMountPoint(info.Project, s.pool.Name, info.Name) + snapshotMntPointSymlink := shared.VarPath("snapshots", + projectPrefix(info.Project, info.Name)) + + err = createSnapshotMountpoint(snapshotMntPoint, snapshotMntPoint, snapshotMntPointSymlink) + if err != nil { + return err + } + + err = s.driver.VolumeSnapshotCreate(info.Project, info.Name, fullSnapshotName, + driver.VolumeTypeContainerSnapshot) + if err != nil { + return err + } + } + + // Prepare tar arguments + args := append(tarArgs, []string{ + "-", + "--strip-components=2", + "--xattrs-include=*", + "-C", containerMntPoint, "backup/container", + }...) + + // Extract container + data.Seek(0, 0) + err = shared.RunCommandWithFds(data, nil, "tar", args...) 
+ if err != nil { + logger.Errorf("Failed to untar \"backup/container\" into \"%s\": %s", containerMntPoint, err) + return err + } + + success = true + + return nil +} + +func (s *Storage) MigrationType() migration.MigrationFSType { + switch s.sType { + case storageTypeBtrfs: + if !s.s.OS.RunningInUserNS { + return migration.MigrationFSType_BTRFS + } + case storageTypeZfs: + return migration.MigrationFSType_ZFS + } + + return migration.MigrationFSType_RSYNC +} + +func (s *Storage) PreservesInodes() bool { + switch s.sType { + case storageTypeBtrfs: + return !s.s.OS.RunningInUserNS + case storageTypeZfs: + return true + } + + // storageTypeDir, storageTypeLvm, storageTypeCeph, storageTypeMock + return false +} + +func (s *Storage) MigrationSource(args MigrationSourceArgs) (MigrationStorageSourceDriver, error) { + switch s.sType { + case storageTypeBtrfs: + if s.s.OS.RunningInUserNS { + return rsyncMigrationSource(args) + } + + // Implement in the main package. Driver specific code needs to be exported + // and may not be part of the StorageDriver code. The reason for it being + // part of the main package is that it needs to be aware of containers, and + // reorganizing the container code will be a PITA. + return btrfsMigrationSource(args, s.pool) + case storageTypeDir: + return rsyncMigrationSource(args) + case storageTypeZfs: + return zfsMigrationSource(s.s, s.pool, args) + } + + return nil, nil +} + +func (s *Storage) MigrationSink(conn *websocket.Conn, op *operation, args MigrationSinkArgs) error { + switch s.sType { + case storageTypeDir: + return rsyncMigrationSink(conn, op, args) + case storageTypeBtrfs: + if s.s.OS.RunningInUserNS { + return rsyncMigrationSink(conn, op, args) + } + + return btrfsMigrationSink(s.pool, conn, op, args) + case storageTypeZfs: + return zfsMigrationSink(s.pool, s.volume, conn, op, args) + } + + return nil +} + +func (s *Storage) StorageMigrationSource(args MigrationSourceArgs) (MigrationStorageSourceDriver, error) { + return rsyncStorageMigrationSource(args) +} + +func (s *Storage) StorageMigrationSink(conn *websocket.Conn, op *operation, args MigrationSinkArgs) error { + return rsyncStorageMigrationSink(conn, op, args) +} + + +func (s *Storage) createImageDbPoolVolume(fingerprint string) error { + // Fill in any default volume config. + volumeConfig := map[string]string{} + err := storageVolumeFillDefault(fingerprint, volumeConfig, s.pool) + if err != nil { + return err + } + + // Create a db entry for the storage volume of the image. + _, err = s.s.Cluster.StoragePoolVolumeCreate("default", fingerprint, "", storagePoolVolumeTypeImage, false, s.poolID, volumeConfig) + if err != nil { + // Try to delete the db entry on error. 
+ s.deleteImageDbPoolVolume(fingerprint) + return err + } + + return nil +} + +func (s *Storage) deleteImageDbPoolVolume(fingerprint string) error { + err := s.s.Cluster.StoragePoolVolumeDelete("default", fingerprint, storagePoolVolumeTypeImage, s.poolID) + if err != nil { + return err + } + + return nil +} + +type StorageDriver interface { + Init() error + SharedInit(s *state.State, pool *api.StoragePool, poolID int64, volume *api.StorageVolume) + GetVersion() string + + StoragePoolCheck() error + StoragePoolCreate() error + StoragePoolDelete() error + StoragePoolMount() (bool, error) + StoragePoolUmount() (bool, error) + StoragePoolResources() (*api.ResourcesStoragePool, error) + StoragePoolUpdate(writable *api.StoragePoolPut, changedConfig []string) error + + VolumeCreate(project string, volumeName string, volumeType driver.VolumeType) error + VolumeCopy(project, source string, target string, snapshots []string, volumeType driver.VolumeType) error + VolumeDelete(project string, volumeName string, recursive bool, volumeType driver.VolumeType) error + VolumeRename(project string, oldName string, newName string, snapshots []string, volumeType driver.VolumeType) error + VolumeMount(project string, name string, volumeType driver.VolumeType) (bool, error) + VolumeUmount(project string, name string, volumeType driver.VolumeType) (bool, error) + VolumeGetUsage(project, name, path string) (int64, error) + VolumeSetQuota(project, name string, size int64, userns bool, volumeType driver.VolumeType) error + VolumeUpdate(writable *api.StorageVolumePut, changedConfig []string) error + VolumeReady(project string, name string) bool + VolumePrepareRestore(sourceName string, targetName string, targetSnapshots []string, f func() error) error + // TODO: remove in favour of VolumeSnapshotRestore, or remove VolumeSnapshotRestore in favour of this + VolumeRestore(project string, sourceName string, targetName string, volumeType driver.VolumeType) error + VolumeSnapshotCreate(project string, source string, target string, volumeType driver.VolumeType) error + VolumeSnapshotCopy(project, source string, target string, volumeType driver.VolumeType) error + VolumeSnapshotDelete(project string, volumeName string, recursive bool, volumeType driver.VolumeType) error + VolumeSnapshotRestore(project string, sourceName string, targetName string, volumeType driver.VolumeType) error + VolumeSnapshotRename(project string, oldName string, newName string, volumeType driver.VolumeType) error + VolumeBackupCreate(path string, project string, source string, snapshots []string, optimized bool) error + VolumeBackupLoad(backupDir string, project string, containerName string, snapshots []string, privileged bool, optimized bool) error +} + // The storage interface defines the functions needed to implement a storage // backend for a given storage driver. 
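
Taken together, the new Storage type plus the StorageDriver interface split the old per-backend types into one generic front end (mountpoints, symlinks, DB records, logging) and a thin filesystem-specific driver. A deliberately condensed, self-contained illustration of that shape follows; the two-method interface is a reduced stand-in for the full StorageDriver above, not the real API:

	package main

	import "fmt"

	// Condensed stand-in for the StorageDriver interface defined above.
	type VolumeDriver interface {
		Init() error
		VolumeCreate(project, name string) error
	}

	// Generic front end: shared bookkeeping lives here; only the
	// filesystem-specific step is delegated to the driver.
	type GenericPool struct {
		name   string
		driver VolumeDriver
	}

	func (p *GenericPool) VolumeCreate(project, name string) error {
		// Shared steps (mountpoint creation, DB records, logging) go here.
		fmt.Printf("pool %q: creating volume %s/%s\n", p.name, project, name)
		return p.driver.VolumeCreate(project, name)
	}

	// One concrete driver; Btrfs/ZFS equivalents would differ only here.
	type dirDriver struct{}

	func (d *dirDriver) Init() error { return nil }

	func (d *dirDriver) VolumeCreate(project, name string) error {
		fmt.Printf("dir driver: mkdir for %s/%s\n", project, name)
		return nil
	}

	func main() {
		p := &GenericPool{name: "default", driver: &dirDriver{}}
		if err := p.driver.Init(); err != nil {
			panic(err)
		}
		p.VolumeCreate("default", "vol1")
	}
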
type storage interface { @@ -243,19 +2618,9 @@ func storageCoreInit(driver string) (storage, error) { switch sType { case storageTypeBtrfs: - btrfs := storageBtrfs{} - err = btrfs.StorageCoreInit() - if err != nil { - return nil, err - } - return &btrfs, nil + return storageCoreInit2(driver) case storageTypeDir: - dir := storageDir{} - err = dir.StorageCoreInit() - if err != nil { - return nil, err - } - return &dir, nil + return storageCoreInit2(driver) case storageTypeCeph: ceph := storageCeph{} err = ceph.StorageCoreInit() @@ -278,17 +2643,36 @@ func storageCoreInit(driver string) (storage, error) { } return &mock, nil case storageTypeZfs: - zfs := storageZfs{} - err = zfs.StorageCoreInit() - if err != nil { - return nil, err - } - return &zfs, nil + return storageCoreInit2(driver) } return nil, fmt.Errorf("invalid storage type") } +func storageCoreInit2(storageDriver string) (storage, error) { + sType, err := storageStringToType(storageDriver) + if err != nil { + return nil, err + } + + st := Storage{} + + switch sType { + case storageTypeDir: + st.driver = &driver.Dir{} + case storageTypeBtrfs: + st.driver = &driver.Btrfs{} + case storageTypeZfs: + st.driver = &driver.Zfs{} + default: + return nil, fmt.Errorf("invalid storage type") + } + + err = st.driver.Init() + if err != nil { + return nil, err + } + + return &st, nil +} + func storageInit(s *state.State, project, poolName, volumeName string, volumeType int) (storage, error) { // Load the storage pool. poolID, pool, err := s.Cluster.StoragePoolGet(poolName) @@ -305,9 +2689,8 @@ func storageInit(s *state.State, project, poolName, volumeName string, volumeTyp // Load the storage volume. volume := &api.StorageVolume{} - volumeID := int64(-1) if volumeName != "" { - volumeID, volume, err = s.Cluster.StoragePoolNodeVolumeGetTypeByProject(project, volumeName, volumeType, poolID) + _, volume, err = s.Cluster.StoragePoolNodeVolumeGetTypeByProject(project, volumeName, volumeType, poolID) if err != nil { return nil, err } @@ -320,28 +2703,9 @@ func storageInit(s *state.State, project, poolName, volumeName string, volumeTyp switch sType { case storageTypeBtrfs: - btrfs := storageBtrfs{} - btrfs.poolID = poolID - btrfs.pool = pool - btrfs.volume = volume - btrfs.s = s - err = btrfs.StoragePoolInit() - if err != nil { - return nil, err - } - return &btrfs, nil + return storageInit2(s, project, poolName, volumeName, volumeType) case storageTypeDir: - dir := storageDir{} - dir.poolID = poolID - dir.pool = pool - dir.volume = volume - dir.volumeID = volumeID - dir.s = s - err = dir.StoragePoolInit() - if err != nil { - return nil, err - } - return &dir, nil + return storageInit2(s, project, poolName, volumeName, volumeType) case storageTypeCeph: ceph := storageCeph{} ceph.poolID = poolID @@ -376,19 +2740,68 @@ func storageInit(s *state.State, project, poolName, volumeName string, volumeTyp } return &mock, nil case storageTypeZfs: - zfs := storageZfs{} - zfs.poolID = poolID - zfs.pool = pool - zfs.volume = volume - zfs.s = s - err = zfs.StoragePoolInit() + return storageInit2(s, project, poolName, volumeName, volumeType) + } + + return nil, fmt.Errorf("invalid storage type") +} + +func storageInit2(s *state.State, project, poolName, volumeName string, volumeType int) (storage, error) { + // Load the storage pool. + poolID, pool, err := s.Cluster.StoragePoolGet(poolName) + if err != nil { + return nil, errors.Wrapf(err, "Load storage pool %q", poolName) + } + + if pool.Driver == "" { + // This shouldn't actually be possible but better safe than + // sorry.
+ return nil, fmt.Errorf("no storage driver was provided") + } + + // Load the storage volume. + volume := &api.StorageVolume{} + volumeID := int64(-1) + if volumeName != "" { + volumeID, volume, err = s.Cluster.StoragePoolNodeVolumeGetTypeByProject(project, volumeName, volumeType, poolID) if err != nil { return nil, err } - } } - return nil, fmt.Errorf("invalid storage type") + sType, err := storageStringToType(pool.Driver) + if err != nil { + return nil, err + } + + st := Storage{} + st.poolID = poolID + st.pool = pool + st.volumeID = volumeID + st.volume = volume + st.s = s + st.sType = sType + st.sTypeName = pool.Driver + + switch sType { + case storageTypeDir: + st.driver = &driver.Dir{} + case storageTypeBtrfs: + st.driver = &driver.Btrfs{} + case storageTypeZfs: + st.driver = &driver.Zfs{} + default: + return nil, fmt.Errorf("invalid storage type") + } + + st.driver.SharedInit(s, pool, poolID, volume) + + err = st.driver.Init() + if err != nil { + return nil, err + } + + return &st, nil } func storagePoolInit(s *state.State, poolName string) (storage, error) { @@ -505,7 +2918,7 @@ func storagePoolVolumeAttachInit(s *state.State, poolName string, volumeName str var err error if st.GetStorageType() == storageTypeZfs { - err = lastIdmap.UnshiftRootfs(remapPath, zfsIdmapSetSkipper) + err = lastIdmap.UnshiftRootfs(remapPath, driver.ZfsIdmapSetSkipper) } else { err = lastIdmap.UnshiftRootfs(remapPath, nil) } @@ -521,7 +2934,7 @@ func storagePoolVolumeAttachInit(s *state.State, poolName string, volumeName str var err error if st.GetStorageType() == storageTypeZfs { - err = nextIdmap.ShiftRootfs(remapPath, zfsIdmapSetSkipper) + err = nextIdmap.ShiftRootfs(remapPath, driver.ZfsIdmapSetSkipper) } else { err = nextIdmap.ShiftRootfs(remapPath, nil) } From fa6c22745bb878c924099ca83397b8e50e952e13 Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 16:51:53 +0200 Subject: [PATCH 13/15] lxd: Remove old btrfs storage code Signed-off-by: Thomas Hipp --- lxd/storage_btrfs.go | 3195 ------------------------------------------ 1 file changed, 3195 deletions(-) delete mode 100644 lxd/storage_btrfs.go diff --git a/lxd/storage_btrfs.go b/lxd/storage_btrfs.go deleted file mode 100644 index 7fcab12506..0000000000 --- a/lxd/storage_btrfs.go +++ /dev/null @@ -1,3195 +0,0 @@ -package main - -import ( - "fmt" - "io" - "io/ioutil" - "os" - "os/exec" - "path" - "path/filepath" - "sort" - "strconv" - "strings" - "syscall" - - "github.com/gorilla/websocket" - "github.com/pkg/errors" - - "github.com/lxc/lxd/lxd/db" - "github.com/lxc/lxd/lxd/migration" - driver "github.com/lxc/lxd/lxd/storage" - "github.com/lxc/lxd/lxd/util" - "github.com/lxc/lxd/shared" - "github.com/lxc/lxd/shared/api" - "github.com/lxc/lxd/shared/ioprogress" - "github.com/lxc/lxd/shared/logger" -) - -type storageBtrfs struct { - remount uintptr - storageShared -} - -var btrfsVersion = "" - -func (s *storageBtrfs) getBtrfsMountOptions() string { - if s.pool.Config["btrfs.mount_options"] != "" { - return s.pool.Config["btrfs.mount_options"] - } - - return "user_subvol_rm_allowed" -} - -func (s *storageBtrfs) setBtrfsMountOptions(mountOptions string) { - s.pool.Config["btrfs.mount_options"] = mountOptions -} - -// ${LXD_DIR}/storage-pools/<pool>/containers -func (s *storageBtrfs) getContainerSubvolumePath(poolName string) string { - return shared.VarPath("storage-pools", poolName, "containers") -} - -// ${LXD_DIR}/storage-pools/<pool>/containers-snapshots -func getSnapshotSubvolumePath(project, poolName string, containerName string)
string { - return shared.VarPath("storage-pools", poolName, "containers-snapshots", projectPrefix(project, containerName)) -} - -// ${LXD_DIR}/storage-pools/<pool>/images -func (s *storageBtrfs) getImageSubvolumePath(poolName string) string { - return shared.VarPath("storage-pools", poolName, "images") -} - -// ${LXD_DIR}/storage-pools/<pool>/custom -func (s *storageBtrfs) getCustomSubvolumePath(poolName string) string { - return shared.VarPath("storage-pools", poolName, "custom") -} - -// ${LXD_DIR}/storage-pools/<pool>/custom-snapshots -func (s *storageBtrfs) getCustomSnapshotSubvolumePath(poolName string) string { - return shared.VarPath("storage-pools", poolName, "custom-snapshots") -} - -func (s *storageBtrfs) StorageCoreInit() error { - s.sType = storageTypeBtrfs - typeName, err := storageTypeToString(s.sType) - if err != nil { - return err - } - s.sTypeName = typeName - - if btrfsVersion != "" { - s.sTypeVersion = btrfsVersion - return nil - } - - out, err := exec.LookPath("btrfs") - if err != nil || len(out) == 0 { - return fmt.Errorf("The 'btrfs' tool isn't available") - } - - output, err := shared.RunCommand("btrfs", "version") - if err != nil { - return fmt.Errorf("The 'btrfs' tool isn't working properly") - } - - count, err := fmt.Sscanf(strings.SplitN(output, " ", 2)[1], "v%s\n", &s.sTypeVersion) - if err != nil || count != 1 { - return fmt.Errorf("The 'btrfs' tool isn't working properly") - } - - btrfsVersion = s.sTypeVersion - - return nil -} - -func (s *storageBtrfs) StoragePoolInit() error { - err := s.StorageCoreInit() - if err != nil { - return err - } - - return nil -} - -func (s *storageBtrfs) StoragePoolCheck() error { - // FIXME(brauner): Think of something smart or useful (And then think - // again if it is worth implementing it. :)). - logger.Debugf("Checking BTRFS storage pool \"%s\"", s.pool.Name) - return nil -} - -func (s *storageBtrfs) StoragePoolCreate() error { - logger.Infof("Creating BTRFS storage pool \"%s\"", s.pool.Name) - s.pool.Config["volatile.initial_source"] = s.pool.Config["source"] - - isBlockDev := false - - source := s.pool.Config["source"] - if strings.HasPrefix(source, "/") { - source = shared.HostPath(s.pool.Config["source"]) - } - - defaultSource := filepath.Join(shared.VarPath("disks"), fmt.Sprintf("%s.img", s.pool.Name)) - if source == "" || source == defaultSource { - source = defaultSource - s.pool.Config["source"] = source - - f, err := os.Create(source) - if err != nil { - return fmt.Errorf("Failed to open %s: %s", source, err) - } - defer f.Close() - - err = f.Chmod(0600) - if err != nil { - return fmt.Errorf("Failed to chmod %s: %s", source, err) - } - - size, err := shared.ParseByteSizeString(s.pool.Config["size"]) - if err != nil { - return err - } - err = f.Truncate(size) - if err != nil { - return fmt.Errorf("Failed to create sparse file %s: %s", source, err) - } - - output, err := makeFSType(source, "btrfs", &mkfsOptions{label: s.pool.Name}) - if err != nil { - return fmt.Errorf("Failed to create the BTRFS pool: %s", output) - } - } else { - // Unset size property since it doesn't make sense.
- s.pool.Config["size"] = "" - - if filepath.IsAbs(source) { - isBlockDev = shared.IsBlockdevPath(source) - if isBlockDev { - output, err := makeFSType(source, "btrfs", &mkfsOptions{label: s.pool.Name}) - if err != nil { - return fmt.Errorf("Failed to create the BTRFS pool: %s", output) - } - } else { - if isBtrfsSubVolume(source) { - subvols, err := btrfsSubVolumesGet(source) - if err != nil { - return fmt.Errorf("Could not determine if existing BTRFS subvolume is empty: %s", err) - } - if len(subvols) > 0 { - return fmt.Errorf("Requested BTRFS subvolume exists but is not empty") - } - } else { - cleanSource := filepath.Clean(source) - lxdDir := shared.VarPath() - poolMntPoint := getStoragePoolMountPoint(s.pool.Name) - if shared.PathExists(source) && !isOnBtrfs(source) { - return fmt.Errorf("Existing path is neither a BTRFS subvolume nor does it reside on a BTRFS filesystem") - } else if strings.HasPrefix(cleanSource, lxdDir) { - if cleanSource != poolMntPoint { - return fmt.Errorf("BTRFS subvolume requests in the LXD directory \"%s\" are only valid under \"%s\"\n(e.g. source=%s)", shared.VarPath(), shared.VarPath("storage-pools"), poolMntPoint) - } else if s.s.OS.BackingFS != "btrfs" { - return fmt.Errorf("Creation of BTRFS subvolume requested but \"%s\" does not reside on a BTRFS filesystem", source) - } - } - - err := btrfsSubVolumeCreate(source) - if err != nil { - return err - } - } - } - } else { - return fmt.Errorf("Invalid \"source\" property") - } - } - - poolMntPoint := getStoragePoolMountPoint(s.pool.Name) - if !shared.PathExists(poolMntPoint) { - err := os.MkdirAll(poolMntPoint, storagePoolsDirMode) - if err != nil { - return err - } - } - - var err1 error - var devUUID string - mountFlags, mountOptions := lxdResolveMountoptions(s.getBtrfsMountOptions()) - mountFlags |= s.remount - if isBlockDev && filepath.IsAbs(source) { - devUUID, _ = shared.LookupUUIDByBlockDevPath(source) - // The symlink might not have been created even with the delay - // we granted it above. So try to call btrfs filesystem show and - // parse it out. (I __hate__ this!) - if devUUID == "" { - logger.Warnf("Failed to detect UUID by looking at /dev/disk/by-uuid") - devUUID, err1 = s.btrfsLookupFsUUID(source) - if err1 != nil { - logger.Errorf("Failed to detect UUID by parsing filesystem info") - return err1 - } - } - s.pool.Config["source"] = devUUID - - // If the symlink in /dev/disk/by-uuid hasn't been created yet - // aka we only detected it by parsing btrfs filesystem show, we - // cannot call StoragePoolMount() since it will try to do the - // reverse operation. So instead we shamelessly mount using the - // block device path at the time of pool creation. - err1 = syscall.Mount(source, poolMntPoint, "btrfs", mountFlags, mountOptions) - } else { - _, err1 = s.StoragePoolMount() - } - if err1 != nil { - return err1 - } - - // Create default subvolumes.
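- // Each of these default mount points (containers, containers-snapshots, images, custom and custom-snapshots) is created as a btrfs subvolume rather than a plain directory.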
- dummyDir := getContainerMountPoint("default", s.pool.Name, "") - err := btrfsSubVolumeCreate(dummyDir) - if err != nil { - return fmt.Errorf("Could not create btrfs subvolume: %s", dummyDir) - } - - dummyDir = getSnapshotMountPoint("default", s.pool.Name, "") - err = btrfsSubVolumeCreate(dummyDir) - if err != nil { - return fmt.Errorf("Could not create btrfs subvolume: %s", dummyDir) - } - - dummyDir = getImageMountPoint(s.pool.Name, "") - err = btrfsSubVolumeCreate(dummyDir) - if err != nil { - return fmt.Errorf("Could not create btrfs subvolume: %s", dummyDir) - } - - dummyDir = getStoragePoolVolumeMountPoint(s.pool.Name, "") - err = btrfsSubVolumeCreate(dummyDir) - if err != nil { - return fmt.Errorf("Could not create btrfs subvolume: %s", dummyDir) - } - - dummyDir = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, "") - err = btrfsSubVolumeCreate(dummyDir) - if err != nil { - return fmt.Errorf("Could not create btrfs subvolume: %s", dummyDir) - } - - err = s.StoragePoolCheck() - if err != nil { - return err - } - - logger.Infof("Created BTRFS storage pool \"%s\"", s.pool.Name) - return nil -} - -func (s *storageBtrfs) StoragePoolDelete() error { - logger.Infof("Deleting BTRFS storage pool \"%s\"", s.pool.Name) - - source := s.pool.Config["source"] - if strings.HasPrefix(source, "/") { - source = shared.HostPath(s.pool.Config["source"]) - } - - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - // Delete default subvolumes. - dummyDir := getContainerMountPoint("default", s.pool.Name, "") - btrfsSubVolumesDelete(dummyDir) - - dummyDir = getSnapshotMountPoint("default", s.pool.Name, "") - btrfsSubVolumesDelete(dummyDir) - - dummyDir = getImageMountPoint(s.pool.Name, "") - btrfsSubVolumesDelete(dummyDir) - - dummyDir = getStoragePoolVolumeMountPoint(s.pool.Name, "") - btrfsSubVolumesDelete(dummyDir) - - _, err := s.StoragePoolUmount() - if err != nil { - return err - } - - // This is a UUID. Check whether we can find the block device. - if !filepath.IsAbs(source) { - // Try to lookup the disk device by UUID but don't fail. If we - // don't find one this might just mean we have been given the - // UUID of a subvolume. - byUUID := fmt.Sprintf("/dev/disk/by-uuid/%s", source) - diskPath, err := os.Readlink(byUUID) - msg := "" - if err == nil { - msg = fmt.Sprintf("Removing disk device %s with UUID: %s.", diskPath, source) - } else { - msg = fmt.Sprintf("Failed to lookup disk device with UUID: %s: %s.", source, err) - } - logger.Debugf(msg) - } else { - var err error - cleanSource := filepath.Clean(source) - sourcePath := shared.VarPath("disks", s.pool.Name) - loopFilePath := sourcePath + ".img" - if cleanSource == loopFilePath { - // This is a loop file so simply remove it. - err = os.Remove(source) - } else { - if !isBtrfsFilesystem(source) && isBtrfsSubVolume(source) { - err = btrfsSubVolumesDelete(source) - } - } - if err != nil && !os.IsNotExist(err) { - return err - } - } - - // Remove the mountpoint for the storage pool. 
- err = os.RemoveAll(getStoragePoolMountPoint(s.pool.Name)) - if err != nil && !os.IsNotExist(err) { - return err - } - - logger.Infof("Deleted BTRFS storage pool \"%s\"", s.pool.Name) - return nil -} - -func (s *storageBtrfs) StoragePoolMount() (bool, error) { - logger.Debugf("Mounting BTRFS storage pool \"%s\"", s.pool.Name) - - source := s.pool.Config["source"] - if strings.HasPrefix(source, "/") { - source = shared.HostPath(s.pool.Config["source"]) - } - - if source == "" { - return false, fmt.Errorf("no \"source\" property found for the storage pool") - } - - poolMntPoint := getStoragePoolMountPoint(s.pool.Name) - - poolMountLockID := getPoolMountLockID(s.pool.Name) - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[poolMountLockID]; ok { - lxdStorageMapLock.Unlock() - if _, ok := <-waitChannel; ok { - logger.Warnf("Received value over semaphore, this should not have happened") - } - // Give the benefit of the doubt and assume that the other - // thread actually succeeded in mounting the storage pool. - return false, nil - } - - lxdStorageOngoingOperationMap[poolMountLockID] = make(chan bool) - lxdStorageMapLock.Unlock() - - removeLockFromMap := func() { - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[poolMountLockID]; ok { - close(waitChannel) - delete(lxdStorageOngoingOperationMap, poolMountLockID) - } - lxdStorageMapLock.Unlock() - } - defer removeLockFromMap() - - // Check whether the mountpoint poolMntPoint exists. - if !shared.PathExists(poolMntPoint) { - err := os.MkdirAll(poolMntPoint, storagePoolsDirMode) - if err != nil { - return false, err - } - } - - if shared.IsMountPoint(poolMntPoint) && (s.remount&syscall.MS_REMOUNT) == 0 { - return false, nil - } - - mountFlags, mountOptions := lxdResolveMountoptions(s.getBtrfsMountOptions()) - mountSource := source - isBlockDev := shared.IsBlockdevPath(source) - if filepath.IsAbs(source) { - cleanSource := filepath.Clean(source) - poolMntPoint := getStoragePoolMountPoint(s.pool.Name) - loopFilePath := shared.VarPath("disks", s.pool.Name+".img") - if !isBlockDev && cleanSource == loopFilePath { - // If source == "${LXD_DIR}"/disks/{pool_name} it is a - // loop file we're dealing with. - // - // Since we mount the loop device LO_FLAGS_AUTOCLEAR is - // fine since the loop device will be kept around for as - // long as the mount exists. - loopF, loopErr := driver.PrepareLoopDev(source, driver.LoFlagsAutoclear) - if loopErr != nil { - return false, loopErr - } - mountSource = loopF.Name() - defer loopF.Close() - } else if !isBlockDev && cleanSource != poolMntPoint { - mountSource = source - mountFlags |= syscall.MS_BIND - } else if !isBlockDev && cleanSource == poolMntPoint && s.s.OS.BackingFS == "btrfs" { - return false, nil - } - // User is using block device path. - } else { - // Try to lookup the disk device by UUID but don't fail. If we - // don't find one this might just mean we have been given the - // UUID of a subvolume. - byUUID := fmt.Sprintf("/dev/disk/by-uuid/%s", source) - diskPath, err := os.Readlink(byUUID) - if err == nil { - mountSource = fmt.Sprintf("/dev/%s", strings.Trim(diskPath, "../../")) - } else { - // We have very likely been given a subvolume UUID. In - // this case we should simply assume that the user has - // mounted the parent of the subvolume or the subvolume - // itself. Otherwise this becomes a really messy - // detection task.
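- // Assume it is already mounted and report that no new mount was performed.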
- return false, nil - } - } - - mountFlags |= s.remount - err := syscall.Mount(mountSource, poolMntPoint, "btrfs", mountFlags, mountOptions) - if err != nil { - logger.Errorf("Failed to mount BTRFS storage pool \"%s\" onto \"%s\" with mountoptions \"%s\": %s", mountSource, poolMntPoint, mountOptions, err) - return false, err - } - - logger.Debugf("Mounted BTRFS storage pool \"%s\"", s.pool.Name) - return true, nil -} - -func (s *storageBtrfs) StoragePoolUmount() (bool, error) { - logger.Debugf("Unmounting BTRFS storage pool \"%s\"", s.pool.Name) - - poolMntPoint := getStoragePoolMountPoint(s.pool.Name) - - poolUmountLockID := getPoolUmountLockID(s.pool.Name) - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[poolUmountLockID]; ok { - lxdStorageMapLock.Unlock() - if _, ok := <-waitChannel; ok { - logger.Warnf("Received value over semaphore, this should not have happened") - } - // Give the benefit of the doubt and assume that the other - // thread actually succeeded in unmounting the storage pool. - return false, nil - } - - lxdStorageOngoingOperationMap[poolUmountLockID] = make(chan bool) - lxdStorageMapLock.Unlock() - - removeLockFromMap := func() { - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[poolUmountLockID]; ok { - close(waitChannel) - delete(lxdStorageOngoingOperationMap, poolUmountLockID) - } - lxdStorageMapLock.Unlock() - } - - defer removeLockFromMap() - - if shared.IsMountPoint(poolMntPoint) { - err := syscall.Unmount(poolMntPoint, 0) - if err != nil { - return false, err - } - } - - logger.Debugf("Unmounted BTRFS storage pool \"%s\"", s.pool.Name) - return true, nil -} - -func (s *storageBtrfs) StoragePoolUpdate(writable *api.StoragePoolPut, - changedConfig []string) error { - logger.Infof(`Updating BTRFS storage pool "%s"`, s.pool.Name) - - changeable := changeableStoragePoolProperties["btrfs"] - unchangeable := []string{} - for _, change := range changedConfig { - if !shared.StringInSlice(change, changeable) { - unchangeable = append(unchangeable, change) - } - } - - if len(unchangeable) > 0 { - return updateStoragePoolError(unchangeable, "btrfs") - } - - // "rsync.bwlimit" requires no on-disk modifications. - - if shared.StringInSlice("btrfs.mount_options", changedConfig) { - s.setBtrfsMountOptions(writable.Config["btrfs.mount_options"]) - s.remount |= syscall.MS_REMOUNT - _, err := s.StoragePoolMount() - if err != nil { - return err - } - } - - logger.Infof(`Updated BTRFS storage pool "%s"`, s.pool.Name) - return nil -} - -func (s *storageBtrfs) GetContainerPoolInfo() (int64, string, string) { - return s.poolID, s.pool.Name, s.pool.Name -} - -// Functions dealing with storage volumes. -func (s *storageBtrfs) StoragePoolVolumeCreate() error { - logger.Infof("Creating BTRFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - isSnapshot := shared.IsSnapshot(s.volume.Name) - - // Create subvolume path on the storage pool. - var customSubvolumePath string - - if isSnapshot { - customSubvolumePath = s.getCustomSnapshotSubvolumePath(s.pool.Name) - } else { - customSubvolumePath = s.getCustomSubvolumePath(s.pool.Name) - } - - if !shared.PathExists(customSubvolumePath) { - err := os.MkdirAll(customSubvolumePath, 0700) - if err != nil { - return err - } - } - - // Create subvolume. 
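- // Snapshot volumes are created under the custom-snapshots tree, regular custom volumes under the custom tree.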
- var customSubvolumeName string - - if isSnapshot { - customSubvolumeName = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) - } else { - customSubvolumeName = getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - } - - err = btrfsSubVolumeCreate(customSubvolumeName) - if err != nil { - return err - } - - // apply quota - if s.volume.Config["size"] != "" { - size, err := shared.ParseByteSizeString(s.volume.Config["size"]) - if err != nil { - return err - } - - err = s.StorageEntitySetQuota(storagePoolVolumeTypeCustom, size, nil) - if err != nil { - return err - } - } - - logger.Infof("Created BTRFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageBtrfs) StoragePoolVolumeDelete() error { - logger.Infof("Deleting BTRFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // Delete subvolume. - customSubvolumeName := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - if shared.PathExists(customSubvolumeName) && isBtrfsSubVolume(customSubvolumeName) { - err = btrfsSubVolumesDelete(customSubvolumeName) - if err != nil { - return err - } - } - - // Delete the mountpoint. - if shared.PathExists(customSubvolumeName) { - err = os.Remove(customSubvolumeName) - if err != nil { - return err - } - } - - err = s.s.Cluster.StoragePoolVolumeDelete( - "default", - s.volume.Name, - storagePoolVolumeTypeCustom, - s.poolID) - if err != nil { - logger.Errorf(`Failed to delete database entry for BTRFS storage volume "%s" on storage pool "%s"`, s.volume.Name, s.pool.Name) - } - - logger.Infof("Deleted BTRFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageBtrfs) StoragePoolVolumeMount() (bool, error) { - logger.Debugf("Mounting BTRFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - // The storage pool must be mounted. - _, err := s.StoragePoolMount() - if err != nil { - return false, err - } - - logger.Debugf("Mounted BTRFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return true, nil -} - -func (s *storageBtrfs) StoragePoolVolumeUmount() (bool, error) { - return true, nil -} - -func (s *storageBtrfs) StoragePoolVolumeUpdate(writable *api.StorageVolumePut, changedConfig []string) error { - if writable.Restore != "" { - logger.Debugf(`Restoring BTRFS storage volume "%s" from snapshot "%s"`, - s.volume.Name, writable.Restore) - - // The storage pool must be mounted. - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // Create a backup so we can revert. 
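- // The live subvolume is first renamed aside with a ".tmp" suffix; the deferred undo below renames it back if the restore fails.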
- targetVolumeSubvolumeName := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - backupTargetVolumeSubvolumeName := fmt.Sprintf("%s.tmp", targetVolumeSubvolumeName) - err = os.Rename(targetVolumeSubvolumeName, backupTargetVolumeSubvolumeName) - if err != nil { - return err - } - undo := true - defer func() { - if undo { - os.Rename(backupTargetVolumeSubvolumeName, targetVolumeSubvolumeName) - } - }() - - sourceVolumeSubvolumeName := getStoragePoolVolumeSnapshotMountPoint( - s.pool.Name, fmt.Sprintf("%s/%s", s.volume.Name, writable.Restore)) - err = s.btrfsPoolVolumesSnapshot(sourceVolumeSubvolumeName, - targetVolumeSubvolumeName, false, true) - if err != nil { - return err - } - - undo = false - err = btrfsSubVolumesDelete(backupTargetVolumeSubvolumeName) - if err != nil { - return err - } - - logger.Debugf(`Restored BTRFS storage volume "%s" from snapshot "%s"`, - s.volume.Name, writable.Restore) - return nil - } - - logger.Infof(`Updating BTRFS storage volume "%s"`, s.volume.Name) - - changeable := changeableStoragePoolVolumeProperties["btrfs"] - unchangeable := []string{} - for _, change := range changedConfig { - if !shared.StringInSlice(change, changeable) { - unchangeable = append(unchangeable, change) - } - } - - if len(unchangeable) > 0 { - return updateStoragePoolVolumeError(unchangeable, "btrfs") - } - - if shared.StringInSlice("size", changedConfig) { - if s.volume.Type != storagePoolVolumeTypeNameCustom { - return updateStoragePoolVolumeError([]string{"size"}, "btrfs") - } - - if s.volume.Config["size"] != writable.Config["size"] { - size, err := shared.ParseByteSizeString(writable.Config["size"]) - if err != nil { - return err - } - - err = s.StorageEntitySetQuota(storagePoolVolumeTypeCustom, size, nil) - if err != nil { - return err - } - } - } - - logger.Infof(`Updated BTRFS storage volume "%s"`, s.volume.Name) - return nil -} - -func (s *storageBtrfs) StoragePoolVolumeRename(newName string) error { - logger.Infof(`Renaming BTRFS storage volume on storage pool "%s" from "%s" to "%s"`, - s.pool.Name, s.volume.Name, newName) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - usedBy, err := storagePoolVolumeUsedByContainersGet(s.s, "default", s.volume.Name, storagePoolVolumeTypeNameCustom) - if err != nil { - return err - } - if len(usedBy) > 0 { - return fmt.Errorf(`BTRFS storage volume "%s" on storage pool "%s" is attached to containers`, - s.volume.Name, s.pool.Name) - } - - oldPath := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - newPath := getStoragePoolVolumeMountPoint(s.pool.Name, newName) - err = os.Rename(oldPath, newPath) - if err != nil { - return err - } - - logger.Infof(`Renamed BTRFS storage volume on storage pool "%s" from "%s" to "%s"`, - s.pool.Name, s.volume.Name, newName) - - err = s.s.Cluster.StoragePoolVolumeRename("default", s.volume.Name, newName, - storagePoolVolumeTypeCustom, s.poolID) - if err != nil { - return err - } - - // Get volumes attached to source storage volume - volumes, err := s.s.Cluster.StoragePoolVolumeSnapshotsGetType(s.volume.Name, - storagePoolVolumeTypeCustom, s.poolID) - if err != nil { - return err - } - - for _, vol := range volumes { - _, snapshotName, _ := containerGetParentAndSnapshotName(vol) - oldVolumeName := fmt.Sprintf("%s%s%s", s.volume.Name, shared.SnapshotDelimiter, snapshotName) - newVolumeName := fmt.Sprintf("%s%s%s", newName, shared.SnapshotDelimiter, snapshotName) - - // Rename volume snapshots - oldPath = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name,
oldVolumeName) - newPath = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, newVolumeName) - err = os.Rename(oldPath, newPath) - if err != nil { - return err - } - - err = s.s.Cluster.StoragePoolVolumeRename("default", oldVolumeName, newVolumeName, - storagePoolVolumeTypeCustom, s.poolID) - if err != nil { - return err - } - } - - return nil -} - -// Functions dealing with container storage. -func (s *storageBtrfs) ContainerStorageReady(container container) bool { - containerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) - return isBtrfsSubVolume(containerMntPoint) -} - -func (s *storageBtrfs) doContainerCreate(project, name string, privileged bool) error { - logger.Debugf("Creating empty BTRFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // We can only create the btrfs subvolume under the mounted storage - // pool. The on-disk layout for containers on a btrfs storage pool will - // thus be - // ${LXD_DIR}/storage-pools/<pool>/containers/<name>. The btrfs tool will - // complain if the intermediate path does not exist, so create it if it - // doesn't already. - containerSubvolumePath := s.getContainerSubvolumePath(s.pool.Name) - if !shared.PathExists(containerSubvolumePath) { - err := os.MkdirAll(containerSubvolumePath, containersDirMode) - if err != nil { - return err - } - } - - // Create empty subvolume for container. - containerSubvolumeName := getContainerMountPoint(project, s.pool.Name, name) - err = btrfsSubVolumeCreate(containerSubvolumeName) - if err != nil { - return err - } - - // Create the mountpoint for the container at: - // ${LXD_DIR}/containers/<name> - err = createContainerMountpoint(containerSubvolumeName, shared.VarPath("containers", projectPrefix(project, name)), privileged) - if err != nil { - return err - } - - logger.Debugf("Created empty BTRFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageBtrfs) ContainerCreate(container container) error { - err := s.doContainerCreate(container.Project(), container.Name(), container.IsPrivileged()) - if err != nil { - return err - } - - return container.TemplateApply("create") -} - -// And this function is why I started hating on btrfs... -func (s *storageBtrfs) ContainerCreateFromImage(container container, fingerprint string, tracker *ioprogress.ProgressTracker) error { - logger.Debugf("Creating BTRFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - source := s.pool.Config["source"] - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - _, err := s.StoragePoolMount() - if err != nil { - return errors.Wrap(err, "Failed to mount storage pool") - } - - // We can only create the btrfs subvolume under the mounted storage - // pool. The on-disk layout for containers on a btrfs storage pool will - // thus be - // ${LXD_DIR}/storage-pools/<pool>/containers/<name>. The btrfs tool will - // complain if the intermediate path does not exist, so create it if it - // doesn't already.
- containerSubvolumePath := s.getContainerSubvolumePath(s.pool.Name) - if !shared.PathExists(containerSubvolumePath) { - err := os.MkdirAll(containerSubvolumePath, containersDirMode) - if err != nil { - return errors.Wrap(err, "Failed to create volume directory") - } - } - - // Mountpoint of the image: - // ${LXD_DIR}/images/ - imageMntPoint := getImageMountPoint(s.pool.Name, fingerprint) - imageStoragePoolLockID := getImageCreateLockID(s.pool.Name, fingerprint) - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[imageStoragePoolLockID]; ok { - lxdStorageMapLock.Unlock() - if _, ok := <-waitChannel; ok { - logger.Warnf("Received value over semaphore, this should not have happened") - } - } else { - lxdStorageOngoingOperationMap[imageStoragePoolLockID] = make(chan bool) - lxdStorageMapLock.Unlock() - - var imgerr error - if !shared.PathExists(imageMntPoint) || !isBtrfsSubVolume(imageMntPoint) { - imgerr = s.ImageCreate(fingerprint, tracker) - } - - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[imageStoragePoolLockID]; ok { - close(waitChannel) - delete(lxdStorageOngoingOperationMap, imageStoragePoolLockID) - } - lxdStorageMapLock.Unlock() - - if imgerr != nil { - return errors.Wrap(imgerr, "Failed to create image volume") - } - } - - // Create a rw snapshot at - // ${LXD_DIR}/storage-pools//containers/ - // from the mounted ro image snapshot mounted at - // ${LXD_DIR}/storage-pools//images/ - containerSubvolumeName := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) - err = s.btrfsPoolVolumesSnapshot(imageMntPoint, containerSubvolumeName, false, false) - if err != nil { - return errors.Wrap(err, "Failed to storage pool volume snapshot") - } - - // Create the mountpoint for the container at: - // ${LXD_DIR}/containers/ - err = createContainerMountpoint(containerSubvolumeName, container.Path(), container.IsPrivileged()) - if err != nil { - return errors.Wrap(err, "Failed to create container mountpoint") - } - - logger.Debugf("Created BTRFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - err = container.TemplateApply("create") - if err != nil { - return errors.Wrap(err, "Failed to apply container template") - } - return nil -} - -func (s *storageBtrfs) ContainerDelete(container container) error { - logger.Debugf("Deleting BTRFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - // The storage pool needs to be mounted. - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // Delete the subvolume. - containerSubvolumeName := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) - if shared.PathExists(containerSubvolumeName) && isBtrfsSubVolume(containerSubvolumeName) { - err = btrfsSubVolumesDelete(containerSubvolumeName) - if err != nil { - return err - } - } - - // Delete the container's symlink to the subvolume. - err = deleteContainerMountpoint(containerSubvolumeName, container.Path(), s.GetStorageTypeName()) - if err != nil { - return err - } - - // Delete potential snapshot mountpoints. 
- snapshotMntPoint := getSnapshotMountPoint(container.Project(), s.pool.Name, container.Name()) - if shared.PathExists(snapshotMntPoint) { - err := os.RemoveAll(snapshotMntPoint) - if err != nil && !os.IsNotExist(err) { - return err - } - } - - // Delete potential symlink - // ${LXD_DIR}/snapshots/ to ${POOL}/snapshots/ - snapshotSymlink := shared.VarPath("snapshots", projectPrefix(container.Project(), container.Name())) - if shared.PathExists(snapshotSymlink) { - err := os.Remove(snapshotSymlink) - if err != nil { - return err - } - } - - logger.Debugf("Deleted BTRFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageBtrfs) copyContainer(target container, source container) error { - sourceContainerSubvolumeName := getContainerMountPoint(source.Project(), s.pool.Name, source.Name()) - if source.IsSnapshot() { - sourceContainerSubvolumeName = getSnapshotMountPoint(source.Project(), s.pool.Name, source.Name()) - } - targetContainerSubvolumeName := getContainerMountPoint(target.Project(), s.pool.Name, target.Name()) - - containersPath := getContainerMountPoint("default", s.pool.Name, "") - // Ensure that the directories immediately preceding the subvolume directory exist. - if !shared.PathExists(containersPath) { - err := os.MkdirAll(containersPath, containersDirMode) - if err != nil { - return err - } - } - - err := s.btrfsPoolVolumesSnapshot(sourceContainerSubvolumeName, targetContainerSubvolumeName, false, true) - if err != nil { - return err - } - - err = createContainerMountpoint(targetContainerSubvolumeName, target.Path(), target.IsPrivileged()) - if err != nil { - return err - } - - err = target.TemplateApply("copy") - if err != nil { - return err - } - - return nil -} - -func (s *storageBtrfs) copySnapshot(target container, source container) error { - sourceName := source.Name() - targetName := target.Name() - sourceContainerSubvolumeName := getSnapshotMountPoint(source.Project(), s.pool.Name, sourceName) - targetContainerSubvolumeName := getSnapshotMountPoint(target.Project(), s.pool.Name, targetName) - - targetParentName, _, _ := containerGetParentAndSnapshotName(target.Name()) - containersPath := getSnapshotMountPoint(target.Project(), s.pool.Name, targetParentName) - snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots", projectPrefix(target.Project(), targetParentName)) - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(target.Project(), targetParentName)) - err := createSnapshotMountpoint(containersPath, snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) - if err != nil { - return err - } - - // Ensure that the directories immediately preceding the subvolume directory exist. 
- if !shared.PathExists(containersPath) { - err := os.MkdirAll(containersPath, containersDirMode) - if err != nil { - return err - } - } - - err = s.btrfsPoolVolumesSnapshot(sourceContainerSubvolumeName, targetContainerSubvolumeName, false, true) - if err != nil { - return err - } - - return nil -} - -func (s *storageBtrfs) doCrossPoolContainerCopy(target container, source container, containerOnly bool, refresh bool, refreshSnapshots []container) error { - sourcePool, err := source.StoragePool() - if err != nil { - return err - } - - // setup storage for the source volume - srcStorage, err := storagePoolVolumeInit(s.s, "default", sourcePool, source.Name(), storagePoolVolumeTypeContainer) - if err != nil { - return err - } - - ourMount, err := srcStorage.StoragePoolMount() - if err != nil { - return err - } - if ourMount { - defer srcStorage.StoragePoolUmount() - } - - targetPool, err := target.StoragePool() - if err != nil { - return err - } - - var snapshots []container - - if refresh { - snapshots = refreshSnapshots - } else { - snapshots, err = source.Snapshots() - if err != nil { - return err - } - - // create the main container - err = s.doContainerCreate(target.Project(), target.Name(), target.IsPrivileged()) - if err != nil { - return err - } - } - - destContainerMntPoint := getContainerMountPoint(target.Project(), targetPool, target.Name()) - bwlimit := s.pool.Config["rsync.bwlimit"] - if !containerOnly { - for _, snap := range snapshots { - srcSnapshotMntPoint := getSnapshotMountPoint(target.Project(), sourcePool, snap.Name()) - _, err = rsyncLocalCopy(srcSnapshotMntPoint, destContainerMntPoint, bwlimit) - if err != nil { - logger.Errorf("Failed to rsync into BTRFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) - return err - } - - // create snapshot - _, snapOnlyName, _ := containerGetParentAndSnapshotName(snap.Name()) - err = s.doContainerSnapshotCreate(target.Project(), fmt.Sprintf("%s/%s", target.Name(), snapOnlyName), target.Name()) - if err != nil { - return err - } - } - } - - srcContainerMntPoint := getContainerMountPoint(source.Project(), sourcePool, source.Name()) - _, err = rsyncLocalCopy(srcContainerMntPoint, destContainerMntPoint, bwlimit) - if err != nil { - logger.Errorf("Failed to rsync into BTRFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) - return err - } - - return nil -} - -func (s *storageBtrfs) ContainerCopy(target container, source container, containerOnly bool) error { - logger.Debugf("Copying BTRFS container storage %s to %s", source.Name(), target.Name()) - - // The storage pool needs to be mounted. 
- _, err := s.StoragePoolMount() - if err != nil { - return err - } - - ourStart, err := source.StorageStart() - if err != nil { - return err - } - if ourStart { - defer source.StorageStop() - } - - _, sourcePool, _ := source.Storage().GetContainerPoolInfo() - _, targetPool, _ := target.Storage().GetContainerPoolInfo() - if sourcePool != targetPool { - return s.doCrossPoolContainerCopy(target, source, containerOnly, false, nil) - } - - err = s.copyContainer(target, source) - if err != nil { - return err - } - - if containerOnly { - logger.Debugf("Copied BTRFS container storage %s to %s", source.Name(), target.Name()) - return nil - } - - snapshots, err := source.Snapshots() - if err != nil { - return err - } - - if len(snapshots) == 0 { - logger.Debugf("Copied BTRFS container storage %s to %s", source.Name(), target.Name()) - return nil - } - - for _, snap := range snapshots { - sourceSnapshot, err := containerLoadByProjectAndName(s.s, source.Project(), snap.Name()) - if err != nil { - return err - } - - _, snapOnlyName, _ := containerGetParentAndSnapshotName(snap.Name()) - newSnapName := fmt.Sprintf("%s/%s", target.Name(), snapOnlyName) - targetSnapshot, err := containerLoadByProjectAndName(s.s, target.Project(), newSnapName) - if err != nil { - return err - } - - err = s.copySnapshot(targetSnapshot, sourceSnapshot) - if err != nil { - return err - } - } - - logger.Debugf("Copied BTRFS container storage %s to %s", source.Name(), target.Name()) - return nil -} - -func (s *storageBtrfs) ContainerRefresh(target container, source container, snapshots []container) error { - logger.Debugf("Refreshing BTRFS container storage for %s from %s", target.Name(), source.Name()) - - // The storage pool needs to be mounted. - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - ourStart, err := source.StorageStart() - if err != nil { - return err - } - if ourStart { - defer source.StorageStop() - } - - return s.doCrossPoolContainerCopy(target, source, len(snapshots) == 0, true, snapshots) -} - -func (s *storageBtrfs) ContainerMount(c container) (bool, error) { - logger.Debugf("Mounting BTRFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - // The storage pool must be mounted. - _, err := s.StoragePoolMount() - if err != nil { - return false, err - } - - logger.Debugf("Mounted BTRFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return true, nil -} - -func (s *storageBtrfs) ContainerUmount(c container, path string) (bool, error) { - return true, nil -} - -func (s *storageBtrfs) ContainerRename(container container, newName string) error { - logger.Debugf("Renaming BTRFS storage volume for container \"%s\" from %s to %s", s.volume.Name, s.volume.Name, newName) - - // The storage pool must be mounted. 
- _, err := s.StoragePoolMount() - if err != nil { - return err - } - - oldContainerSubvolumeName := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) - newContainerSubvolumeName := getContainerMountPoint(container.Project(), s.pool.Name, newName) - err = os.Rename(oldContainerSubvolumeName, newContainerSubvolumeName) - if err != nil { - return err - } - - newSymlink := shared.VarPath("containers", projectPrefix(container.Project(), newName)) - err = renameContainerMountpoint(oldContainerSubvolumeName, container.Path(), newContainerSubvolumeName, newSymlink) - if err != nil { - return err - } - - oldSnapshotSubvolumeName := getSnapshotMountPoint(container.Project(), s.pool.Name, container.Name()) - newSnapshotSubvolumeName := getSnapshotMountPoint(container.Project(), s.pool.Name, newName) - if shared.PathExists(oldSnapshotSubvolumeName) { - err = os.Rename(oldSnapshotSubvolumeName, newSnapshotSubvolumeName) - if err != nil { - return err - } - } - - oldSnapshotSymlink := shared.VarPath("snapshots", projectPrefix(container.Project(), container.Name())) - newSnapshotSymlink := shared.VarPath("snapshots", projectPrefix(container.Project(), newName)) - if shared.PathExists(oldSnapshotSymlink) { - err := os.Remove(oldSnapshotSymlink) - if err != nil { - return err - } - - err = os.Symlink(newSnapshotSubvolumeName, newSnapshotSymlink) - if err != nil { - return err - } - } - - logger.Debugf("Renamed BTRFS storage volume for container \"%s\" from %s to %s", s.volume.Name, s.volume.Name, newName) - return nil -} - -func (s *storageBtrfs) ContainerRestore(container container, sourceContainer container) error { - logger.Debugf("Restoring BTRFS storage volume for container \"%s\" from %s to %s", s.volume.Name, sourceContainer.Name(), container.Name()) - - // The storage pool must be mounted. - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // Create a backup so we can revert. - targetContainerSubvolumeName := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) - backupTargetContainerSubvolumeName := fmt.Sprintf("%s.tmp", targetContainerSubvolumeName) - err = os.Rename(targetContainerSubvolumeName, backupTargetContainerSubvolumeName) - if err != nil { - return err - } - undo := true - defer func() { - if undo { - os.Rename(backupTargetContainerSubvolumeName, targetContainerSubvolumeName) - } - }() - - ourStart, err := sourceContainer.StorageStart() - if err != nil { - return err - } - if ourStart { - defer sourceContainer.StorageStop() - } - - // Mount the source container. - srcContainerStorage := sourceContainer.Storage() - _, sourcePool, _ := srcContainerStorage.GetContainerPoolInfo() - sourceContainerSubvolumeName := "" - if sourceContainer.IsSnapshot() { - sourceContainerSubvolumeName = getSnapshotMountPoint(sourceContainer.Project(), sourcePool, sourceContainer.Name()) - } else { - sourceContainerSubvolumeName = getContainerMountPoint(container.Project(), sourcePool, sourceContainer.Name()) - } - - var failure error - _, targetPool, _ := s.GetContainerPoolInfo() - if targetPool == sourcePool { - // They are on the same storage pool, so we can simply snapshot. - err := s.btrfsPoolVolumesSnapshot(sourceContainerSubvolumeName, targetContainerSubvolumeName, false, true) - if err != nil { - failure = err - } - } else { - err := btrfsSubVolumeCreate(targetContainerSubvolumeName) - if err == nil { - // Use rsync to fill the empty volume. Sync by using - // the subvolume name. 
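- // The optional "rsync.bwlimit" pool option caps the rsync transfer rate used for the copy.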
- bwlimit := s.pool.Config["rsync.bwlimit"] - output, err := rsyncLocalCopy(sourceContainerSubvolumeName, targetContainerSubvolumeName, bwlimit) - if err != nil { - s.ContainerDelete(container) - logger.Errorf("ContainerRestore: rsync failed: %s", string(output)) - failure = err - } - } else { - failure = err - } - } - - if failure == nil { - undo = false - _, sourcePool, _ := srcContainerStorage.GetContainerPoolInfo() - _, targetPool, _ := s.GetContainerPoolInfo() - if targetPool == sourcePool { - // Remove the backup we made - return btrfsSubVolumesDelete(backupTargetContainerSubvolumeName) - } - - err = os.RemoveAll(backupTargetContainerSubvolumeName) - if err != nil && !os.IsNotExist(err) { - return err - } - } - - logger.Debugf("Restored BTRFS storage volume for container \"%s\" from %s to %s", s.volume.Name, sourceContainer.Name(), container.Name()) - return failure -} - -func (s *storageBtrfs) ContainerGetUsage(container container) (int64, error) { - return s.btrfsPoolVolumeQGroupUsage(container.Path()) -} - -func (s *storageBtrfs) doContainerSnapshotCreate(project string, targetName string, sourceName string) error { - logger.Debugf("Creating BTRFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // We can only create the btrfs subvolume under the mounted storage - // pool. The on-disk layout for snapshots on a btrfs storage pool will - // thus be - // ${LXD_DIR}/storage-pools/<pool>/snapshots/<snapshot name>. The btrfs tool will - // complain if the intermediate path does not exist, so create it if it - // doesn't already. - snapshotSubvolumePath := getSnapshotSubvolumePath(project, s.pool.Name, sourceName) - if !shared.PathExists(snapshotSubvolumePath) { - err := os.MkdirAll(snapshotSubvolumePath, containersDirMode) - if err != nil { - return err - } - } - - snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots", projectPrefix(project, s.volume.Name)) - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(project, sourceName)) - if !shared.PathExists(snapshotMntPointSymlink) { - if !shared.PathExists(snapshotMntPointSymlinkTarget) { - err = os.MkdirAll(snapshotMntPointSymlinkTarget, snapshotsDirMode) - if err != nil { - return err - } - } - - err := os.Symlink(snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) - if err != nil { - return err - } - } - - srcContainerSubvolumeName := getContainerMountPoint(project, s.pool.Name, sourceName) - snapshotSubvolumeName := getSnapshotMountPoint(project, s.pool.Name, targetName) - err = s.btrfsPoolVolumesSnapshot(srcContainerSubvolumeName, snapshotSubvolumeName, true, true) - if err != nil { - return err - } - - logger.Debugf("Created BTRFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageBtrfs) ContainerSnapshotCreate(snapshotContainer container, sourceContainer container) error { - err := s.doContainerSnapshotCreate(sourceContainer.Project(), snapshotContainer.Name(), sourceContainer.Name()) - if err != nil { - s.ContainerSnapshotDelete(snapshotContainer) - return err - } - - return nil -} - -func btrfsSnapshotDeleteInternal(project, poolName string, snapshotName string) error { - snapshotSubvolumeName := getSnapshotMountPoint(project, poolName, snapshotName) - if shared.PathExists(snapshotSubvolumeName) && isBtrfsSubVolume(snapshotSubvolumeName) { - err := btrfsSubVolumesDelete(snapshotSubvolumeName) - if
err != nil { - return err - } - } - - sourceSnapshotMntPoint := shared.VarPath("snapshots", projectPrefix(project, snapshotName)) - os.Remove(sourceSnapshotMntPoint) - os.Remove(snapshotSubvolumeName) - - sourceName, _, _ := containerGetParentAndSnapshotName(snapshotName) - snapshotSubvolumePath := getSnapshotSubvolumePath(project, poolName, sourceName) - os.Remove(snapshotSubvolumePath) - if !shared.PathExists(snapshotSubvolumePath) { - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(project, sourceName)) - os.Remove(snapshotMntPointSymlink) - } - - return nil -} - -func (s *storageBtrfs) ContainerSnapshotDelete(snapshotContainer container) error { - logger.Debugf("Deleting BTRFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - err = btrfsSnapshotDeleteInternal(snapshotContainer.Project(), s.pool.Name, snapshotContainer.Name()) - if err != nil { - return err - } - - logger.Debugf("Deleted BTRFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageBtrfs) ContainerSnapshotStart(container container) (bool, error) { - logger.Debugf("Initializing BTRFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return false, err - } - - snapshotSubvolumeName := getSnapshotMountPoint(container.Project(), s.pool.Name, container.Name()) - roSnapshotSubvolumeName := fmt.Sprintf("%s.ro", snapshotSubvolumeName) - if shared.PathExists(roSnapshotSubvolumeName) { - logger.Debugf("The BTRFS snapshot is already mounted read-write") - return false, nil - } - - err = os.Rename(snapshotSubvolumeName, roSnapshotSubvolumeName) - if err != nil { - return false, err - } - - err = s.btrfsPoolVolumesSnapshot(roSnapshotSubvolumeName, snapshotSubvolumeName, false, true) - if err != nil { - return false, err - } - - logger.Debugf("Initialized BTRFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return true, nil -} - -func (s *storageBtrfs) ContainerSnapshotStop(container container) (bool, error) { - logger.Debugf("Stopping BTRFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return false, err - } - - snapshotSubvolumeName := getSnapshotMountPoint(container.Project(), s.pool.Name, container.Name()) - roSnapshotSubvolumeName := fmt.Sprintf("%s.ro", snapshotSubvolumeName) - if !shared.PathExists(roSnapshotSubvolumeName) { - logger.Debugf("The BTRFS snapshot is currently not mounted read-write") - return false, nil - } - - if shared.PathExists(snapshotSubvolumeName) && isBtrfsSubVolume(snapshotSubvolumeName) { - err = btrfsSubVolumesDelete(snapshotSubvolumeName) - if err != nil { - return false, err - } - } - - err = os.Rename(roSnapshotSubvolumeName, snapshotSubvolumeName) - if err != nil { - return false, err - } - - logger.Debugf("Stopped BTRFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return true, nil -} - -// ContainerSnapshotRename renames a snapshot of a container. -func (s *storageBtrfs) ContainerSnapshotRename(snapshotContainer container, newName string) error { - logger.Debugf("Renaming BTRFS storage volume for snapshot \"%s\" from %s to %s", s.volume.Name, s.volume.Name, newName) - - // The storage pool must be mounted. 
- _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // Unmount the snapshot if it is mounted, otherwise we'll get EBUSY. - // Rename the subvolume on the storage pool. - oldSnapshotSubvolumeName := getSnapshotMountPoint(snapshotContainer.Project(), s.pool.Name, snapshotContainer.Name()) - newSnapshotSubvolumeName := getSnapshotMountPoint(snapshotContainer.Project(), s.pool.Name, newName) - err = os.Rename(oldSnapshotSubvolumeName, newSnapshotSubvolumeName) - if err != nil { - return err - } - - logger.Debugf("Renamed BTRFS storage volume for snapshot \"%s\" from %s to %s", s.volume.Name, s.volume.Name, newName) - return nil -} - -// Needed for live migration where an empty snapshot needs to be created before -// rsyncing into it. -func (s *storageBtrfs) ContainerSnapshotCreateEmpty(snapshotContainer container) error { - logger.Debugf("Creating empty BTRFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - // Mount the storage pool. - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // Create the snapshot subvolume path on the storage pool. - sourceName, _, _ := containerGetParentAndSnapshotName(snapshotContainer.Name()) - snapshotSubvolumePath := getSnapshotSubvolumePath(snapshotContainer.Project(), s.pool.Name, sourceName) - snapshotSubvolumeName := getSnapshotMountPoint(snapshotContainer.Project(), s.pool.Name, snapshotContainer.Name()) - if !shared.PathExists(snapshotSubvolumePath) { - err := os.MkdirAll(snapshotSubvolumePath, containersDirMode) - if err != nil { - return err - } - } - - err = btrfsSubVolumeCreate(snapshotSubvolumeName) - if err != nil { - return err - } - - snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots", projectPrefix(snapshotContainer.Project(), sourceName)) - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(snapshotContainer.Project(), sourceName)) - if !shared.PathExists(snapshotMntPointSymlink) { - err := createContainerMountpoint(snapshotMntPointSymlinkTarget, snapshotMntPointSymlink, snapshotContainer.IsPrivileged()) - if err != nil { - return err - } - } - - logger.Debugf("Created empty BTRFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageBtrfs) doBtrfsBackup(cur string, prev string, target string) error { - args := []string{"send"} - if prev != "" { - args = append(args, "-p", prev) - } - args = append(args, cur) - - eater, err := os.OpenFile(target, os.O_RDWR|os.O_CREATE, 0644) - if err != nil { - return err - } - defer eater.Close() - - btrfsSendCmd := exec.Command("btrfs", args...)
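- // Stream the output of "btrfs send" straight into the target backup file.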
- btrfsSendCmd.Stdout = eater - - err = btrfsSendCmd.Run() - if err != nil { - return err - } - - return err -} - -func (s *storageBtrfs) doContainerBackupCreateOptimized(tmpPath string, backup backup, source container) error { - // Handle snapshots - finalParent := "" - if !backup.containerOnly { - snapshotsPath := fmt.Sprintf("%s/snapshots", tmpPath) - - // Retrieve the snapshots - snapshots, err := source.Snapshots() - if err != nil { - return err - } - - // Create the snapshot path - if len(snapshots) > 0 { - err = os.MkdirAll(snapshotsPath, 0711) - if err != nil { - return err - } - } - - for i, snap := range snapshots { - _, snapName, _ := containerGetParentAndSnapshotName(snap.Name()) - - // Figure out previous and current subvolumes - prev := "" - if i > 0 { - // /var/lib/lxd/storage-pools//containers-snapshots// - prev = getSnapshotMountPoint(source.Project(), s.pool.Name, snapshots[i-1].Name()) - } - cur := getSnapshotMountPoint(source.Project(), s.pool.Name, snap.Name()) - - // Make a binary btrfs backup - target := fmt.Sprintf("%s/%s.bin", snapshotsPath, snapName) - err := s.doBtrfsBackup(cur, prev, target) - if err != nil { - return err - } - - finalParent = cur - } - } - - // Make a temporary copy of the container - sourceVolume := getContainerMountPoint(source.Project(), s.pool.Name, source.Name()) - containersPath := getContainerMountPoint("default", s.pool.Name, "") - tmpContainerMntPoint, err := ioutil.TempDir(containersPath, source.Name()) - if err != nil { - return err - } - defer os.RemoveAll(tmpContainerMntPoint) - - err = os.Chmod(tmpContainerMntPoint, 0700) - if err != nil { - return err - } - - targetVolume := fmt.Sprintf("%s/.backup", tmpContainerMntPoint) - err = s.btrfsPoolVolumesSnapshot(sourceVolume, targetVolume, true, true) - if err != nil { - return err - } - defer btrfsSubVolumesDelete(targetVolume) - - // Dump the container to a file - fsDump := fmt.Sprintf("%s/container.bin", tmpPath) - err = s.doBtrfsBackup(targetVolume, finalParent, fsDump) - if err != nil { - return err - } - - return nil -} - -func (s *storageBtrfs) doContainerBackupCreateVanilla(tmpPath string, backup backup, source container) error { - // Prepare for rsync - rsync := func(oldPath string, newPath string, bwlimit string) error { - output, err := rsyncLocalCopy(oldPath, newPath, bwlimit) - if err != nil { - return fmt.Errorf("Failed to rsync: %s: %s", string(output), err) - } - - return nil - } - - bwlimit := s.pool.Config["rsync.bwlimit"] - - // Handle snapshots - if !backup.containerOnly { - snapshotsPath := fmt.Sprintf("%s/snapshots", tmpPath) - - // Retrieve the snapshots - snapshots, err := source.Snapshots() - if err != nil { - return err - } - - // Create the snapshot path - if len(snapshots) > 0 { - err = os.MkdirAll(snapshotsPath, 0711) - if err != nil { - return err - } - } - - for _, snap := range snapshots { - _, snapName, _ := containerGetParentAndSnapshotName(snap.Name()) - - // Mount the snapshot to a usable path - _, err := s.ContainerSnapshotStart(snap) - if err != nil { - return err - } - - snapshotMntPoint := getSnapshotMountPoint(snap.Project(), s.pool.Name, snap.Name()) - target := fmt.Sprintf("%s/%s", snapshotsPath, snapName) - - // Copy the snapshot - err = rsync(snapshotMntPoint, target, bwlimit) - s.ContainerSnapshotStop(snap) - if err != nil { - return err - } - } - } - - // Make a temporary copy of the container - sourceVolume := getContainerMountPoint(source.Project(), s.pool.Name, source.Name()) - containersPath := getContainerMountPoint("default", 
s.pool.Name, "") - tmpContainerMntPoint, err := ioutil.TempDir(containersPath, source.Name()) - if err != nil { - return err - } - defer os.RemoveAll(tmpContainerMntPoint) - - err = os.Chmod(tmpContainerMntPoint, 0700) - if err != nil { - return err - } - - targetVolume := fmt.Sprintf("%s/.backup", tmpContainerMntPoint) - err = s.btrfsPoolVolumesSnapshot(sourceVolume, targetVolume, true, true) - if err != nil { - return err - } - defer btrfsSubVolumesDelete(targetVolume) - - // Copy the container - containerPath := fmt.Sprintf("%s/container", tmpPath) - err = rsync(targetVolume, containerPath, bwlimit) - if err != nil { - return err - } - - return nil -} - -func (s *storageBtrfs) ContainerBackupCreate(backup backup, source container) error { - // Start storage - ourStart, err := source.StorageStart() - if err != nil { - return err - } - if ourStart { - defer source.StorageStop() - } - - // Create a temporary path for the backup - tmpPath, err := ioutil.TempDir(shared.VarPath("backups"), "lxd_backup_") - if err != nil { - return err - } - defer os.RemoveAll(tmpPath) - - // Generate the actual backup - if backup.optimizedStorage { - err = s.doContainerBackupCreateOptimized(tmpPath, backup, source) - if err != nil { - return err - } - } else { - err := s.doContainerBackupCreateVanilla(tmpPath, backup, source) - if err != nil { - return err - } - } - - // Pack the backup - err = backupCreateTarball(s.s, tmpPath, backup) - if err != nil { - return err - } - - return nil -} - -func (s *storageBtrfs) doContainerBackupLoadOptimized(info backupInfo, data io.ReadSeeker, tarArgs []string) error { - containerName, _, _ := containerGetParentAndSnapshotName(info.Name) - - containerMntPoint := getContainerMountPoint("default", s.pool.Name, "") - unpackDir, err := ioutil.TempDir(containerMntPoint, containerName) - if err != nil { - return err - } - defer os.RemoveAll(unpackDir) - - err = os.Chmod(unpackDir, 0700) - if err != nil { - return err - } - - unpackPath := fmt.Sprintf("%s/.backup_unpack", unpackDir) - err = os.MkdirAll(unpackPath, 0711) - if err != nil { - return err - } - - // Prepare tar arguments - args := append(tarArgs, []string{ - "-", - "--strip-components=1", - "-C", unpackPath, "backup", - }...) - - // Extract container - data.Seek(0, 0) - err = shared.RunCommandWithFds(data, nil, "tar", args...) 
- if err != nil { - logger.Errorf("Failed to untar \"%s\" into \"%s\": %s", "backup", unpackPath, err) - return err - } - - for _, snapshotOnlyName := range info.Snapshots { - snapshotBackup := fmt.Sprintf("%s/snapshots/%s.bin", unpackPath, snapshotOnlyName) - feeder, err := os.Open(snapshotBackup) - if err != nil { - return err - } - - // create mountpoint - snapshotMntPoint := getSnapshotMountPoint(info.Project, s.pool.Name, containerName) - snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots", projectPrefix(info.Project, containerName)) - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(info.Project, containerName)) - err = createSnapshotMountpoint(snapshotMntPoint, snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) - if err != nil { - feeder.Close() - return err - } - - // /var/lib/lxd/storage-pools//snapshots// - btrfsRecvCmd := exec.Command("btrfs", "receive", "-e", snapshotMntPoint) - btrfsRecvCmd.Stdin = feeder - msg, err := btrfsRecvCmd.CombinedOutput() - feeder.Close() - if err != nil { - logger.Errorf("Failed to receive contents of btrfs backup \"%s\": %s", snapshotBackup, string(msg)) - return err - } - } - - containerBackupFile := fmt.Sprintf("%s/container.bin", unpackPath) - feeder, err := os.Open(containerBackupFile) - if err != nil { - return err - } - defer feeder.Close() - - // /var/lib/lxd/storage-pools//containers/ - btrfsRecvCmd := exec.Command("btrfs", "receive", "-vv", "-e", unpackDir) - btrfsRecvCmd.Stdin = feeder - msg, err := btrfsRecvCmd.CombinedOutput() - if err != nil { - logger.Errorf("Failed to receive contents of btrfs backup \"%s\": %s", containerBackupFile, string(msg)) - return err - } - tmpContainerMntPoint := fmt.Sprintf("%s/.backup", unpackDir) - defer btrfsSubVolumesDelete(tmpContainerMntPoint) - - containerMntPoint = getContainerMountPoint(info.Project, s.pool.Name, info.Name) - err = s.btrfsPoolVolumesSnapshot(tmpContainerMntPoint, containerMntPoint, false, true) - if err != nil { - logger.Errorf("Failed to create btrfs snapshot \"%s\" of \"%s\": %s", tmpContainerMntPoint, containerMntPoint, err) - return err - } - - // Create mountpoints - err = createContainerMountpoint(containerMntPoint, shared.VarPath("containers", projectPrefix(info.Project, info.Name)), info.Privileged) - if err != nil { - return err - } - - return nil -} - -func (s *storageBtrfs) doContainerBackupLoadVanilla(info backupInfo, data io.ReadSeeker, tarArgs []string) error { - // create the main container - err := s.doContainerCreate(info.Project, info.Name, info.Privileged) - if err != nil { - return err - } - - containerMntPoint := getContainerMountPoint(info.Project, s.pool.Name, info.Name) - // Extract container - for _, snap := range info.Snapshots { - cur := fmt.Sprintf("backup/snapshots/%s", snap) - - // Prepare tar arguments - args := append(tarArgs, []string{ - "-", - "--recursive-unlink", - "--xattrs-include=*", - "--strip-components=3", - "-C", containerMntPoint, cur, - }...) - - // Extract snapshots - data.Seek(0, 0) - err = shared.RunCommandWithFds(data, nil, "tar", args...) 
- if err != nil { - logger.Errorf("Failed to untar \"%s\" into \"%s\": %s", cur, containerMntPoint, err) - return err - } - - // create snapshot - err = s.doContainerSnapshotCreate(info.Project, fmt.Sprintf("%s/%s", info.Name, snap), info.Name) - if err != nil { - return err - } - } - - // Prepare tar arguments - args := append(tarArgs, []string{ - "-", - "--strip-components=2", - "--xattrs-include=*", - "-C", containerMntPoint, "backup/container", - }...) - - // Extract container - data.Seek(0, 0) - err = shared.RunCommandWithFds(data, nil, "tar", args...) - if err != nil { - logger.Errorf("Failed to untar \"backup/container\" into \"%s\": %s", containerMntPoint, err) - return err - } - - return nil -} - -func (s *storageBtrfs) ContainerBackupLoad(info backupInfo, data io.ReadSeeker, tarArgs []string) error { - logger.Debugf("Loading BTRFS storage volume for backup \"%s\" on storage pool \"%s\"", info.Name, s.pool.Name) - - if info.HasBinaryFormat { - return s.doContainerBackupLoadOptimized(info, data, tarArgs) - } - - return s.doContainerBackupLoadVanilla(info, data, tarArgs) -} - -func (s *storageBtrfs) ImageCreate(fingerprint string, tracker *ioprogress.ProgressTracker) error { - logger.Debugf("Creating BTRFS storage volume for image \"%s\" on storage pool \"%s\"", fingerprint, s.pool.Name) - - // Create the subvolume. - source := s.pool.Config["source"] - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - err = s.createImageDbPoolVolume(fingerprint) - if err != nil { - return err - } - - // We can only create the btrfs subvolume under the mounted storage - // pool. The on-disk layout for images on a btrfs storage pool will thus - // be - // ${LXD_DIR}/storage-pools//images/. The btrfs tool will - // complain if the intermediate path does not exist, so create it if it - // doesn't already. - imageSubvolumePath := s.getImageSubvolumePath(s.pool.Name) - if !shared.PathExists(imageSubvolumePath) { - err := os.MkdirAll(imageSubvolumePath, imagesDirMode) - if err != nil { - return err - } - } - - // Create a temporary rw btrfs subvolume. From this rw subvolume we'll - // create a ro snapshot below. The path with which we do this is - // ${LXD_DIR}/storage-pools//images/@_tmp. - imageSubvolumeName := getImageMountPoint(s.pool.Name, fingerprint) - tmpImageSubvolumeName := fmt.Sprintf("%s_tmp", imageSubvolumeName) - err = btrfsSubVolumeCreate(tmpImageSubvolumeName) - if err != nil { - return err - } - // Delete volume on error. - undo := true - defer func() { - if undo { - btrfsSubVolumesDelete(tmpImageSubvolumeName) - } - }() - - // Unpack the image in imageMntPoint. - imagePath := shared.VarPath("images", fingerprint) - err = unpackImage(imagePath, tmpImageSubvolumeName, storageTypeBtrfs, s.s.OS.RunningInUserNS, tracker) - if err != nil { - return err - } - - // Now create a read-only snapshot of the subvolume. - // The path with which we do this is - // ${LXD_DIR}/storage-pools//images/. 
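The read-only snapshot taken below is what makes image creation atomic: the tarball is unpacked into a scratch "<fingerprint>_tmp" read-write subvolume first, so a half-unpacked tree is never visible under the final path. Reduced to its btrfs CLI essence (hypothetical paths, unpacking elided):

package main

import (
	"os/exec"
)

func run(args ...string) error {
	return exec.Command(args[0], args[1:]...).Run()
}

func main() {
	tmp := "/pool/images/abc123_tmp" // scratch rw subvolume (made up)
	final := "/pool/images/abc123"   // published ro snapshot

	// 1. Create the scratch subvolume; the image would be unpacked here.
	if err := run("btrfs", "subvolume", "create", tmp); err != nil {
		panic(err)
	}

	// 2. Publish an atomic read-only snapshot under the final name.
	if err := run("btrfs", "subvolume", "snapshot", "-r", tmp, final); err != nil {
		panic(err)
	}

	// 3. Drop the scratch copy.
	if err := run("btrfs", "subvolume", "delete", tmp); err != nil {
		panic(err)
	}
}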
- err = s.btrfsPoolVolumesSnapshot(tmpImageSubvolumeName, imageSubvolumeName, true, true) - if err != nil { - return err - } - - defer func() { - if undo { - btrfsSubVolumesDelete(imageSubvolumeName) - } - }() - - err = btrfsSubVolumesDelete(tmpImageSubvolumeName) - if err != nil { - return err - } - - undo = false - - logger.Debugf("Created BTRFS storage volume for image \"%s\" on storage pool \"%s\"", fingerprint, s.pool.Name) - return nil -} - -func (s *storageBtrfs) ImageDelete(fingerprint string) error { - logger.Debugf("Deleting BTRFS storage volume for image \"%s\" on storage pool \"%s\"", fingerprint, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // Delete the btrfs subvolume. The path with which we - // do this is ${LXD_DIR}/storage-pools//images/. - imageSubvolumeName := getImageMountPoint(s.pool.Name, fingerprint) - if shared.PathExists(imageSubvolumeName) && isBtrfsSubVolume(imageSubvolumeName) { - err = btrfsSubVolumesDelete(imageSubvolumeName) - if err != nil { - return err - } - } - - err = s.deleteImageDbPoolVolume(fingerprint) - if err != nil { - return err - } - - // Now delete the mountpoint for the image: - // ${LXD_DIR}/images/. - if shared.PathExists(imageSubvolumeName) { - err := os.RemoveAll(imageSubvolumeName) - if err != nil && !os.IsNotExist(err) { - return err - } - } - - logger.Debugf("Deleted BTRFS storage volume for image \"%s\" on storage pool \"%s\"", fingerprint, s.pool.Name) - return nil -} - -func (s *storageBtrfs) ImageMount(fingerprint string) (bool, error) { - logger.Debugf("Mounting BTRFS storage volume for image \"%s\" on storage pool \"%s\"", fingerprint, s.pool.Name) - - // The storage pool must be mounted. - _, err := s.StoragePoolMount() - if err != nil { - return false, err - } - - logger.Debugf("Mounted BTRFS storage volume for image \"%s\" on storage pool \"%s\"", fingerprint, s.pool.Name) - return true, nil -} - -func (s *storageBtrfs) ImageUmount(fingerprint string) (bool, error) { - return true, nil -} - -func btrfsSubVolumeCreate(subvol string) error { - parentDestPath := filepath.Dir(subvol) - if !shared.PathExists(parentDestPath) { - err := os.MkdirAll(parentDestPath, 0711) - if err != nil { - return err - } - } - - output, err := shared.RunCommand( - "btrfs", - "subvolume", - "create", - subvol) - if err != nil { - logger.Errorf("Failed to create BTRFS subvolume \"%s\": %s", subvol, output) - return err - } - - return nil -} - -func btrfsSubVolumeQGroup(subvol string) (string, error) { - output, err := shared.RunCommand( - "btrfs", - "qgroup", - "show", - subvol, - "-e", - "-f") - - if err != nil { - return "", db.ErrNoSuchObject - } - - var qgroup string - for _, line := range strings.Split(output, "\n") { - if line == "" || strings.HasPrefix(line, "qgroupid") || strings.HasPrefix(line, "---") { - continue - } - - fields := strings.Fields(line) - if len(fields) != 4 { - continue - } - - qgroup = fields[0] - } - - if qgroup == "" { - return "", fmt.Errorf("Unable to find quota group") - } - - return qgroup, nil -} - -func (s *storageBtrfs) btrfsPoolVolumeQGroupUsage(subvol string) (int64, error) { - output, err := shared.RunCommand( - "btrfs", - "qgroup", - "show", - subvol, - "-e", - "-f") - - if err != nil { - return -1, fmt.Errorf("BTRFS quotas not supported. 
Try enabling them with \"btrfs quota enable\"") - } - - for _, line := range strings.Split(output, "\n") { - if line == "" || strings.HasPrefix(line, "qgroupid") || strings.HasPrefix(line, "---") { - continue - } - - fields := strings.Fields(line) - if len(fields) != 4 { - continue - } - - usage, err := strconv.ParseInt(fields[2], 10, 64) - if err != nil { - continue - } - - return usage, nil - } - - return -1, fmt.Errorf("Unable to find current qgroup usage") -} - -func btrfsSubVolumeDelete(subvol string) error { - // Attempt (but don't fail on) to delete any qgroup on the subvolume - qgroup, err := btrfsSubVolumeQGroup(subvol) - if err == nil { - shared.RunCommand( - "btrfs", - "qgroup", - "destroy", - qgroup, - subvol) - } - - // Attempt to make the subvolume writable - shared.RunCommand("btrfs", "property", "set", subvol, "ro", "false") - - // Delete the subvolume itself - _, err = shared.RunCommand( - "btrfs", - "subvolume", - "delete", - subvol) - - return err -} - -// btrfsPoolVolumesDelete is the recursive variant on btrfsPoolVolumeDelete, -// it first deletes subvolumes of the subvolume and then the -// subvolume itself. -func btrfsSubVolumesDelete(subvol string) error { - // Delete subsubvols. - subsubvols, err := btrfsSubVolumesGet(subvol) - if err != nil { - return err - } - sort.Sort(sort.Reverse(sort.StringSlice(subsubvols))) - - for _, subsubvol := range subsubvols { - err := btrfsSubVolumeDelete(path.Join(subvol, subsubvol)) - if err != nil { - return err - } - } - - // Delete the subvol itself - err = btrfsSubVolumeDelete(subvol) - if err != nil { - return err - } - - return nil -} - -/* - * btrfsSnapshot creates a snapshot of "source" to "dest" - * the result will be readonly if "readonly" is True. - */ -func btrfsSnapshot(source string, dest string, readonly bool) error { - var output string - var err error - if readonly { - output, err = shared.RunCommand( - "btrfs", - "subvolume", - "snapshot", - "-r", - source, - dest) - } else { - output, err = shared.RunCommand( - "btrfs", - "subvolume", - "snapshot", - source, - dest) - } - if err != nil { - return fmt.Errorf( - "subvolume snapshot failed, source=%s, dest=%s, output=%s", - source, - dest, - output, - ) - } - - return err -} - -func (s *storageBtrfs) btrfsPoolVolumeSnapshot(source string, dest string, readonly bool) error { - return btrfsSnapshot(source, dest, readonly) -} - -func (s *storageBtrfs) btrfsPoolVolumesSnapshot(source string, dest string, readonly bool, recursive bool) error { - // Now snapshot all subvolumes of the root. - if recursive { - // Get a list of subvolumes of the root - subsubvols, err := btrfsSubVolumesGet(source) - if err != nil { - return err - } - sort.Sort(sort.StringSlice(subsubvols)) - - if len(subsubvols) > 0 && readonly { - // A root with subvolumes can never be readonly, - // also don't make subvolumes readonly. 
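This recursion exists because btrfs snapshots do not recurse: a nested subvolume shows up inside the parent's snapshot only as an empty directory, which is also why the ro flag has to be dropped here. A compact sketch of the same ordering, assuming children is sorted parents-first (as btrfsSubVolumesGet returns them) and hypothetical paths:

package main

import (
	"os"
	"os/exec"
	"path"
)

// snapshotOne wraps "btrfs subvolume snapshot src dst".
func snapshotOne(src, dst string) error {
	return exec.Command("btrfs", "subvolume", "snapshot", src, dst).Run()
}

// snapshotTree snapshots the root, then re-snapshots every nested
// subvolume into place over the empty placeholder directories the
// root snapshot left behind.
func snapshotTree(src, dst string, children []string) error {
	if err := snapshotOne(src, dst); err != nil {
		return err
	}
	for _, child := range children {
		os.Remove(path.Join(dst, child)) // clear the empty placeholder
		err := snapshotOne(path.Join(src, child), path.Join(dst, child))
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Hypothetical layout: one nested subvolume under the container root.
	err := snapshotTree("/pool/containers/c1", "/pool/backup/c1", []string{"data"})
	if err != nil {
		panic(err)
	}
}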
- readonly = false - - logger.Warnf("Subvolumes detected, ignoring ro flag") - } - - // First snapshot the root - err = s.btrfsPoolVolumeSnapshot(source, dest, readonly) - if err != nil { - return err - } - - for _, subsubvol := range subsubvols { - // Clear the target for the subvol to use - os.Remove(path.Join(dest, subsubvol)) - - err := s.btrfsPoolVolumeSnapshot(path.Join(source, subsubvol), path.Join(dest, subsubvol), readonly) - if err != nil { - return err - } - } - } else { - err := s.btrfsPoolVolumeSnapshot(source, dest, readonly) - if err != nil { - return err - } - } - - return nil -} - -// isBtrfsSubVolume returns true if the given Path is a btrfs subvolume else -// false. -func isBtrfsSubVolume(subvolPath string) bool { - fs := syscall.Stat_t{} - err := syscall.Lstat(subvolPath, &fs) - if err != nil { - return false - } - - // Check if BTRFS_FIRST_FREE_OBJECTID - if fs.Ino != 256 { - return false - } - - return true -} - -func isBtrfsFilesystem(path string) bool { - _, err := shared.RunCommand("btrfs", "filesystem", "show", path) - if err != nil { - return false - } - - return true -} - -func isOnBtrfs(path string) bool { - fs := syscall.Statfs_t{} - - err := syscall.Statfs(path, &fs) - if err != nil { - return false - } - - if fs.Type != util.FilesystemSuperMagicBtrfs { - return false - } - - return true -} - -func btrfsSubVolumesGet(path string) ([]string, error) { - result := []string{} - - if !strings.HasSuffix(path, "/") { - path = path + "/" - } - - // Unprivileged users can't get to fs internals - filepath.Walk(path, func(fpath string, fi os.FileInfo, err error) error { - // Skip walk errors - if err != nil { - return nil - } - - // Ignore the base path - if strings.TrimRight(fpath, "/") == strings.TrimRight(path, "/") { - return nil - } - - // Subvolumes can only be directories - if !fi.IsDir() { - return nil - } - - // Check if a btrfs subvolume - if isBtrfsSubVolume(fpath) { - result = append(result, strings.TrimPrefix(fpath, path)) - } - - return nil - }) - - return result, nil -} - -type btrfsMigrationSourceDriver struct { - container container - snapshots []container - btrfsSnapshotNames []string - btrfs *storageBtrfs - runningSnapName string - stoppedSnapName string -} - -func (s *btrfsMigrationSourceDriver) send(conn *websocket.Conn, btrfsPath string, btrfsParent string, readWrapper func(io.ReadCloser) io.ReadCloser) error { - args := []string{"send"} - if btrfsParent != "" { - args = append(args, "-p", btrfsParent) - } - args = append(args, btrfsPath) - - cmd := exec.Command("btrfs", args...) 
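send() never buffers the stream in memory: btrfs send's stdout is wired, through an optional progress wrapper, straight into the websocket. Stripped of the wrapper and stderr handling, the core plumbing is just an io.Copy; a sketch, where the writer could be anything (here, a plain file rather than a websocket):

package main

import (
	"io"
	"os"
	"os/exec"
)

// streamSend pipes a btrfs send stream into w without staging it on
// disk or in memory.
func streamSend(subvol string, w io.Writer) error {
	cmd := exec.Command("btrfs", "send", subvol)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}
	if _, err := io.Copy(w, stdout); err != nil {
		return err
	}
	return cmd.Wait()
}

func main() {
	// For illustration, stream a hypothetical snapshot into a file.
	out, err := os.Create("/tmp/stream.bin")
	if err != nil {
		panic(err)
	}
	defer out.Close()
	if err := streamSend("/pool/snaps/snap1", out); err != nil {
		panic(err)
	}
}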
- - stdout, err := cmd.StdoutPipe() - if err != nil { - return err - } - - readPipe := io.ReadCloser(stdout) - if readWrapper != nil { - readPipe = readWrapper(stdout) - } - - stderr, err := cmd.StderrPipe() - if err != nil { - return err - } - - err = cmd.Start() - if err != nil { - return err - } - - <-shared.WebsocketSendStream(conn, readPipe, 4*1024*1024) - - output, err := ioutil.ReadAll(stderr) - if err != nil { - logger.Errorf("Problem reading btrfs send stderr: %s", err) - } - - err = cmd.Wait() - if err != nil { - logger.Errorf("Problem with btrfs send: %s", string(output)) - } - - return err -} - -func (s *btrfsMigrationSourceDriver) SendWhileRunning(conn *websocket.Conn, op *operation, bwlimit string, containerOnly bool) error { - _, containerPool, _ := s.container.Storage().GetContainerPoolInfo() - containerName := s.container.Name() - containersPath := getContainerMountPoint("default", containerPool, "") - sourceName := containerName - - // Deal with sending a snapshot to create a container on another LXD - // instance. - if s.container.IsSnapshot() { - sourceName, _, _ := containerGetParentAndSnapshotName(containerName) - snapshotsPath := getSnapshotMountPoint(s.container.Project(), containerPool, sourceName) - tmpContainerMntPoint, err := ioutil.TempDir(snapshotsPath, sourceName) - if err != nil { - return err - } - defer os.RemoveAll(tmpContainerMntPoint) - - err = os.Chmod(tmpContainerMntPoint, 0700) - if err != nil { - return err - } - - migrationSendSnapshot := fmt.Sprintf("%s/.migration-send", tmpContainerMntPoint) - snapshotMntPoint := getSnapshotMountPoint(s.container.Project(), containerPool, containerName) - err = s.btrfs.btrfsPoolVolumesSnapshot(snapshotMntPoint, migrationSendSnapshot, true, true) - if err != nil { - return err - } - defer btrfsSubVolumesDelete(migrationSendSnapshot) - - wrapper := StorageProgressReader(op, "fs_progress", containerName) - return s.send(conn, migrationSendSnapshot, "", wrapper) - } - - if !containerOnly { - for i, snap := range s.snapshots { - prev := "" - if i > 0 { - prev = getSnapshotMountPoint(snap.Project(), containerPool, s.snapshots[i-1].Name()) - } - - snapMntPoint := getSnapshotMountPoint(snap.Project(), containerPool, snap.Name()) - wrapper := StorageProgressReader(op, "fs_progress", snap.Name()) - if err := s.send(conn, snapMntPoint, prev, wrapper); err != nil { - return err - } - } - } - - tmpContainerMntPoint, err := ioutil.TempDir(containersPath, containerName) - if err != nil { - return err - } - defer os.RemoveAll(tmpContainerMntPoint) - - err = os.Chmod(tmpContainerMntPoint, 0700) - if err != nil { - return err - } - - migrationSendSnapshot := fmt.Sprintf("%s/.migration-send", tmpContainerMntPoint) - containerMntPoint := getContainerMountPoint(s.container.Project(), containerPool, sourceName) - err = s.btrfs.btrfsPoolVolumesSnapshot(containerMntPoint, migrationSendSnapshot, true, true) - if err != nil { - return err - } - defer btrfsSubVolumesDelete(migrationSendSnapshot) - - btrfsParent := "" - if len(s.btrfsSnapshotNames) > 0 { - btrfsParent = s.btrfsSnapshotNames[len(s.btrfsSnapshotNames)-1] - } - - wrapper := StorageProgressReader(op, "fs_progress", containerName) - return s.send(conn, migrationSendSnapshot, btrfsParent, wrapper) -} - -func (s *btrfsMigrationSourceDriver) SendAfterCheckpoint(conn *websocket.Conn, bwlimit string) error { - tmpPath := getSnapshotMountPoint(s.container.Project(), s.btrfs.pool.Name, - fmt.Sprintf("%s/.migration-send", s.container.Name())) - err := os.MkdirAll(tmpPath, 0711) - if err 
!= nil { - return err - } - - err = os.Chmod(tmpPath, 0700) - if err != nil { - return err - } - - s.stoppedSnapName = fmt.Sprintf("%s/.root", tmpPath) - parentName, _, _ := containerGetParentAndSnapshotName(s.container.Name()) - containerMntPt := getContainerMountPoint(s.container.Project(), s.btrfs.pool.Name, parentName) - err = s.btrfs.btrfsPoolVolumesSnapshot(containerMntPt, s.stoppedSnapName, true, true) - if err != nil { - return err - } - - return s.send(conn, s.stoppedSnapName, s.runningSnapName, nil) -} - -func (s *btrfsMigrationSourceDriver) Cleanup() { - if s.stoppedSnapName != "" { - btrfsSubVolumesDelete(s.stoppedSnapName) - } - - if s.runningSnapName != "" { - btrfsSubVolumesDelete(s.runningSnapName) - } -} - -func (s *storageBtrfs) MigrationType() migration.MigrationFSType { - if s.s.OS.RunningInUserNS { - return migration.MigrationFSType_RSYNC - } - - return migration.MigrationFSType_BTRFS -} - -func (s *storageBtrfs) PreservesInodes() bool { - if s.s.OS.RunningInUserNS { - return false - } - - return true -} - -func (s *storageBtrfs) MigrationSource(args MigrationSourceArgs) (MigrationStorageSourceDriver, error) { - if s.s.OS.RunningInUserNS { - return rsyncMigrationSource(args) - } - - /* List all the snapshots in order of reverse creation. The idea here - * is that we send the oldest to newest snapshot, hopefully saving on - * xfer costs. Then, after all that, we send the container itself. - */ - var err error - var snapshots = []container{} - if !args.ContainerOnly { - snapshots, err = args.Container.Snapshots() - if err != nil { - return nil, err - } - } - - driver := &btrfsMigrationSourceDriver{ - container: args.Container, - snapshots: snapshots, - btrfsSnapshotNames: []string{}, - btrfs: s, - } - - if !args.ContainerOnly { - for _, snap := range snapshots { - btrfsPath := getSnapshotMountPoint(snap.Project(), s.pool.Name, snap.Name()) - driver.btrfsSnapshotNames = append(driver.btrfsSnapshotNames, btrfsPath) - } - } - - return driver, nil -} - -func (s *storageBtrfs) MigrationSink(conn *websocket.Conn, op *operation, args MigrationSinkArgs) error { - if s.s.OS.RunningInUserNS { - return rsyncMigrationSink(conn, op, args) - } - - btrfsRecv := func(snapName string, btrfsPath string, targetPath string, isSnapshot bool, writeWrapper func(io.WriteCloser) io.WriteCloser) error { - args := []string{"receive", "-e", btrfsPath} - cmd := exec.Command("btrfs", args...) 
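btrfsRecv below is the mirror image of send(): the migration bytes are pumped into `btrfs receive`'s stdin, and `-e` tells it to stop at the end-of-stream marker. The inverse plumbing, sketched with a plain file standing in for the websocket reader:

package main

import (
	"io"
	"os"
	"os/exec"
)

// streamRecv feeds a send stream from r into "btrfs receive -e destDir".
func streamRecv(destDir string, r io.Reader) error {
	cmd := exec.Command("btrfs", "receive", "-e", destDir)
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}
	_, cpErr := io.Copy(stdin, r)
	stdin.Close() // signal end-of-stream so receive can finish
	if err := cmd.Wait(); err != nil {
		return err
	}
	return cpErr
}

func main() {
	// Replay a previously saved stream (hypothetical path; needs root).
	f, err := os.Open("/tmp/0.bin")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if err := streamRecv("/pool/containers", f); err != nil {
		panic(err)
	}
}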
- - // Remove the existing pre-created subvolume - err := btrfsSubVolumesDelete(targetPath) - if err != nil { - logger.Errorf("Failed to delete pre-created BTRFS subvolume: %s: %v", btrfsPath, err) - return err - } - - stdin, err := cmd.StdinPipe() - if err != nil { - return err - } - - stderr, err := cmd.StderrPipe() - if err != nil { - return err - } - - err = cmd.Start() - if err != nil { - return err - } - - writePipe := io.WriteCloser(stdin) - if writeWrapper != nil { - writePipe = writeWrapper(stdin) - } - - <-shared.WebsocketRecvStream(writePipe, conn) - - output, err := ioutil.ReadAll(stderr) - if err != nil { - logger.Debugf("Problem reading btrfs receive stderr %s", err) - } - - err = cmd.Wait() - if err != nil { - logger.Errorf("Problem with btrfs receive: %s", string(output)) - return err - } - - receivedSnapshot := fmt.Sprintf("%s/.migration-send", btrfsPath) - // handle older lxd versions - if !shared.PathExists(receivedSnapshot) { - receivedSnapshot = fmt.Sprintf("%s/.root", btrfsPath) - } - if isSnapshot { - receivedSnapshot = fmt.Sprintf("%s/%s", btrfsPath, snapName) - err = s.btrfsPoolVolumesSnapshot(receivedSnapshot, targetPath, true, true) - } else { - err = s.btrfsPoolVolumesSnapshot(receivedSnapshot, targetPath, false, true) - } - if err != nil { - logger.Errorf("Problem with btrfs snapshot: %s", err) - return err - } - - err = btrfsSubVolumesDelete(receivedSnapshot) - if err != nil { - logger.Errorf("Failed to delete BTRFS subvolume \"%s\": %s", btrfsPath, err) - return err - } - - return nil - } - - containerName := args.Container.Name() - _, containerPool, _ := args.Container.Storage().GetContainerPoolInfo() - containersPath := getSnapshotMountPoint(args.Container.Project(), containerPool, containerName) - if !args.ContainerOnly && len(args.Snapshots) > 0 { - err := os.MkdirAll(containersPath, containersDirMode) - if err != nil { - return err - } - - snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", containerPool, "containers-snapshots", projectPrefix(args.Container.Project(), containerName)) - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(args.Container.Project(), containerName)) - if !shared.PathExists(snapshotMntPointSymlink) { - err := os.Symlink(snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) - if err != nil { - return err - } - } - } - - // At this point we have already figured out the parent - // container's root disk device so we can simply - // retrieve it from the expanded devices. - parentStoragePool := "" - parentExpandedDevices := args.Container.ExpandedDevices() - parentLocalRootDiskDeviceKey, parentLocalRootDiskDevice, _ := shared.GetRootDiskDevice(parentExpandedDevices) - if parentLocalRootDiskDeviceKey != "" { - parentStoragePool = parentLocalRootDiskDevice["pool"] - } - - // A little neuroticism. - if parentStoragePool == "" { - return fmt.Errorf("Detected that the container's root device is missing the pool property during BTRFS migration") - } - - if !args.ContainerOnly { - for _, snap := range args.Snapshots { - ctArgs := snapshotProtobufToContainerArgs(args.Container.Project(), containerName, snap) - - // Ensure that snapshot and parent container have the - // same storage pool in their local root disk device. - // If the root disk device for the snapshot comes from a - // profile on the new instance as well we don't need to - // do anything. 
- if ctArgs.Devices != nil { - snapLocalRootDiskDeviceKey, _, _ := shared.GetRootDiskDevice(ctArgs.Devices) - if snapLocalRootDiskDeviceKey != "" { - ctArgs.Devices[snapLocalRootDiskDeviceKey]["pool"] = parentStoragePool - } - } - - snapshotMntPoint := getSnapshotMountPoint(args.Container.Project(), containerPool, ctArgs.Name) - _, err := containerCreateEmptySnapshot(args.Container.DaemonState(), ctArgs) - if err != nil { - return err - } - - snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots", projectPrefix(args.Container.Project(), containerName)) - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(args.Container.Project(), containerName)) - err = createSnapshotMountpoint(snapshotMntPoint, snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) - if err != nil { - return err - } - - tmpSnapshotMntPoint, err := ioutil.TempDir(containersPath, projectPrefix(args.Container.Project(), containerName)) - if err != nil { - return err - } - defer os.RemoveAll(tmpSnapshotMntPoint) - - err = os.Chmod(tmpSnapshotMntPoint, 0700) - if err != nil { - return err - } - - wrapper := StorageProgressWriter(op, "fs_progress", *snap.Name) - err = btrfsRecv(*(snap.Name), tmpSnapshotMntPoint, snapshotMntPoint, true, wrapper) - if err != nil { - return err - } - } - } - - containersMntPoint := getContainerMountPoint(args.Container.Project(), s.pool.Name, "") - err := createContainerMountpoint(containersMntPoint, args.Container.Path(), args.Container.IsPrivileged()) - if err != nil { - return err - } - - /* finally, do the real container */ - wrapper := StorageProgressWriter(op, "fs_progress", containerName) - tmpContainerMntPoint, err := ioutil.TempDir(containersMntPoint, projectPrefix(args.Container.Project(), containerName)) - if err != nil { - return err - } - defer os.RemoveAll(tmpContainerMntPoint) - - err = os.Chmod(tmpContainerMntPoint, 0700) - if err != nil { - return err - } - - containerMntPoint := getContainerMountPoint(args.Container.Project(), s.pool.Name, containerName) - err = btrfsRecv("", tmpContainerMntPoint, containerMntPoint, false, wrapper) - if err != nil { - return err - } - - if args.Live { - err = btrfsRecv("", tmpContainerMntPoint, containerMntPoint, false, wrapper) - if err != nil { - return err - } - } - - return nil -} - -func (s *storageBtrfs) btrfsLookupFsUUID(fs string) (string, error) { - output, err := shared.RunCommand( - "btrfs", - "filesystem", - "show", - "--raw", - fs) - if err != nil { - return "", fmt.Errorf("failed to detect UUID") - } - - outputString := output - idx := strings.Index(outputString, "uuid: ") - outputString = outputString[idx+6:] - outputString = strings.TrimSpace(outputString) - idx = strings.Index(outputString, "\t") - outputString = outputString[:idx] - outputString = strings.Trim(outputString, "\n") - - return outputString, nil -} - -func (s *storageBtrfs) StorageEntitySetQuota(volumeType int, size int64, data interface{}) error { - logger.Debugf(`Setting BTRFS quota for "%s"`, s.volume.Name) - - var c container - var subvol string - switch volumeType { - case storagePoolVolumeTypeContainer: - c = data.(container) - subvol = getContainerMountPoint(c.Project(), s.pool.Name, c.Name()) - case storagePoolVolumeTypeCustom: - subvol = getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - } - - qgroup, err := btrfsSubVolumeQGroup(subvol) - if err != nil { - if err != db.ErrNoSuchObject { - return err - } - - // Enable quotas - poolMntPoint := getStoragePoolMountPoint(s.pool.Name) - 
output, err := shared.RunCommand( - "btrfs", "quota", "enable", poolMntPoint) - if err != nil && !s.s.OS.RunningInUserNS { - return fmt.Errorf("Failed to enable quotas on BTRFS pool: %s", output) - } - } - - // Attempt to make the subvolume writable - shared.RunCommand("btrfs", "property", "set", subvol, "ro", "false") - if size > 0 { - output, err := shared.RunCommand( - "btrfs", - "qgroup", - "limit", - "-e", fmt.Sprintf("%d", size), - subvol) - - if err != nil { - return fmt.Errorf("Failed to set btrfs quota: %s", output) - } - } else if qgroup != "" { - output, err := shared.RunCommand( - "btrfs", - "qgroup", - "destroy", - qgroup, - subvol) - - if err != nil { - return fmt.Errorf("Failed to set btrfs quota: %s", output) - } - } - - logger.Debugf(`Set BTRFS quota for "%s"`, s.volume.Name) - return nil -} - -func (s *storageBtrfs) StoragePoolResources() (*api.ResourcesStoragePool, error) { - ourMount, err := s.StoragePoolMount() - if err != nil { - return nil, err - } - if ourMount { - defer s.StoragePoolUmount() - } - - poolMntPoint := getStoragePoolMountPoint(s.pool.Name) - - // Inode allocation is dynamic so no use in reporting them. - - return storageResource(poolMntPoint) -} - -func (s *storageBtrfs) StoragePoolVolumeCopy(source *api.StorageVolumeSource) error { - logger.Infof("Copying BTRFS storage volume \"%s\" on storage pool \"%s\" as \"%s\" to storage pool \"%s\"", source.Name, source.Pool, s.volume.Name, s.pool.Name) - successMsg := fmt.Sprintf("Copied BTRFS storage volume \"%s\" on storage pool \"%s\" as \"%s\" to storage pool \"%s\"", source.Name, source.Pool, s.volume.Name, s.pool.Name) - - // The storage pool needs to be mounted. - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - if s.pool.Name != source.Pool { - return s.doCrossPoolVolumeCopy(source.Pool, source.Name, source.VolumeOnly) - } - - err = s.copyVolume(source.Pool, source.Name, s.volume.Name, source.VolumeOnly) - if err != nil { - logger.Errorf("Failed to create BTRFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) - return err - } - - if source.VolumeOnly { - logger.Infof(successMsg) - return nil - } - - subvols, err := btrfsSubVolumesGet(s.getCustomSnapshotSubvolumePath(source.Pool)) - if err != nil { - return err - } - - for _, snapOnlyName := range subvols { - snap := fmt.Sprintf("%s/%s", source.Name, snapOnlyName) - - err := s.copyVolume(source.Pool, snap, fmt.Sprintf("%s/%s", s.volume.Name, snapOnlyName), false) - if err != nil { - logger.Errorf("Failed to create BTRFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) - return err - } - } - - logger.Infof(successMsg) - return nil -} - -func (s *storageBtrfs) copyVolume(sourcePool string, sourceName string, targetName string, volumeOnly bool) error { - var customDir string - var srcMountPoint string - var dstMountPoint string - - isSrcSnapshot := shared.IsSnapshot(sourceName) - isDstSnapshot := shared.IsSnapshot(targetName) - - if isSrcSnapshot { - srcMountPoint = getStoragePoolVolumeSnapshotMountPoint(sourcePool, sourceName) - } else { - srcMountPoint = getStoragePoolVolumeMountPoint(sourcePool, sourceName) - } - - if isDstSnapshot { - dstMountPoint = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, targetName) - } else { - dstMountPoint = getStoragePoolVolumeMountPoint(s.pool.Name, targetName) - } - - // Ensure that the directories immediately preceding the subvolume directory exist. 
- if isDstSnapshot { - volName, _, _ := containerGetParentAndSnapshotName(targetName) - customDir = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, volName) - } else { - customDir = getStoragePoolVolumeMountPoint(s.pool.Name, "") - } - - if !shared.PathExists(customDir) { - err := os.MkdirAll(customDir, customDirMode) - if err != nil { - logger.Errorf("Failed to create directory \"%s\" for storage volume \"%s\" on storage pool \"%s\": %s", customDir, s.volume.Name, s.pool.Name, err) - return err - } - } - - err := s.btrfsPoolVolumesSnapshot(srcMountPoint, dstMountPoint, false, true) - if err != nil { - logger.Errorf("Failed to create BTRFS snapshot for storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) - return err - } - - return nil -} - -func (s *storageBtrfs) doCrossPoolVolumeCopy(sourcePool string, sourceName string, volumeOnly bool) error { - // setup storage for the source volume - srcStorage, err := storagePoolVolumeInit(s.s, "default", sourcePool, sourceName, storagePoolVolumeTypeCustom) - if err != nil { - return err - } - - ourMount, err := srcStorage.StoragePoolMount() - if err != nil { - return err - } - if ourMount { - defer srcStorage.StoragePoolUmount() - } - - err = s.StoragePoolVolumeCreate() - if err != nil { - return err - } - - destVolumeMntPoint := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - bwlimit := s.pool.Config["rsync.bwlimit"] - - if !volumeOnly { - // Handle snapshots - snapshots, err := storagePoolVolumeSnapshotsGet(s.s, sourcePool, sourceName, storagePoolVolumeTypeCustom) - if err != nil { - return err - } - - for _, snap := range snapshots { - srcSnapshotMntPoint := getStoragePoolVolumeSnapshotMountPoint(sourcePool, snap) - - _, err = rsyncLocalCopy(srcSnapshotMntPoint, destVolumeMntPoint, bwlimit) - if err != nil { - logger.Errorf("Failed to rsync into BTRFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) - return err - } - - // create snapshot - _, snapOnlyName, _ := containerGetParentAndSnapshotName(snap) - - err = s.doVolumeSnapshotCreate(s.pool.Name, s.volume.Name, fmt.Sprintf("%s/%s", s.volume.Name, snapOnlyName)) - if err != nil { - return err - } - } - } - - var srcVolumeMntPoint string - - if shared.IsSnapshot(sourceName) { - // copy snapshot to volume - srcVolumeMntPoint = getStoragePoolVolumeSnapshotMountPoint(sourcePool, sourceName) - } else { - // copy volume to volume - srcVolumeMntPoint = getStoragePoolVolumeMountPoint(sourcePool, sourceName) - } - - _, err = rsyncLocalCopy(srcVolumeMntPoint, destVolumeMntPoint, bwlimit) - if err != nil { - logger.Errorf("Failed to rsync into BTRFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) - return err - } - - return nil -} - -func (s *btrfsMigrationSourceDriver) SendStorageVolume(conn *websocket.Conn, op *operation, bwlimit string, storage storage, volumeOnly bool) error { - msg := fmt.Sprintf("Function not implemented") - logger.Errorf(msg) - return fmt.Errorf(msg) -} - -func (s *storageBtrfs) StorageMigrationSource(args MigrationSourceArgs) (MigrationStorageSourceDriver, error) { - return rsyncStorageMigrationSource(args) -} - -func (s *storageBtrfs) StorageMigrationSink(conn *websocket.Conn, op *operation, args MigrationSinkArgs) error { - return rsyncStorageMigrationSink(conn, op, args) -} - -func (s *storageBtrfs) StoragePoolVolumeSnapshotCreate(target *api.StorageVolumeSnapshotsPost) error { - logger.Infof("Creating BTRFS storage volume snapshot \"%s\" on storage pool 
\"%s\"", s.volume.Name, s.pool.Name) - - err := s.doVolumeSnapshotCreate(s.pool.Name, s.volume.Name, target.Name) - if err != nil { - return err - } - - logger.Infof("Created BTRFS storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageBtrfs) doVolumeSnapshotCreate(sourcePool string, sourceName string, targetName string) error { - // Create subvolume path on the storage pool. - customSubvolumePath := s.getCustomSubvolumePath(s.pool.Name) - - err := os.MkdirAll(customSubvolumePath, 0700) - if err != nil && !os.IsNotExist(err) { - return err - } - - _, _, ok := containerGetParentAndSnapshotName(targetName) - if !ok { - return err - } - - customSnapshotSubvolumeName := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) - - err = os.MkdirAll(customSnapshotSubvolumeName, snapshotsDirMode) - if err != nil && !os.IsNotExist(err) { - return err - } - - sourcePath := getStoragePoolVolumeMountPoint(sourcePool, sourceName) - targetPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, targetName) - - return s.btrfsPoolVolumesSnapshot(sourcePath, targetPath, true, true) -} - -func (s *storageBtrfs) StoragePoolVolumeSnapshotDelete() error { - logger.Infof("Deleting BTRFS storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - source := s.pool.Config["source"] - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - snapshotSubvolumeName := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) - if shared.PathExists(snapshotSubvolumeName) && isBtrfsSubVolume(snapshotSubvolumeName) { - err := btrfsSubVolumesDelete(snapshotSubvolumeName) - if err != nil { - return err - } - } - - err := os.RemoveAll(snapshotSubvolumeName) - if err != nil && !os.IsNotExist(err) { - return err - } - - sourceName, _, _ := containerGetParentAndSnapshotName(s.volume.Name) - storageVolumeSnapshotPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, sourceName) - empty, err := shared.PathIsEmpty(storageVolumeSnapshotPath) - if err == nil && empty { - err := os.RemoveAll(storageVolumeSnapshotPath) - if err != nil && !os.IsNotExist(err) { - return err - } - } - - err = s.s.Cluster.StoragePoolVolumeDelete( - "default", - s.volume.Name, - storagePoolVolumeTypeCustom, - s.poolID) - if err != nil { - logger.Errorf(`Failed to delete database entry for BTRFS storage volume "%s" on storage pool "%s"`, - s.volume.Name, s.pool.Name) - } - - logger.Infof("Deleted BTRFS storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageBtrfs) StoragePoolVolumeSnapshotRename(newName string) error { - logger.Infof("Renaming BTRFS storage volume on storage pool \"%s\" from \"%s\" to \"%s\"", s.pool.Name, s.volume.Name, newName) - var fullSnapshotName string - - if shared.IsSnapshot(newName) { - // When renaming volume snapshots, newName will contain the full snapshot name - fullSnapshotName = newName - } else { - sourceName, _, ok := containerGetParentAndSnapshotName(s.volume.Name) - if !ok { - return fmt.Errorf("Not a snapshot name") - } - - fullSnapshotName = fmt.Sprintf("%s%s%s", sourceName, shared.SnapshotDelimiter, newName) - } - - oldPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) - newPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, fullSnapshotName) - - if !shared.PathExists(newPath) { - err := os.MkdirAll(newPath, customDirMode) - if err != nil { - return err - } - } - - 
err := os.Rename(oldPath, newPath) - if err != nil { - return err - } - - logger.Infof("Renamed BTRFS storage volume on storage pool \"%s\" from \"%s\" to \"%s\"", s.pool.Name, s.volume.Name, newName) - - return s.s.Cluster.StoragePoolVolumeRename("default", s.volume.Name, fullSnapshotName, storagePoolVolumeTypeCustom, s.poolID) -} From c368aa9aa81d120176d1a051911594d458e1ae19 Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 16:52:12 +0200 Subject: [PATCH 14/15] lxd: Remove old dir storage code Signed-off-by: Thomas Hipp --- lxd/storage_dir.go | 1587 -------------------------------------------- 1 file changed, 1587 deletions(-) delete mode 100644 lxd/storage_dir.go diff --git a/lxd/storage_dir.go b/lxd/storage_dir.go deleted file mode 100644 index ca1b0a6949..0000000000 --- a/lxd/storage_dir.go +++ /dev/null @@ -1,1587 +0,0 @@ -package main - -import ( - "fmt" - "io" - "io/ioutil" - "os" - "path/filepath" - "strings" - "syscall" - - "github.com/gorilla/websocket" - "github.com/pkg/errors" - - "github.com/lxc/lxd/lxd/migration" - "github.com/lxc/lxd/lxd/storage/quota" - "github.com/lxc/lxd/shared" - "github.com/lxc/lxd/shared/api" - "github.com/lxc/lxd/shared/ioprogress" - "github.com/lxc/lxd/shared/logger" -) - -type storageDir struct { - storageShared - - volumeID int64 -} - -// Only initialize the minimal information we need about a given storage type. -func (s *storageDir) StorageCoreInit() error { - s.sType = storageTypeDir - typeName, err := storageTypeToString(s.sType) - if err != nil { - return err - } - s.sTypeName = typeName - s.sTypeVersion = "1" - - return nil -} - -// Initialize a full storage interface. -func (s *storageDir) StoragePoolInit() error { - err := s.StorageCoreInit() - if err != nil { - return err - } - - return nil -} - -// Initialize a full storage interface. -func (s *storageDir) StoragePoolCheck() error { - logger.Debugf("Checking DIR storage pool \"%s\"", s.pool.Name) - return nil -} - -func (s *storageDir) StoragePoolCreate() error { - logger.Infof("Creating DIR storage pool \"%s\"", s.pool.Name) - - s.pool.Config["volatile.initial_source"] = s.pool.Config["source"] - - poolMntPoint := getStoragePoolMountPoint(s.pool.Name) - - source := shared.HostPath(s.pool.Config["source"]) - if source == "" { - source = filepath.Join(shared.VarPath("storage-pools"), s.pool.Name) - s.pool.Config["source"] = source - } else { - cleanSource := filepath.Clean(source) - lxdDir := shared.VarPath() - if strings.HasPrefix(cleanSource, lxdDir) && - cleanSource != poolMntPoint { - return fmt.Errorf(`DIR storage pool requests in LXD `+ - `directory "%s" are only valid under `+ - `"%s"\n(e.g. 
source=%s)`, shared.VarPath(), - shared.VarPath("storage-pools"), poolMntPoint) - } - source = filepath.Clean(source) - } - - revert := true - if !shared.PathExists(source) { - err := os.MkdirAll(source, 0711) - if err != nil { - return err - } - defer func() { - if !revert { - return - } - os.Remove(source) - }() - } else { - empty, err := shared.PathIsEmpty(source) - if err != nil { - return err - } - - if !empty { - return fmt.Errorf("The provided directory is not empty") - } - } - - if !shared.PathExists(poolMntPoint) { - err := os.MkdirAll(poolMntPoint, 0711) - if err != nil { - return err - } - defer func() { - if !revert { - return - } - os.Remove(poolMntPoint) - }() - } - - err := s.StoragePoolCheck() - if err != nil { - return err - } - - _, err = s.StoragePoolMount() - if err != nil { - return err - } - - revert = false - - logger.Infof("Created DIR storage pool \"%s\"", s.pool.Name) - return nil -} - -func (s *storageDir) StoragePoolDelete() error { - logger.Infof("Deleting DIR storage pool \"%s\"", s.pool.Name) - - source := shared.HostPath(s.pool.Config["source"]) - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - _, err := s.StoragePoolUmount() - if err != nil { - return err - } - - if shared.PathExists(source) { - err := os.RemoveAll(source) - if err != nil { - return err - } - } - - prefix := shared.VarPath("storage-pools") - if !strings.HasPrefix(source, prefix) { - storagePoolSymlink := getStoragePoolMountPoint(s.pool.Name) - if !shared.PathExists(storagePoolSymlink) { - return nil - } - - err := os.Remove(storagePoolSymlink) - if err != nil { - return err - } - } - - logger.Infof("Deleted DIR storage pool \"%s\"", s.pool.Name) - return nil -} - -func (s *storageDir) StoragePoolMount() (bool, error) { - source := shared.HostPath(s.pool.Config["source"]) - if source == "" { - return false, fmt.Errorf("no \"source\" property found for the storage pool") - } - cleanSource := filepath.Clean(source) - poolMntPoint := getStoragePoolMountPoint(s.pool.Name) - if cleanSource == poolMntPoint { - return true, nil - } - - logger.Debugf("Mounting DIR storage pool \"%s\"", s.pool.Name) - - poolMountLockID := getPoolMountLockID(s.pool.Name) - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[poolMountLockID]; ok { - lxdStorageMapLock.Unlock() - if _, ok := <-waitChannel; ok { - logger.Warnf("Received value over semaphore, this should not have happened") - } - // Give the benefit of the doubt and assume that the other - // thread actually succeeded in mounting the storage pool. 
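The lock map used in StoragePoolMount and StoragePoolUmount is a hand-rolled single-flight: the first caller parks a channel under a per-pool key, concurrent callers block on that channel and skip the work once it is closed. The pattern in isolation (standard library only; all names here are made up):

package main

import (
	"fmt"
	"sync"
)

var (
	mu       sync.Mutex
	inflight = map[string]chan struct{}{}
)

// once runs fn for key unless another goroutine already is, in which
// case it waits for that run to finish and returns false. Note that a
// caller arriving after the run completes will run fn again; this
// deduplicates concurrent work only, like the map above.
func once(key string, fn func()) bool {
	mu.Lock()
	if ch, ok := inflight[key]; ok {
		mu.Unlock()
		<-ch // someone else is doing the work; wait it out
		return false
	}
	ch := make(chan struct{})
	inflight[key] = ch
	mu.Unlock()

	defer func() {
		mu.Lock()
		delete(inflight, key)
		mu.Unlock()
		close(ch) // release everyone parked on <-ch
	}()
	fn()
	return true
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			ran := once("mount/pool1", func() { fmt.Println("mounting") })
			fmt.Println("ran:", ran)
		}()
	}
	wg.Wait()
}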
- return false, nil - } - - lxdStorageOngoingOperationMap[poolMountLockID] = make(chan bool) - lxdStorageMapLock.Unlock() - - removeLockFromMap := func() { - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[poolMountLockID]; ok { - close(waitChannel) - delete(lxdStorageOngoingOperationMap, poolMountLockID) - } - lxdStorageMapLock.Unlock() - } - defer removeLockFromMap() - - mountSource := cleanSource - mountFlags := syscall.MS_BIND - - if shared.IsMountPoint(poolMntPoint) { - return false, nil - } - - err := syscall.Mount(mountSource, poolMntPoint, "", uintptr(mountFlags), "") - if err != nil { - logger.Errorf(`Failed to mount DIR storage pool "%s" onto "%s": %s`, mountSource, poolMntPoint, err) - return false, err - } - - logger.Debugf("Mounted DIR storage pool \"%s\"", s.pool.Name) - - return true, nil -} - -func (s *storageDir) StoragePoolUmount() (bool, error) { - source := s.pool.Config["source"] - if source == "" { - return false, fmt.Errorf("no \"source\" property found for the storage pool") - } - cleanSource := filepath.Clean(source) - poolMntPoint := getStoragePoolMountPoint(s.pool.Name) - if cleanSource == poolMntPoint { - return true, nil - } - - logger.Debugf("Unmounting DIR storage pool \"%s\"", s.pool.Name) - - poolUmountLockID := getPoolUmountLockID(s.pool.Name) - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[poolUmountLockID]; ok { - lxdStorageMapLock.Unlock() - if _, ok := <-waitChannel; ok { - logger.Warnf("Received value over semaphore, this should not have happened") - } - // Give the benefit of the doubt and assume that the other - // thread actually succeeded in unmounting the storage pool. - return false, nil - } - - lxdStorageOngoingOperationMap[poolUmountLockID] = make(chan bool) - lxdStorageMapLock.Unlock() - - removeLockFromMap := func() { - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[poolUmountLockID]; ok { - close(waitChannel) - delete(lxdStorageOngoingOperationMap, poolUmountLockID) - } - lxdStorageMapLock.Unlock() - } - - defer removeLockFromMap() - - if !shared.IsMountPoint(poolMntPoint) { - return false, nil - } - - err := syscall.Unmount(poolMntPoint, 0) - if err != nil { - return false, err - } - - logger.Debugf("Unmounted DIR pool \"%s\"", s.pool.Name) - return true, nil -} - -func (s *storageDir) GetContainerPoolInfo() (int64, string, string) { - return s.poolID, s.pool.Name, s.pool.Name -} - -func (s *storageDir) StoragePoolUpdate(writable *api.StoragePoolPut, changedConfig []string) error { - logger.Infof(`Updating DIR storage pool "%s"`, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - changeable := changeableStoragePoolProperties["dir"] - unchangeable := []string{} - for _, change := range changedConfig { - if !shared.StringInSlice(change, changeable) { - unchangeable = append(unchangeable, change) - } - } - - if len(unchangeable) > 0 { - return updateStoragePoolError(unchangeable, "dir") - } - - // "rsync.bwlimit" requires no on-disk modifications. - - logger.Infof(`Updated DIR storage pool "%s"`, s.pool.Name) - return nil -} - -// Functions dealing with storage pools. 
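As StoragePoolMount above shows, a DIR pool "mount" is nothing more than a bind mount of the configured source directory onto the pool mountpoint. The one-call equivalent, sketched with hypothetical paths (requires root):

package main

import (
	"syscall"
)

func main() {
	// Bind-mount the pool source onto its mountpoint; no filesystem
	// type or options are involved, exactly as in the code above.
	src := "/srv/lxd-pool"
	dst := "/var/lib/lxd/storage-pools/default"
	if err := syscall.Mount(src, dst, "", syscall.MS_BIND, ""); err != nil {
		panic(err)
	}
}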
-func (s *storageDir) StoragePoolVolumeCreate() error { - logger.Infof("Creating DIR storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - source := s.pool.Config["source"] - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - isSnapshot := shared.IsSnapshot(s.volume.Name) - - var storageVolumePath string - - if isSnapshot { - storageVolumePath = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) - } else { - storageVolumePath = getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - } - - err = os.MkdirAll(storageVolumePath, 0711) - if err != nil { - return err - } - - err = s.initQuota(storageVolumePath, s.volumeID) - if err != nil { - return err - } - - logger.Infof("Created DIR storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageDir) StoragePoolVolumeDelete() error { - logger.Infof("Deleting DIR storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - source := s.pool.Config["source"] - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - storageVolumePath := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - if !shared.PathExists(storageVolumePath) { - return nil - } - - err := s.deleteQuota(storageVolumePath, s.volumeID) - if err != nil { - return err - } - - err = os.RemoveAll(storageVolumePath) - if err != nil { - return err - } - - err = s.s.Cluster.StoragePoolVolumeDelete( - "default", - s.volume.Name, - storagePoolVolumeTypeCustom, - s.poolID) - if err != nil { - logger.Errorf(`Failed to delete database entry for DIR storage volume "%s" on storage pool "%s"`, - s.volume.Name, s.pool.Name) - } - - logger.Infof("Deleted DIR storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageDir) StoragePoolVolumeMount() (bool, error) { - return true, nil -} - -func (s *storageDir) StoragePoolVolumeUmount() (bool, error) { - return true, nil -} - -func (s *storageDir) StoragePoolVolumeUpdate(writable *api.StorageVolumePut, changedConfig []string) error { - if writable.Restore == "" { - logger.Infof(`Updating DIR storage volume "%s"`, s.volume.Name) - } - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - if writable.Restore != "" { - logger.Infof(`Restoring DIR storage volume "%s" from snapshot "%s"`, - s.volume.Name, writable.Restore) - - sourcePath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, - fmt.Sprintf("%s/%s", s.volume.Name, writable.Restore)) - targetPath := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - - // Restore using rsync - bwlimit := s.pool.Config["rsync.bwlimit"] - output, err := rsyncLocalCopy(sourcePath, targetPath, bwlimit) - if err != nil { - return fmt.Errorf("failed to rsync container: %s: %s", string(output), err) - } - - logger.Infof(`Restored DIR storage volume "%s" from snapshot "%s"`, - s.volume.Name, writable.Restore) - return nil - } - - changeable := changeableStoragePoolVolumeProperties["dir"] - unchangeable := []string{} - for _, change := range changedConfig { - if !shared.StringInSlice(change, changeable) { - unchangeable = append(unchangeable, change) - } - } - - if len(unchangeable) > 0 { - return updateStoragePoolVolumeError(unchangeable, "dir") - } - - logger.Infof(`Updated DIR storage volume "%s"`, s.volume.Name) - return nil -} - -func (s *storageDir) 
StoragePoolVolumeRename(newName string) error {
-	logger.Infof(`Renaming DIR storage volume on storage pool "%s" from "%s" to "%s"`,
-		s.pool.Name, s.volume.Name, newName)
-
-	_, err := s.StoragePoolMount()
-	if err != nil {
-		return err
-	}
-
-	usedBy, err := storagePoolVolumeUsedByContainersGet(s.s, "default", s.volume.Name, storagePoolVolumeTypeNameCustom)
-	if err != nil {
-		return err
-	}
-	if len(usedBy) > 0 {
-		return fmt.Errorf(`DIR storage volume "%s" on storage pool "%s" is attached to containers`,
-			s.volume.Name, s.pool.Name)
-	}
-
-	oldPath := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name)
-	newPath := getStoragePoolVolumeMountPoint(s.pool.Name, newName)
-	err = os.Rename(oldPath, newPath)
-	if err != nil {
-		return err
-	}
-
-	logger.Infof(`Renamed DIR storage volume on storage pool "%s" from "%s" to "%s"`,
-		s.pool.Name, s.volume.Name, newName)
-
-	return s.s.Cluster.StoragePoolVolumeRename("default", s.volume.Name, newName,
-		storagePoolVolumeTypeCustom, s.poolID)
-}
-
-func (s *storageDir) ContainerStorageReady(container container) bool {
-	containerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, container.Name())
-	ok, _ := shared.PathIsEmpty(containerMntPoint)
-	return !ok
-}
-
-func (s *storageDir) ContainerCreate(container container) error {
-	logger.Debugf("Creating empty DIR storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name)
-
-	_, err := s.StoragePoolMount()
-	if err != nil {
-		return err
-	}
-
-	source := s.pool.Config["source"]
-	if source == "" {
-		return fmt.Errorf("no \"source\" property found for the storage pool")
-	}
-
-	containerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, container.Name())
-	err = createContainerMountpoint(containerMntPoint, container.Path(), container.IsPrivileged())
-	if err != nil {
-		return err
-	}
-	revert := true
-	defer func() {
-		if !revert {
-			return
-		}
-		deleteContainerMountpoint(containerMntPoint, container.Path(), s.GetStorageTypeName())
-	}()
-
-	err = s.initQuota(containerMntPoint, s.volumeID)
-	if err != nil {
-		return err
-	}
-
-	err = container.TemplateApply("create")
-	if err != nil {
-		return errors.Wrap(err, "Apply template")
-	}
-
-	revert = false
-
-	logger.Debugf("Created empty DIR storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name)
-	return nil
-}
-
-func (s *storageDir) ContainerCreateFromImage(container container, imageFingerprint string, tracker *ioprogress.ProgressTracker) error {
-	logger.Debugf("Creating DIR storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name)
-
-	_, err := s.StoragePoolMount()
-	if err != nil {
-		return err
-	}
-
-	source := s.pool.Config["source"]
-	if source == "" {
-		return fmt.Errorf("no \"source\" property found for the storage pool")
-	}
-
-	privileged := container.IsPrivileged()
-	containerName := container.Name()
-	containerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, containerName)
-	err = createContainerMountpoint(containerMntPoint, container.Path(), privileged)
-	if err != nil {
-		return errors.Wrap(err, "Create container mount point")
-	}
-	revert := true
-	defer func() {
-		if !revert {
-			return
-		}
-		s.ContainerDelete(container)
-	}()
-
-	err = s.initQuota(containerMntPoint, s.volumeID)
-	if err != nil {
-		return err
-	}
-
-	imagePath := shared.VarPath("images", imageFingerprint)
-	err = unpackImage(imagePath, containerMntPoint, storageTypeDir, s.s.OS.RunningInUserNS, nil)
-	if err != nil {
return errors.Wrap(err, "Unpack image") - } - - err = container.TemplateApply("create") - if err != nil { - return errors.Wrap(err, "Apply template") - } - - revert = false - - logger.Debugf("Created DIR storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageDir) ContainerDelete(container container) error { - logger.Debugf("Deleting DIR storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - source := s.pool.Config["source"] - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // Delete the container on its storage pool: - // ${POOL}/containers/ - containerName := container.Name() - containerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, containerName) - - err = s.deleteQuota(containerMntPoint, s.volumeID) - if err != nil { - return err - } - - if shared.PathExists(containerMntPoint) { - err := os.RemoveAll(containerMntPoint) - if err != nil { - // RemovaAll fails on very long paths, so attempt an rm -Rf - output, err := shared.RunCommand("rm", "-Rf", containerMntPoint) - if err != nil { - return fmt.Errorf("error removing %s: %s", containerMntPoint, output) - } - } - } - - err = deleteContainerMountpoint(containerMntPoint, container.Path(), s.GetStorageTypeName()) - if err != nil { - return err - } - - // Delete potential leftover snapshot mountpoints. - snapshotMntPoint := getSnapshotMountPoint(container.Project(), s.pool.Name, container.Name()) - if shared.PathExists(snapshotMntPoint) { - err := os.RemoveAll(snapshotMntPoint) - if err != nil { - return err - } - } - - // Delete potential leftover snapshot symlinks: - // ${LXD_DIR}/snapshots/ to ${POOL}/snapshots/ - snapshotSymlink := shared.VarPath("snapshots", projectPrefix(container.Project(), container.Name())) - if shared.PathExists(snapshotSymlink) { - err := os.Remove(snapshotSymlink) - if err != nil { - return err - } - } - - logger.Debugf("Deleted DIR storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageDir) copyContainer(target container, source container) error { - _, sourcePool, _ := source.Storage().GetContainerPoolInfo() - _, targetPool, _ := target.Storage().GetContainerPoolInfo() - sourceContainerMntPoint := getContainerMountPoint(source.Project(), sourcePool, source.Name()) - if source.IsSnapshot() { - sourceContainerMntPoint = getSnapshotMountPoint(source.Project(), sourcePool, source.Name()) - } - targetContainerMntPoint := getContainerMountPoint(target.Project(), targetPool, target.Name()) - - err := createContainerMountpoint(targetContainerMntPoint, target.Path(), target.IsPrivileged()) - if err != nil { - return err - } - - err = s.initQuota(targetContainerMntPoint, s.volumeID) - if err != nil { - return err - } - - bwlimit := s.pool.Config["rsync.bwlimit"] - output, err := rsyncLocalCopy(sourceContainerMntPoint, targetContainerMntPoint, bwlimit) - if err != nil { - return fmt.Errorf("failed to rsync container: %s: %s", string(output), err) - } - - err = target.TemplateApply("copy") - if err != nil { - return err - } - - return nil -} - -func (s *storageDir) copySnapshot(target container, targetPool string, source container, sourcePool string) error { - sourceName := source.Name() - targetName := target.Name() - sourceContainerMntPoint := getSnapshotMountPoint(source.Project(), sourcePool, sourceName) - 
targetContainerMntPoint := getSnapshotMountPoint(target.Project(), targetPool, targetName) - - targetParentName, _, _ := containerGetParentAndSnapshotName(target.Name()) - containersPath := getSnapshotMountPoint(target.Project(), targetPool, targetParentName) - snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", targetPool, "containers-snapshots", projectPrefix(target.Project(), targetParentName)) - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(target.Project(), targetParentName)) - err := createSnapshotMountpoint(containersPath, snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) - if err != nil { - return err - } - - bwlimit := s.pool.Config["rsync.bwlimit"] - output, err := rsyncLocalCopy(sourceContainerMntPoint, targetContainerMntPoint, bwlimit) - if err != nil { - return fmt.Errorf("failed to rsync container: %s: %s", string(output), err) - } - - return nil -} - -func (s *storageDir) ContainerCopy(target container, source container, containerOnly bool) error { - logger.Debugf("Copying DIR container storage %s to %s", source.Name(), target.Name()) - - err := s.doContainerCopy(target, source, containerOnly, false, nil) - if err != nil { - return err - } - - logger.Debugf("Copied DIR container storage %s to %s", source.Name(), target.Name()) - return nil -} - -func (s *storageDir) doContainerCopy(target container, source container, containerOnly bool, refresh bool, refreshSnapshots []container) error { - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - ourStart, err := source.StorageStart() - if err != nil { - return err - } - if ourStart { - defer source.StorageStop() - } - - sourcePool, err := source.StoragePool() - if err != nil { - return err - } - targetPool, err := target.StoragePool() - if err != nil { - return err - } - - srcState := s.s - if sourcePool != targetPool { - // setup storage for the source volume - srcStorage, err := storagePoolVolumeInit(s.s, "default", sourcePool, source.Name(), storagePoolVolumeTypeContainer) - if err != nil { - return err - } - - ourMount, err := srcStorage.StoragePoolMount() - if err != nil { - return err - } - if ourMount { - defer srcStorage.StoragePoolUmount() - } - srcState = srcStorage.GetState() - } - - err = s.copyContainer(target, source) - if err != nil { - return err - } - - if containerOnly { - return nil - } - - var snapshots []container - - if refresh { - snapshots = refreshSnapshots - } else { - snapshots, err = source.Snapshots() - if err != nil { - return err - } - } - - if len(snapshots) == 0 { - return nil - } - - for _, snap := range snapshots { - sourceSnapshot, err := containerLoadByProjectAndName(srcState, source.Project(), snap.Name()) - if err != nil { - return err - } - - _, snapOnlyName, _ := containerGetParentAndSnapshotName(snap.Name()) - newSnapName := fmt.Sprintf("%s/%s", target.Name(), snapOnlyName) - targetSnapshot, err := containerLoadByProjectAndName(s.s, source.Project(), newSnapName) - if err != nil { - return err - } - - err = s.copySnapshot(targetSnapshot, targetPool, sourceSnapshot, sourcePool) - if err != nil { - return err - } - } - - return nil -} - -func (s *storageDir) ContainerRefresh(target container, source container, snapshots []container) error { - logger.Debugf("Refreshing DIR container storage for %s from %s", target.Name(), source.Name()) - - err := s.doContainerCopy(target, source, len(snapshots) == 0, true, snapshots) - if err != nil { - return err - } - - logger.Debugf("Refreshed DIR container storage for %s from %s", 
target.Name(), source.Name()) - return nil -} - -func (s *storageDir) ContainerMount(c container) (bool, error) { - return s.StoragePoolMount() -} - -func (s *storageDir) ContainerUmount(c container, path string) (bool, error) { - return true, nil -} - -func (s *storageDir) ContainerRename(container container, newName string) error { - logger.Debugf("Renaming DIR storage volume for container \"%s\" from %s to %s", s.volume.Name, s.volume.Name, newName) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - source := s.pool.Config["source"] - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - oldContainerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) - oldContainerSymlink := containerPath(container.Project(), container.Name(), false) - newContainerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, newName) - newContainerSymlink := containerPath(container.Project(), newName, false) - - err = renameContainerMountpoint(oldContainerMntPoint, oldContainerSymlink, newContainerMntPoint, newContainerSymlink) - if err != nil { - return err - } - - // Rename the snapshot mountpoint for the container if existing: - // ${POOL}/snapshots/ to ${POOL}/snapshots/ - oldSnapshotsMntPoint := getSnapshotMountPoint(container.Project(), s.pool.Name, container.Name()) - newSnapshotsMntPoint := getSnapshotMountPoint(container.Project(), s.pool.Name, newName) - if shared.PathExists(oldSnapshotsMntPoint) { - err = os.Rename(oldSnapshotsMntPoint, newSnapshotsMntPoint) - if err != nil { - return err - } - } - - // Remove the old snapshot symlink: - // ${LXD_DIR}/snapshots/ - oldSnapshotSymlink := shared.VarPath("snapshots", projectPrefix(container.Project(), container.Name())) - newSnapshotSymlink := shared.VarPath("snapshots", projectPrefix(container.Project(), newName)) - if shared.PathExists(oldSnapshotSymlink) { - err := os.Remove(oldSnapshotSymlink) - if err != nil { - return err - } - - // Create the new snapshot symlink: - // ${LXD_DIR}/snapshots/ to ${POOL}/snapshots/ - err = os.Symlink(newSnapshotsMntPoint, newSnapshotSymlink) - if err != nil { - return err - } - } - - logger.Debugf("Renamed DIR storage volume for container \"%s\" from %s to %s", s.volume.Name, s.volume.Name, newName) - return nil -} - -func (s *storageDir) ContainerRestore(container container, sourceContainer container) error { - logger.Debugf("Restoring DIR storage volume for container \"%s\" from %s to %s", s.volume.Name, sourceContainer.Name(), container.Name()) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - targetPath := container.Path() - sourcePath := sourceContainer.Path() - - // Restore using rsync - bwlimit := s.pool.Config["rsync.bwlimit"] - output, err := rsyncLocalCopy(sourcePath, targetPath, bwlimit) - if err != nil { - return fmt.Errorf("failed to rsync container: %s: %s", string(output), err) - } - - logger.Debugf("Restored DIR storage volume for container \"%s\" from %s to %s", s.volume.Name, sourceContainer.Name(), container.Name()) - return nil -} - -func (s *storageDir) ContainerGetUsage(c container) (int64, error) { - path := getContainerMountPoint(c.Project(), s.pool.Name, c.Name()) - - ok, err := quota.Supported(path) - if err != nil || !ok { - return -1, fmt.Errorf("The backing filesystem doesn't support quotas") - } - - projectID := uint32(s.volumeID + 10000) - size, err := quota.GetProjectUsage(path, projectID) - if err != nil { - return -1, err - } - - return size, 
nil -} - -func (s *storageDir) ContainerSnapshotCreate(snapshotContainer container, sourceContainer container) error { - logger.Debugf("Creating DIR storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // Create the path for the snapshot. - targetContainerName := snapshotContainer.Name() - targetContainerMntPoint := getSnapshotMountPoint(sourceContainer.Project(), s.pool.Name, targetContainerName) - err = os.MkdirAll(targetContainerMntPoint, 0711) - if err != nil { - return err - } - - rsync := func(snapshotContainer container, oldPath string, newPath string, bwlimit string) error { - output, err := rsyncLocalCopy(oldPath, newPath, bwlimit) - if err != nil { - s.ContainerDelete(snapshotContainer) - return fmt.Errorf("failed to rsync: %s: %s", string(output), err) - } - return nil - } - - ourStart, err := sourceContainer.StorageStart() - if err != nil { - return err - } - if ourStart { - defer sourceContainer.StorageStop() - } - - _, sourcePool, _ := sourceContainer.Storage().GetContainerPoolInfo() - sourceContainerName := sourceContainer.Name() - sourceContainerMntPoint := getContainerMountPoint(sourceContainer.Project(), sourcePool, sourceContainerName) - bwlimit := s.pool.Config["rsync.bwlimit"] - err = rsync(snapshotContainer, sourceContainerMntPoint, targetContainerMntPoint, bwlimit) - if err != nil { - return err - } - - if sourceContainer.IsRunning() { - // This is done to ensure consistency when snapshotting. But we - // probably shouldn't fail just because of that. - logger.Debugf("Trying to freeze and rsync again to ensure consistency") - - err := sourceContainer.Freeze() - if err != nil { - logger.Errorf("Trying to freeze and rsync again failed") - goto onSuccess - } - defer sourceContainer.Unfreeze() - - err = rsync(snapshotContainer, sourceContainerMntPoint, targetContainerMntPoint, bwlimit) - if err != nil { - return err - } - } - -onSuccess: - // Check if the symlink - // ${LXD_DIR}/snapshots/ to ${POOL_PATH}/snapshots/ - // exists and if not create it. - sourceContainerSymlink := shared.VarPath("snapshots", projectPrefix(sourceContainer.Project(), sourceContainerName)) - sourceContainerSymlinkTarget := getSnapshotMountPoint(sourceContainer.Project(), sourcePool, sourceContainerName) - if !shared.PathExists(sourceContainerSymlink) { - err = os.Symlink(sourceContainerSymlinkTarget, sourceContainerSymlink) - if err != nil { - return err - } - } - - logger.Debugf("Created DIR storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageDir) ContainerSnapshotCreateEmpty(snapshotContainer container) error { - logger.Debugf("Creating empty DIR storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // Create the path for the snapshot. - targetContainerName := snapshotContainer.Name() - targetContainerMntPoint := getSnapshotMountPoint(snapshotContainer.Project(), s.pool.Name, targetContainerName) - err = os.MkdirAll(targetContainerMntPoint, 0711) - if err != nil { - return err - } - revert := true - defer func() { - if !revert { - return - } - s.ContainerSnapshotDelete(snapshotContainer) - }() - - // Check if the symlink - // ${LXD_DIR}/snapshots/ to ${POOL_PATH}/snapshots/ - // exists and if not create it. 
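-	// Recreating it here keeps the legacy ${LXD_DIR}/snapshots path working for containers that gained their first snapshot this way.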
- targetContainerMntPoint = getSnapshotMountPoint(snapshotContainer.Project(), s.pool.Name, - targetContainerName) - sourceName, _, _ := containerGetParentAndSnapshotName(targetContainerName) - snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", - s.pool.Name, "containers-snapshots", projectPrefix(snapshotContainer.Project(), sourceName)) - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(snapshotContainer.Project(), sourceName)) - err = createSnapshotMountpoint(targetContainerMntPoint, - snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) - if err != nil { - return err - } - - revert = false - - logger.Debugf("Created empty DIR storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func dirSnapshotDeleteInternal(project, poolName string, snapshotName string) error { - snapshotContainerMntPoint := getSnapshotMountPoint(project, poolName, snapshotName) - if shared.PathExists(snapshotContainerMntPoint) { - err := os.RemoveAll(snapshotContainerMntPoint) - if err != nil { - return err - } - } - - sourceContainerName, _, _ := containerGetParentAndSnapshotName(snapshotName) - snapshotContainerPath := getSnapshotMountPoint(project, poolName, sourceContainerName) - empty, _ := shared.PathIsEmpty(snapshotContainerPath) - if empty == true { - err := os.Remove(snapshotContainerPath) - if err != nil { - return err - } - - snapshotSymlink := shared.VarPath("snapshots", projectPrefix(project, sourceContainerName)) - if shared.PathExists(snapshotSymlink) { - err := os.Remove(snapshotSymlink) - if err != nil { - return err - } - } - } - - return nil -} - -func (s *storageDir) ContainerSnapshotDelete(snapshotContainer container) error { - logger.Debugf("Deleting DIR storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - source := s.pool.Config["source"] - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - snapshotContainerName := snapshotContainer.Name() - err = dirSnapshotDeleteInternal(snapshotContainer.Project(), s.pool.Name, snapshotContainerName) - if err != nil { - return err - } - - logger.Debugf("Deleted DIR storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageDir) ContainerSnapshotRename(snapshotContainer container, newName string) error { - logger.Debugf("Renaming DIR storage volume for snapshot \"%s\" from %s to %s", s.volume.Name, s.volume.Name, newName) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - // Rename the mountpoint for the snapshot: - // ${POOL}/snapshots/ to ${POOL}/snapshots/ - oldSnapshotMntPoint := getSnapshotMountPoint(snapshotContainer.Project(), s.pool.Name, snapshotContainer.Name()) - newSnapshotMntPoint := getSnapshotMountPoint(snapshotContainer.Project(), s.pool.Name, newName) - err = os.Rename(oldSnapshotMntPoint, newSnapshotMntPoint) - if err != nil { - return err - } - - logger.Debugf("Renamed DIR storage volume for snapshot \"%s\" from %s to %s", s.volume.Name, s.volume.Name, newName) - return nil -} - -func (s *storageDir) ContainerSnapshotStart(container container) (bool, error) { - return s.StoragePoolMount() -} - -func (s *storageDir) ContainerSnapshotStop(container container) (bool, error) { - return true, nil -} - -func (s *storageDir) ContainerBackupCreate(backup backup, source container) error { - // Start storage - ourStart, 
err := source.StorageStart() - if err != nil { - return err - } - if ourStart { - defer source.StorageStop() - } - - // Create a temporary path for the backup - tmpPath, err := ioutil.TempDir(shared.VarPath("backups"), "lxd_backup_") - if err != nil { - return err - } - defer os.RemoveAll(tmpPath) - - // Prepare for rsync - rsync := func(oldPath string, newPath string, bwlimit string) error { - output, err := rsyncLocalCopy(oldPath, newPath, bwlimit) - if err != nil { - return fmt.Errorf("Failed to rsync: %s: %s", string(output), err) - } - - return nil - } - - bwlimit := s.pool.Config["rsync.bwlimit"] - - // Handle snapshots - if !backup.containerOnly { - snapshotsPath := fmt.Sprintf("%s/snapshots", tmpPath) - - // Retrieve the snapshots - snapshots, err := source.Snapshots() - if err != nil { - return err - } - - // Create the snapshot path - if len(snapshots) > 0 { - err = os.MkdirAll(snapshotsPath, 0711) - if err != nil { - return err - } - } - - for _, snap := range snapshots { - _, snapName, _ := containerGetParentAndSnapshotName(snap.Name()) - snapshotMntPoint := getSnapshotMountPoint(snap.Project(), s.pool.Name, snap.Name()) - target := fmt.Sprintf("%s/%s", snapshotsPath, snapName) - - // Copy the snapshot - err = rsync(snapshotMntPoint, target, bwlimit) - if err != nil { - return err - } - } - } - - if source.IsRunning() { - // This is done to ensure consistency when snapshotting. But we - // probably shouldn't fail just because of that. - logger.Debugf("Freezing container '%s' for backup", source.Name()) - - err := source.Freeze() - if err != nil { - logger.Errorf("Failed to freeze container '%s' for backup: %v", source.Name(), err) - } - defer source.Unfreeze() - } - - // Copy the container - containerPath := fmt.Sprintf("%s/container", tmpPath) - err = rsync(source.Path(), containerPath, bwlimit) - if err != nil { - return err - } - - // Pack the backup - err = backupCreateTarball(s.s, tmpPath, backup) - if err != nil { - return err - } - - return nil -} - -func (s *storageDir) ContainerBackupLoad(info backupInfo, data io.ReadSeeker, tarArgs []string) error { - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - source := s.pool.Config["source"] - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - // Create mountpoints - containerMntPoint := getContainerMountPoint(info.Project, s.pool.Name, info.Name) - err = createContainerMountpoint(containerMntPoint, containerPath(info.Project, info.Name, false), info.Privileged) - if err != nil { - return errors.Wrap(err, "Create container mount point") - } - - // Prepare tar arguments - args := append(tarArgs, []string{ - "-", - "--strip-components=2", - "--xattrs-include=*", - "-C", containerMntPoint, "backup/container", - }...) - - // Extract container - data.Seek(0, 0) - err = shared.RunCommandWithFds(data, nil, "tar", args...) 
- if err != nil { - return err - } - - if len(info.Snapshots) > 0 { - // Create mountpoints - snapshotMntPoint := getSnapshotMountPoint(info.Project, s.pool.Name, info.Name) - snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", s.pool.Name, - "containers-snapshots", projectPrefix(info.Project, info.Name)) - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(info.Project, info.Name)) - err := createSnapshotMountpoint(snapshotMntPoint, snapshotMntPointSymlinkTarget, - snapshotMntPointSymlink) - if err != nil { - return err - } - - // Prepare tar arguments - args := append(tarArgs, []string{ - "-", - "--strip-components=2", - "--xattrs-include=*", - "-C", snapshotMntPoint, "backup/snapshots", - }...) - - // Extract snapshots - data.Seek(0, 0) - err = shared.RunCommandWithFds(data, nil, "tar", args...) - if err != nil { - return err - } - } - - return nil -} - -func (s *storageDir) ImageCreate(fingerprint string, tracker *ioprogress.ProgressTracker) error { - return nil -} - -func (s *storageDir) ImageDelete(fingerprint string) error { - err := s.deleteImageDbPoolVolume(fingerprint) - if err != nil { - return err - } - - return nil -} - -func (s *storageDir) ImageMount(fingerprint string) (bool, error) { - return true, nil -} - -func (s *storageDir) ImageUmount(fingerprint string) (bool, error) { - return true, nil -} - -func (s *storageDir) MigrationType() migration.MigrationFSType { - return migration.MigrationFSType_RSYNC -} - -func (s *storageDir) PreservesInodes() bool { - return false -} - -func (s *storageDir) MigrationSource(args MigrationSourceArgs) (MigrationStorageSourceDriver, error) { - return rsyncMigrationSource(args) -} - -func (s *storageDir) MigrationSink(conn *websocket.Conn, op *operation, args MigrationSinkArgs) error { - return rsyncMigrationSink(conn, op, args) -} - -func (s *storageDir) StorageEntitySetQuota(volumeType int, size int64, data interface{}) error { - var path string - switch volumeType { - case storagePoolVolumeTypeContainer: - c := data.(container) - path = getContainerMountPoint(c.Project(), s.pool.Name, c.Name()) - case storagePoolVolumeTypeCustom: - path = getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - } - - ok, err := quota.Supported(path) - if err != nil || !ok { - logger.Warnf("Skipping setting disk quota for '%s' as the underlying filesystem doesn't support them", s.volume.Name) - return nil - } - - projectID := uint32(s.volumeID + 10000) - err = quota.SetProjectQuota(path, projectID, size) - if err != nil { - return err - } - - return nil -} - -func (s *storageDir) initQuota(path string, id int64) error { - if s.volumeID == 0 { - return fmt.Errorf("Missing volume ID") - } - - ok, err := quota.Supported(path) - if err != nil || !ok { - return nil - } - - projectID := uint32(s.volumeID + 10000) - err = quota.SetProject(path, projectID) - if err != nil { - return err - } - - return nil -} - -func (s *storageDir) deleteQuota(path string, id int64) error { - if s.volumeID == 0 { - return fmt.Errorf("Missing volume ID") - } - - ok, err := quota.Supported(path) - if err != nil || !ok { - return nil - } - - err = quota.SetProject(path, 0) - if err != nil { - return err - } - - projectID := uint32(s.volumeID + 10000) - err = quota.SetProjectQuota(path, projectID, 0) - if err != nil { - return err - } - - return nil -} - -func (s *storageDir) StoragePoolResources() (*api.ResourcesStoragePool, error) { - _, err := s.StoragePoolMount() - if err != nil { - return nil, err - } - - poolMntPoint := 
getStoragePoolMountPoint(s.pool.Name) - - return storageResource(poolMntPoint) -} - -func (s *storageDir) StoragePoolVolumeCopy(source *api.StorageVolumeSource) error { - logger.Infof("Copying DIR storage volume \"%s\" on storage pool \"%s\" as \"%s\" to storage pool \"%s\"", source.Name, source.Pool, s.volume.Name, s.pool.Name) - successMsg := fmt.Sprintf("Copied DIR storage volume \"%s\" on storage pool \"%s\" as \"%s\" to storage pool \"%s\"", source.Name, source.Pool, s.volume.Name, s.pool.Name) - - if s.pool.Name != source.Pool { - // setup storage for the source volume - srcStorage, err := storagePoolVolumeInit(s.s, "default", source.Pool, source.Name, storagePoolVolumeTypeCustom) - if err != nil { - logger.Errorf("Failed to initialize DIR storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) - return err - } - - ourMount, err := srcStorage.StoragePoolMount() - if err != nil { - logger.Errorf("Failed to mount DIR storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) - return err - } - if ourMount { - defer srcStorage.StoragePoolUmount() - } - } - - err := s.copyVolume(source.Pool, source.Name, s.volume.Name) - if err != nil { - return err - } - - if source.VolumeOnly { - logger.Infof(successMsg) - return nil - } - - snapshots, err := storagePoolVolumeSnapshotsGet(s.s, source.Pool, source.Name, storagePoolVolumeTypeCustom) - if err != nil { - return err - } - - for _, snap := range snapshots { - _, snapOnlyName, _ := containerGetParentAndSnapshotName(snap) - err = s.copyVolumeSnapshot(source.Pool, snap, fmt.Sprintf("%s/%s", s.volume.Name, snapOnlyName)) - if err != nil { - return err - } - } - - logger.Infof(successMsg) - return nil -} - -func (s *storageDir) StorageMigrationSource(args MigrationSourceArgs) (MigrationStorageSourceDriver, error) { - return rsyncStorageMigrationSource(args) -} - -func (s *storageDir) StorageMigrationSink(conn *websocket.Conn, op *operation, args MigrationSinkArgs) error { - return rsyncStorageMigrationSink(conn, op, args) -} - -func (s *storageDir) StoragePoolVolumeSnapshotCreate(target *api.StorageVolumeSnapshotsPost) error { - logger.Infof("Creating DIR storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - _, err := s.StoragePoolMount() - if err != nil { - return err - } - - source := s.pool.Config["source"] - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - sourceName, _, ok := containerGetParentAndSnapshotName(target.Name) - if !ok { - return fmt.Errorf("Not a snapshot name") - } - - targetPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, target.Name) - err = os.MkdirAll(targetPath, 0711) - if err != nil { - return err - } - - sourcePath := getStoragePoolVolumeMountPoint(s.pool.Name, sourceName) - bwlimit := s.pool.Config["rsync.bwlimit"] - msg, err := rsyncLocalCopy(sourcePath, targetPath, bwlimit) - if err != nil { - return fmt.Errorf("Failed to rsync: %s: %s", string(msg), err) - } - - logger.Infof("Created DIR storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageDir) StoragePoolVolumeSnapshotDelete() error { - logger.Infof("Deleting DIR storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - source := s.pool.Config["source"] - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - storageVolumePath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, 
s.volume.Name) - err := os.RemoveAll(storageVolumePath) - if err != nil && !os.IsNotExist(err) { - return err - } - - sourceName, _, _ := containerGetParentAndSnapshotName(s.volume.Name) - storageVolumeSnapshotPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, sourceName) - empty, err := shared.PathIsEmpty(storageVolumeSnapshotPath) - if err == nil && empty { - os.RemoveAll(storageVolumeSnapshotPath) - } - - err = s.s.Cluster.StoragePoolVolumeDelete( - "default", - s.volume.Name, - storagePoolVolumeTypeCustom, - s.poolID) - if err != nil { - logger.Errorf(`Failed to delete database entry for DIR storage volume "%s" on storage pool "%s"`, - s.volume.Name, s.pool.Name) - } - - logger.Infof("Deleted DIR storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageDir) StoragePoolVolumeSnapshotRename(newName string) error { - logger.Infof("Renaming DIR storage volume on storage pool \"%s\" from \"%s\" to \"%s\"", s.pool.Name, s.volume.Name, newName) - var fullSnapshotName string - - if shared.IsSnapshot(newName) { - // When renaming volume snapshots, newName will contain the full snapshot name - fullSnapshotName = newName - } else { - sourceName, _, ok := containerGetParentAndSnapshotName(s.volume.Name) - if !ok { - return fmt.Errorf("Not a snapshot name") - } - - fullSnapshotName = fmt.Sprintf("%s%s%s", sourceName, shared.SnapshotDelimiter, newName) - } - - oldPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) - newPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, fullSnapshotName) - - if !shared.PathExists(newPath) { - err := os.MkdirAll(newPath, customDirMode) - if err != nil { - return err - } - } - - err := os.Rename(oldPath, newPath) - if err != nil { - return err - } - - logger.Infof("Renamed DIR storage volume on storage pool \"%s\" from \"%s\" to \"%s\"", s.pool.Name, s.volume.Name, newName) - - return s.s.Cluster.StoragePoolVolumeRename("default", s.volume.Name, fullSnapshotName, storagePoolVolumeTypeCustom, s.poolID) -} - -func (s *storageDir) copyVolume(sourcePool string, source string, target string) error { - var srcMountPoint string - - if shared.IsSnapshot(source) { - srcMountPoint = getStoragePoolVolumeSnapshotMountPoint(sourcePool, source) - } else { - srcMountPoint = getStoragePoolVolumeMountPoint(sourcePool, source) - } - - dstMountPoint := getStoragePoolVolumeMountPoint(s.pool.Name, target) - - err := os.MkdirAll(dstMountPoint, 0711) - if err != nil { - return err - } - - err = s.initQuota(dstMountPoint, s.volumeID) - if err != nil { - return err - } - - bwlimit := s.pool.Config["rsync.bwlimit"] - - _, err = rsyncLocalCopy(srcMountPoint, dstMountPoint, bwlimit) - if err != nil { - os.RemoveAll(dstMountPoint) - logger.Errorf("Failed to rsync into DIR storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) - return err - } - - return nil -} - -func (s *storageDir) copyVolumeSnapshot(sourcePool string, source string, target string) error { - srcMountPoint := getStoragePoolVolumeSnapshotMountPoint(sourcePool, source) - dstMountPoint := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, target) - - err := os.MkdirAll(dstMountPoint, 0711) - if err != nil { - return err - } - - bwlimit := s.pool.Config["rsync.bwlimit"] - - _, err = rsyncLocalCopy(srcMountPoint, dstMountPoint, bwlimit) - if err != nil { - os.RemoveAll(dstMountPoint) - logger.Errorf("Failed to rsync into DIR storage volume \"%s\" on storage pool \"%s\": %s", target, s.pool.Name, err) - 
return err - } - - return nil -} From c6f2b30886631664c2bd187816ad7bfced89c404 Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Thu, 2 May 2019 16:52:30 +0200 Subject: [PATCH 15/15] lxd: Remove old zfs storage code Signed-off-by: Thomas Hipp --- lxd/storage_zfs.go | 3442 -------------------------------------- lxd/storage_zfs_utils.go | 833 --------- 2 files changed, 4275 deletions(-) delete mode 100644 lxd/storage_zfs.go delete mode 100644 lxd/storage_zfs_utils.go diff --git a/lxd/storage_zfs.go b/lxd/storage_zfs.go deleted file mode 100644 index 93c60f13d0..0000000000 --- a/lxd/storage_zfs.go +++ /dev/null @@ -1,3442 +0,0 @@ -package main - -import ( - "fmt" - "io" - "io/ioutil" - "os" - "os/exec" - "path/filepath" - "strconv" - "strings" - "syscall" - - "github.com/gorilla/websocket" - "github.com/pkg/errors" - - "github.com/lxc/lxd/lxd/migration" - "github.com/lxc/lxd/lxd/util" - "github.com/lxc/lxd/shared" - "github.com/lxc/lxd/shared/api" - "github.com/lxc/lxd/shared/ioprogress" - "github.com/lxc/lxd/shared/logger" - - "github.com/pborman/uuid" -) - -// Global defaults -var zfsUseRefquota = "false" -var zfsRemoveSnapshots = "false" - -// Cache -var zfsVersion = "" - -type storageZfs struct { - dataset string - storageShared -} - -func (s *storageZfs) getOnDiskPoolName() string { - if s.dataset != "" { - return s.dataset - } - - return s.pool.Name -} - -// Only initialize the minimal information we need about a given storage type. -func (s *storageZfs) StorageCoreInit() error { - s.sType = storageTypeZfs - typeName, err := storageTypeToString(s.sType) - if err != nil { - return err - } - s.sTypeName = typeName - - if zfsVersion != "" { - s.sTypeVersion = zfsVersion - return nil - } - - util.LoadModule("zfs") - - if !zfsIsEnabled() { - return fmt.Errorf("The \"zfs\" tool is not enabled") - } - - s.sTypeVersion, err = zfsToolVersionGet() - if err != nil { - s.sTypeVersion, err = zfsModuleVersionGet() - if err != nil { - return err - } - } - - zfsVersion = s.sTypeVersion - - return nil -} - -// Functions dealing with storage pools. -func (s *storageZfs) StoragePoolInit() error { - err := s.StorageCoreInit() - if err != nil { - return err - } - - // Detect whether we have been given a zfs dataset as source. 
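-	// "zfs.pool_name" may name a nested dataset ("pool/dataset") rather than a whole zpool; s.dataset records whichever was given.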
- if s.pool.Config["zfs.pool_name"] != "" { - s.dataset = s.pool.Config["zfs.pool_name"] - } - - return nil -} - -func (s *storageZfs) StoragePoolCheck() error { - logger.Debugf("Checking ZFS storage pool \"%s\"", s.pool.Name) - - source := s.pool.Config["source"] - if source == "" { - return fmt.Errorf("no \"source\" property found for the storage pool") - } - - poolName := s.getOnDiskPoolName() - purePoolName := strings.Split(poolName, "/")[0] - exists := zfsFilesystemEntityExists(purePoolName, "") - if exists { - return nil - } - - logger.Debugf("ZFS storage pool \"%s\" does not exist, trying to import it", poolName) - - var err error - var msg string - if filepath.IsAbs(source) { - disksPath := shared.VarPath("disks") - msg, err = shared.RunCommand("zpool", "import", "-d", disksPath, poolName) - } else { - msg, err = shared.RunCommand("zpool", "import", purePoolName) - } - - if err != nil { - return fmt.Errorf("ZFS storage pool \"%s\" could not be imported: %s", poolName, msg) - } - - logger.Debugf("ZFS storage pool \"%s\" successfully imported", poolName) - return nil -} - -func (s *storageZfs) StoragePoolCreate() error { - logger.Infof("Creating ZFS storage pool \"%s\"", s.pool.Name) - - err := s.zfsPoolCreate() - if err != nil { - return err - } - revert := true - defer func() { - if !revert { - return - } - s.StoragePoolDelete() - }() - - storagePoolMntPoint := getStoragePoolMountPoint(s.pool.Name) - err = os.MkdirAll(storagePoolMntPoint, 0711) - if err != nil { - return err - } - - err = s.StoragePoolCheck() - if err != nil { - return err - } - - revert = false - - logger.Infof("Created ZFS storage pool \"%s\"", s.pool.Name) - return nil -} - -func (s *storageZfs) zfsPoolCreate() error { - s.pool.Config["volatile.initial_source"] = s.pool.Config["source"] - - zpoolName := s.getOnDiskPoolName() - vdev := s.pool.Config["source"] - defaultVdev := filepath.Join(shared.VarPath("disks"), fmt.Sprintf("%s.img", s.pool.Name)) - if vdev == "" || vdev == defaultVdev { - vdev = defaultVdev - s.pool.Config["source"] = vdev - - if s.pool.Config["zfs.pool_name"] == "" { - s.pool.Config["zfs.pool_name"] = zpoolName - } - - f, err := os.Create(vdev) - if err != nil { - return fmt.Errorf("Failed to open %s: %s", vdev, err) - } - defer f.Close() - - err = f.Chmod(0600) - if err != nil { - return fmt.Errorf("Failed to chmod %s: %s", vdev, err) - } - - size, err := shared.ParseByteSizeString(s.pool.Config["size"]) - if err != nil { - return err - } - err = f.Truncate(size) - if err != nil { - return fmt.Errorf("Failed to create sparse file %s: %s", vdev, err) - } - - err = zfsPoolCreate(zpoolName, vdev) - if err != nil { - return err - } - } else { - // Unset size property since it doesn't make sense. - s.pool.Config["size"] = "" - - if filepath.IsAbs(vdev) { - if !shared.IsBlockdevPath(vdev) { - return fmt.Errorf("Custom loop file locations are not supported") - } - - if s.pool.Config["zfs.pool_name"] == "" { - s.pool.Config["zfs.pool_name"] = zpoolName - } - - // This is a block device. Note, that we do not store the - // block device path or UUID or PARTUUID or similar in - // the database. All of those might change or might be - // used in a special way (For example, zfs uses a single - // UUID in a multi-device pool for all devices.). The - // safest way is to just store the name of the zfs pool - // we create. 
- s.pool.Config["source"] = zpoolName - err := zfsPoolCreate(zpoolName, vdev) - if err != nil { - return err - } - } else { - if s.pool.Config["zfs.pool_name"] != "" && s.pool.Config["zfs.pool_name"] != vdev { - return fmt.Errorf("Invalid combination of \"source\" and \"zfs.pool_name\" property") - } - - s.pool.Config["zfs.pool_name"] = vdev - s.dataset = vdev - - if strings.Contains(vdev, "/") { - if !zfsFilesystemEntityExists(vdev, "") { - err := zfsPoolCreate("", vdev) - if err != nil { - return err - } - } - } else { - err := zfsPoolCheck(vdev) - if err != nil { - return err - } - } - - subvols, err := zfsPoolListSubvolumes(zpoolName, vdev) - if err != nil { - return err - } - - if len(subvols) > 0 { - return fmt.Errorf("Provided ZFS pool (or dataset) isn't empty") - } - - err = zfsPoolApplyDefaults(vdev) - if err != nil { - return err - } - } - } - - // Create default dummy datasets to avoid zfs races during container - // creation. - poolName := s.getOnDiskPoolName() - dataset := fmt.Sprintf("%s/containers", poolName) - msg, err := zfsPoolVolumeCreate(dataset, "mountpoint=none") - if err != nil { - logger.Errorf("Failed to create containers dataset: %s", msg) - return err - } - - fixperms := shared.VarPath("storage-pools", s.pool.Name, "containers") - err = os.MkdirAll(fixperms, containersDirMode) - if err != nil && !os.IsNotExist(err) { - return err - } - - err = os.Chmod(fixperms, containersDirMode) - if err != nil { - logger.Warnf("Failed to chmod \"%s\" to \"0%s\": %s", fixperms, strconv.FormatInt(int64(containersDirMode), 8), err) - } - - dataset = fmt.Sprintf("%s/images", poolName) - msg, err = zfsPoolVolumeCreate(dataset, "mountpoint=none") - if err != nil { - logger.Errorf("Failed to create images dataset: %s", msg) - return err - } - - fixperms = shared.VarPath("storage-pools", s.pool.Name, "images") - err = os.MkdirAll(fixperms, imagesDirMode) - if err != nil && !os.IsNotExist(err) { - return err - } - err = os.Chmod(fixperms, imagesDirMode) - if err != nil { - logger.Warnf("Failed to chmod \"%s\" to \"0%s\": %s", fixperms, strconv.FormatInt(int64(imagesDirMode), 8), err) - } - - dataset = fmt.Sprintf("%s/custom", poolName) - msg, err = zfsPoolVolumeCreate(dataset, "mountpoint=none") - if err != nil { - logger.Errorf("Failed to create custom dataset: %s", msg) - return err - } - - fixperms = shared.VarPath("storage-pools", s.pool.Name, "custom") - err = os.MkdirAll(fixperms, customDirMode) - if err != nil && !os.IsNotExist(err) { - return err - } - err = os.Chmod(fixperms, customDirMode) - if err != nil { - logger.Warnf("Failed to chmod \"%s\" to \"0%s\": %s", fixperms, strconv.FormatInt(int64(customDirMode), 8), err) - } - - dataset = fmt.Sprintf("%s/deleted", poolName) - msg, err = zfsPoolVolumeCreate(dataset, "mountpoint=none") - if err != nil { - logger.Errorf("Failed to create deleted dataset: %s", msg) - return err - } - - dataset = fmt.Sprintf("%s/snapshots", poolName) - msg, err = zfsPoolVolumeCreate(dataset, "mountpoint=none") - if err != nil { - logger.Errorf("Failed to create snapshots dataset: %s", msg) - return err - } - - fixperms = shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots") - err = os.MkdirAll(fixperms, snapshotsDirMode) - if err != nil && !os.IsNotExist(err) { - return err - } - err = os.Chmod(fixperms, snapshotsDirMode) - if err != nil { - logger.Warnf("Failed to chmod \"%s\" to \"0%s\": %s", fixperms, strconv.FormatInt(int64(snapshotsDirMode), 8), err) - } - - dataset = fmt.Sprintf("%s/custom-snapshots", poolName) - msg, err = 
zfsPoolVolumeCreate(dataset, "mountpoint=none") - if err != nil { - logger.Errorf("Failed to create snapshots dataset: %s", msg) - return err - } - - fixperms = shared.VarPath("storage-pools", s.pool.Name, "custom-snapshots") - err = os.MkdirAll(fixperms, snapshotsDirMode) - if err != nil && !os.IsNotExist(err) { - return err - } - err = os.Chmod(fixperms, snapshotsDirMode) - if err != nil { - logger.Warnf("Failed to chmod \"%s\" to \"0%s\": %s", fixperms, strconv.FormatInt(int64(snapshotsDirMode), 8), err) - } - - return nil -} - -func (s *storageZfs) StoragePoolDelete() error { - logger.Infof("Deleting ZFS storage pool \"%s\"", s.pool.Name) - - poolName := s.getOnDiskPoolName() - if zfsFilesystemEntityExists(poolName, "") { - err := zfsFilesystemEntityDelete(s.pool.Config["source"], poolName) - if err != nil { - return err - } - } - - storagePoolMntPoint := getStoragePoolMountPoint(s.pool.Name) - if shared.PathExists(storagePoolMntPoint) { - err := os.RemoveAll(storagePoolMntPoint) - if err != nil { - return err - } - } - - logger.Infof("Deleted ZFS storage pool \"%s\"", s.pool.Name) - return nil -} - -func (s *storageZfs) StoragePoolMount() (bool, error) { - return true, nil -} - -func (s *storageZfs) StoragePoolUmount() (bool, error) { - return true, nil -} - -func (s *storageZfs) StoragePoolVolumeCreate() error { - logger.Infof("Creating ZFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - isSnapshot := shared.IsSnapshot(s.volume.Name) - - var fs string - - if isSnapshot { - fs = fmt.Sprintf("custom-snapshots/%s", s.volume.Name) - } else { - fs = fmt.Sprintf("custom/%s", s.volume.Name) - } - poolName := s.getOnDiskPoolName() - dataset := fmt.Sprintf("%s/%s", poolName, fs) - - var customPoolVolumeMntPoint string - - if isSnapshot { - customPoolVolumeMntPoint = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) - } else { - customPoolVolumeMntPoint = getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - } - - msg, err := zfsPoolVolumeCreate(dataset, "mountpoint=none", "canmount=noauto") - if err != nil { - logger.Errorf("Failed to create ZFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, msg) - return err - } - revert := true - defer func() { - if !revert { - return - } - s.StoragePoolVolumeDelete() - }() - - err = zfsPoolVolumeSet(poolName, fs, "mountpoint", customPoolVolumeMntPoint) - if err != nil { - return err - } - - if !shared.IsMountPoint(customPoolVolumeMntPoint) { - err := zfsMount(poolName, fs) - if err != nil { - return err - } - defer zfsUmount(poolName, fs, customPoolVolumeMntPoint) - } - - // apply quota - if s.volume.Config["size"] != "" { - size, err := shared.ParseByteSizeString(s.volume.Config["size"]) - if err != nil { - return err - } - - err = s.StorageEntitySetQuota(storagePoolVolumeTypeCustom, size, nil) - if err != nil { - return err - } - } - - revert = false - - logger.Infof("Created ZFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageZfs) StoragePoolVolumeDelete() error { - logger.Infof("Deleting ZFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - fs := fmt.Sprintf("custom/%s", s.volume.Name) - customPoolVolumeMntPoint := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - - poolName := s.getOnDiskPoolName() - if zfsFilesystemEntityExists(poolName, fs) { - removable := true - snaps, err := zfsPoolListSnapshots(poolName, fs) - if err != nil { - return err - } - - for _, snap 
:= range snaps { - var err error - removable, err = zfsPoolVolumeSnapshotRemovable(poolName, fs, snap) - if err != nil { - return err - } - - if !removable { - break - } - } - - if removable { - origin, err := zfsFilesystemEntityPropertyGet(poolName, fs, "origin") - if err != nil { - return err - } - poolName := s.getOnDiskPoolName() - origin = strings.TrimPrefix(origin, fmt.Sprintf("%s/", poolName)) - - err = zfsPoolVolumeDestroy(poolName, fs) - if err != nil { - return err - } - - err = zfsPoolVolumeCleanup(poolName, origin) - if err != nil { - return err - } - } else { - err := zfsPoolVolumeSet(poolName, fs, "mountpoint", "none") - if err != nil { - return err - } - - err = zfsPoolVolumeRename(poolName, fs, fmt.Sprintf("deleted/custom/%s", uuid.NewRandom().String()), true) - if err != nil { - return err - } - } - } - - if shared.PathExists(customPoolVolumeMntPoint) { - err := os.RemoveAll(customPoolVolumeMntPoint) - if err != nil { - return err - } - } - - err := s.s.Cluster.StoragePoolVolumeDelete( - "default", - s.volume.Name, - storagePoolVolumeTypeCustom, - s.poolID) - if err != nil { - logger.Errorf(`Failed to delete database entry for ZFS storage volume "%s" on storage pool "%s"`, s.volume.Name, s.pool.Name) - } - - logger.Infof("Deleted ZFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageZfs) StoragePoolVolumeMount() (bool, error) { - logger.Debugf("Mounting ZFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - fs := fmt.Sprintf("custom/%s", s.volume.Name) - customPoolVolumeMntPoint := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - - customMountLockID := getCustomMountLockID(s.pool.Name, s.volume.Name) - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[customMountLockID]; ok { - lxdStorageMapLock.Unlock() - if _, ok := <-waitChannel; ok { - logger.Warnf("Received value over semaphore, this should not have happened") - } - // Give the benefit of the doubt and assume that the other - // thread actually succeeded in mounting the storage volume. 
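-		// The channel is only ever closed by its owner, never sent on, so waking up here means the other mount finished.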
- return false, nil - } - - lxdStorageOngoingOperationMap[customMountLockID] = make(chan bool) - lxdStorageMapLock.Unlock() - - var customerr error - ourMount := false - if !shared.IsMountPoint(customPoolVolumeMntPoint) { - customerr = zfsMount(s.getOnDiskPoolName(), fs) - ourMount = true - } - - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[customMountLockID]; ok { - close(waitChannel) - delete(lxdStorageOngoingOperationMap, customMountLockID) - } - lxdStorageMapLock.Unlock() - - if customerr != nil { - return false, customerr - } - - logger.Debugf("Mounted ZFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return ourMount, nil -} - -func (s *storageZfs) StoragePoolVolumeUmount() (bool, error) { - logger.Debugf("Unmounting ZFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - fs := fmt.Sprintf("custom/%s", s.volume.Name) - customPoolVolumeMntPoint := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - - customUmountLockID := getCustomUmountLockID(s.pool.Name, s.volume.Name) - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[customUmountLockID]; ok { - lxdStorageMapLock.Unlock() - if _, ok := <-waitChannel; ok { - logger.Warnf("Received value over semaphore, this should not have happened") - } - // Give the benefit of the doubt and assume that the other - // thread actually succeeded in unmounting the storage volume. - return false, nil - } - - lxdStorageOngoingOperationMap[customUmountLockID] = make(chan bool) - lxdStorageMapLock.Unlock() - - var customerr error - ourUmount := false - if shared.IsMountPoint(customPoolVolumeMntPoint) { - customerr = zfsUmount(s.getOnDiskPoolName(), fs, customPoolVolumeMntPoint) - ourUmount = true - } - - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[customUmountLockID]; ok { - close(waitChannel) - delete(lxdStorageOngoingOperationMap, customUmountLockID) - } - lxdStorageMapLock.Unlock() - - if customerr != nil { - return false, customerr - } - - logger.Debugf("Unmounted ZFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return ourUmount, nil -} - -func (s *storageZfs) GetContainerPoolInfo() (int64, string, string) { - return s.poolID, s.pool.Name, s.getOnDiskPoolName() -} - -func (s *storageZfs) StoragePoolUpdate(writable *api.StoragePoolPut, changedConfig []string) error { - logger.Infof(`Updating ZFS storage pool "%s"`, s.pool.Name) - - changeable := changeableStoragePoolProperties["zfs"] - unchangeable := []string{} - for _, change := range changedConfig { - if !shared.StringInSlice(change, changeable) { - unchangeable = append(unchangeable, change) - } - } - - if len(unchangeable) > 0 { - return updateStoragePoolError(unchangeable, "zfs") - } - - // "rsync.bwlimit" requires no on-disk modifications. - // "volume.zfs.remove_snapshots" requires no on-disk modifications. - // "volume.zfs.use_refquota" requires no on-disk modifications. 
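-	// Updating the database record is therefore all that's needed; the values are re-read on each use.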
-
-	logger.Infof(`Updated ZFS storage pool "%s"`, s.pool.Name)
-	return nil
-}
-
-func (s *storageZfs) StoragePoolVolumeUpdate(writable *api.StorageVolumePut, changedConfig []string) error {
-	if writable.Restore != "" {
-		logger.Infof(`Restoring ZFS storage volume "%s" from snapshot "%s"`,
-			s.volume.Name, writable.Restore)
-
-		// Check that we can remove the snapshot
-		poolID, err := s.s.Cluster.StoragePoolGetID(s.pool.Name)
-		if err != nil {
-			return err
-		}
-
-		// Get the names of all storage volume snapshots of a given volume
-		volumes, err := s.s.Cluster.StoragePoolVolumeSnapshotsGetType(s.volume.Name, storagePoolVolumeTypeCustom, poolID)
-		if err != nil {
-			return err
-		}
-
-		if volumes[len(volumes)-1] != fmt.Sprintf("%s/%s", s.volume.Name, writable.Restore) {
-			return fmt.Errorf("ZFS can only restore from the latest snapshot. Delete newer snapshots or copy the snapshot into a new volume instead")
-		}
-
-		s.volume.Description = writable.Description
-		s.volume.Config = writable.Config
-
-		targetSnapshotDataset := fmt.Sprintf("%s/custom/%s@snapshot-%s", s.getOnDiskPoolName(), s.volume.Name, writable.Restore)
-		msg, err := shared.RunCommand("zfs", "rollback", "-r", "-R", targetSnapshotDataset)
-		if err != nil {
-			logger.Errorf("Failed to rollback ZFS dataset: %s", msg)
-			return err
-		}
-
-		logger.Infof(`Restored ZFS storage volume "%s" from snapshot "%s"`,
-			s.volume.Name, writable.Restore)
-		return nil
-	}
-
-	logger.Infof(`Updating ZFS storage volume "%s"`, s.volume.Name)
-
-	changeable := changeableStoragePoolVolumeProperties["zfs"]
-	unchangeable := []string{}
-	for _, change := range changedConfig {
-		if !shared.StringInSlice(change, changeable) {
-			unchangeable = append(unchangeable, change)
-		}
-	}
-
-	if len(unchangeable) > 0 {
-		return updateStoragePoolVolumeError(unchangeable, "zfs")
-	}
-
-	if shared.StringInSlice("size", changedConfig) {
-		if s.volume.Type != storagePoolVolumeTypeNameCustom {
-			return updateStoragePoolVolumeError([]string{"size"}, "zfs")
-		}
-
-		if s.volume.Config["size"] != writable.Config["size"] {
-			size, err := shared.ParseByteSizeString(writable.Config["size"])
-			if err != nil {
-				return err
-			}
-
-			err = s.StorageEntitySetQuota(storagePoolVolumeTypeCustom, size, nil)
-			if err != nil {
-				return err
-			}
-		}
-	}
-
-	logger.Infof(`Updated ZFS storage volume "%s"`, s.volume.Name)
-	return nil
-}
-
-func (s *storageZfs) StoragePoolVolumeRename(newName string) error {
-	logger.Infof(`Renaming ZFS storage volume on storage pool "%s" from "%s" to "%s"`,
-		s.pool.Name, s.volume.Name, newName)
-
-	usedBy, err := storagePoolVolumeUsedByContainersGet(s.s, "default", s.volume.Name, storagePoolVolumeTypeNameCustom)
-	if err != nil {
-		return err
-	}
-	if len(usedBy) > 0 {
-		return fmt.Errorf(`ZFS storage volume "%s" on storage pool "%s" is attached to containers`,
-			s.volume.Name, s.pool.Name)
-	}
-
-	isSnapshot := shared.IsSnapshot(s.volume.Name)
-
-	var oldPath string
-	var newPath string
-
-	if isSnapshot {
-		oldPath = fmt.Sprintf("custom-snapshots/%s", s.volume.Name)
-		newPath = fmt.Sprintf("custom-snapshots/%s", newName)
-	} else {
-		oldPath = fmt.Sprintf("custom/%s", s.volume.Name)
-		newPath = fmt.Sprintf("custom/%s", newName)
-	}
-	poolName := s.getOnDiskPoolName()
-	err = zfsPoolVolumeRename(poolName, oldPath, newPath, false)
-	if err != nil {
-		return err
-	}
-
-	logger.Infof(`Renamed ZFS storage volume on storage pool "%s" from "%s" to "%s"`,
-		s.pool.Name, s.volume.Name, newName)
-
-	return s.s.Cluster.StoragePoolVolumeRename("default", s.volume.Name,
newName, - storagePoolVolumeTypeCustom, s.poolID) -} - -// Things we don't need to care about -func (s *storageZfs) ContainerMount(c container) (bool, error) { - return s.doContainerMount(c.Project(), c.Name(), c.IsPrivileged()) -} - -func (s *storageZfs) ContainerUmount(c container, path string) (bool, error) { - logger.Debugf("Unmounting ZFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - name := c.Name() - - fs := fmt.Sprintf("containers/%s", projectPrefix(c.Project(), name)) - containerPoolVolumeMntPoint := getContainerMountPoint(c.Project(), s.pool.Name, name) - - containerUmountLockID := getContainerUmountLockID(s.pool.Name, name) - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[containerUmountLockID]; ok { - lxdStorageMapLock.Unlock() - if _, ok := <-waitChannel; ok { - logger.Warnf("Received value over semaphore, this should not have happened") - } - // Give the benefit of the doubt and assume that the other - // thread actually succeeded in unmounting the storage volume. - return false, nil - } - - lxdStorageOngoingOperationMap[containerUmountLockID] = make(chan bool) - lxdStorageMapLock.Unlock() - - var imgerr error - ourUmount := false - if shared.IsMountPoint(containerPoolVolumeMntPoint) { - imgerr = zfsUmount(s.getOnDiskPoolName(), fs, containerPoolVolumeMntPoint) - ourUmount = true - } - - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[containerUmountLockID]; ok { - close(waitChannel) - delete(lxdStorageOngoingOperationMap, containerUmountLockID) - } - lxdStorageMapLock.Unlock() - - if imgerr != nil { - return false, imgerr - } - - logger.Debugf("Unmounted ZFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return ourUmount, nil -} - -// Things we do have to care about -func (s *storageZfs) ContainerStorageReady(container container) bool { - volumeName := projectPrefix(container.Project(), container.Name()) - fs := fmt.Sprintf("containers/%s", volumeName) - return zfsFilesystemEntityExists(s.getOnDiskPoolName(), fs) -} - -func (s *storageZfs) ContainerCreate(container container) error { - err := s.doContainerCreate(container.Project(), container.Name(), container.IsPrivileged()) - if err != nil { - s.doContainerDelete(container.Project(), container.Name()) - return err - } - - ourMount, err := s.ContainerMount(container) - if err != nil { - return err - } - if ourMount { - defer s.ContainerUmount(container, container.Path()) - } - - err = container.TemplateApply("create") - if err != nil { - return err - } - - return nil -} - -func (s *storageZfs) ContainerCreateFromImage(container container, fingerprint string, tracker *ioprogress.ProgressTracker) error { - logger.Debugf("Creating ZFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - containerPath := container.Path() - containerName := container.Name() - volumeName := projectPrefix(container.Project(), containerName) - fs := fmt.Sprintf("containers/%s", volumeName) - containerPoolVolumeMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, containerName) - - poolName := s.getOnDiskPoolName() - fsImage := fmt.Sprintf("images/%s", fingerprint) - - imageStoragePoolLockID := getImageCreateLockID(s.pool.Name, fingerprint) - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[imageStoragePoolLockID]; ok { - lxdStorageMapLock.Unlock() - if _, ok := <-waitChannel; ok { - logger.Warnf("Received value 
over semaphore, this should not have happened")
-		}
-	} else {
-		lxdStorageOngoingOperationMap[imageStoragePoolLockID] = make(chan bool)
-		lxdStorageMapLock.Unlock()
-
-		var imgerr error
-		if !zfsFilesystemEntityExists(poolName, fsImage) {
-			imgerr = s.ImageCreate(fingerprint, tracker)
-		}
-
-		lxdStorageMapLock.Lock()
-		if waitChannel, ok := lxdStorageOngoingOperationMap[imageStoragePoolLockID]; ok {
-			close(waitChannel)
-			delete(lxdStorageOngoingOperationMap, imageStoragePoolLockID)
-		}
-		lxdStorageMapLock.Unlock()
-
-		if imgerr != nil {
-			return imgerr
-		}
-	}
-
-	err := zfsPoolVolumeClone(container.Project(), poolName, fsImage, "readonly", fs, containerPoolVolumeMntPoint)
-	if err != nil {
-		return err
-	}
-
-	revert := true
-	defer func() {
-		if !revert {
-			return
-		}
-		s.ContainerDelete(container)
-	}()
-
-	ourMount, err := s.ContainerMount(container)
-	if err != nil {
-		return err
-	}
-	if ourMount {
-		defer s.ContainerUmount(container, containerPath)
-	}
-
-	privileged := container.IsPrivileged()
-	err = createContainerMountpoint(containerPoolVolumeMntPoint, containerPath, privileged)
-	if err != nil {
-		return err
-	}
-
-	err = container.TemplateApply("create")
-	if err != nil {
-		return err
-	}
-
-	revert = false
-
-	logger.Debugf("Created ZFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name)
-	return nil
-}
-
-func (s *storageZfs) ContainerDelete(container container) error {
-	err := s.doContainerDelete(container.Project(), container.Name())
-	if err != nil {
-		return err
-	}
-
-	return nil
-}
-
-func (s *storageZfs) copyWithoutSnapshotsSparse(target container, source container) error {
-	poolName := s.getOnDiskPoolName()
-
-	sourceContainerName := source.Name()
-	sourceContainerPath := source.Path()
-
-	targetContainerName := target.Name()
-	targetContainerPath := target.Path()
-	targetContainerMountPoint := getContainerMountPoint(target.Project(), s.pool.Name, targetContainerName)
-
-	sourceZfsDataset := ""
-	sourceZfsDatasetSnapshot := ""
-	sourceName, sourceSnapOnlyName, isSnapshotName := containerGetParentAndSnapshotName(sourceContainerName)
-
-	targetZfsDataset := fmt.Sprintf("containers/%s", projectPrefix(target.Project(), targetContainerName))
-
-	if isSnapshotName {
-		sourceZfsDatasetSnapshot = sourceSnapOnlyName
-	}
-
-	revert := true
-	if sourceZfsDatasetSnapshot == "" {
-		if zfsFilesystemEntityExists(poolName, fmt.Sprintf("containers/%s", projectPrefix(source.Project(), sourceName))) {
-			sourceZfsDatasetSnapshot = fmt.Sprintf("copy-%s", uuid.NewRandom().String())
-			sourceZfsDataset = fmt.Sprintf("containers/%s", projectPrefix(source.Project(), sourceName))
-			err := zfsPoolVolumeSnapshotCreate(poolName, sourceZfsDataset, sourceZfsDatasetSnapshot)
-			if err != nil {
-				return err
-			}
-			defer func() {
-				if !revert {
-					return
-				}
-				zfsPoolVolumeSnapshotDestroy(poolName, sourceZfsDataset, sourceZfsDatasetSnapshot)
-			}()
-		}
-	} else {
-		if zfsFilesystemEntityExists(poolName, fmt.Sprintf("containers/%s@snapshot-%s", projectPrefix(source.Project(), sourceName), sourceZfsDatasetSnapshot)) {
-			sourceZfsDataset = fmt.Sprintf("containers/%s", projectPrefix(source.Project(), sourceName))
-			sourceZfsDatasetSnapshot = fmt.Sprintf("snapshot-%s", sourceZfsDatasetSnapshot)
-		}
-	}
-
-	if sourceZfsDataset != "" {
-		err := zfsPoolVolumeClone(target.Project(), poolName, sourceZfsDataset, sourceZfsDatasetSnapshot, targetZfsDataset, targetContainerMountPoint)
-		if err != nil {
-			return err
-		}
-		defer func() {
-			if !revert {
-				return
-			}
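-			// A later failure leaves revert set; the deferred cleanup tears the cloned dataset down again.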
-			zfsPoolVolumeDestroy(poolName, targetZfsDataset)
-		}()
-
-		ourMount, err := s.ContainerMount(target)
-		if err != nil {
-			return err
-		}
-		if ourMount {
-			defer s.ContainerUmount(target, targetContainerPath)
-		}
-
-		err = createContainerMountpoint(targetContainerMountPoint, targetContainerPath, target.IsPrivileged())
-		if err != nil {
-			return err
-		}
-		defer func() {
-			if !revert {
-				return
-			}
-			deleteContainerMountpoint(targetContainerMountPoint, targetContainerPath, s.GetStorageTypeName())
-		}()
-	} else {
-		err := s.ContainerCreate(target)
-		if err != nil {
-			return err
-		}
-		defer func() {
-			if !revert {
-				return
-			}
-			s.ContainerDelete(target)
-		}()
-
-		bwlimit := s.pool.Config["rsync.bwlimit"]
-		output, err := rsyncLocalCopy(sourceContainerPath, targetContainerPath, bwlimit)
-		if err != nil {
-			return fmt.Errorf("rsync failed: %s", string(output))
-		}
-	}
-
-	err := target.TemplateApply("copy")
-	if err != nil {
-		return err
-	}
-
-	revert = false
-
-	return nil
-}
-
-func (s *storageZfs) copyWithoutSnapshotFull(target container, source container) error {
-	logger.Debugf("Creating full ZFS copy \"%s\" to \"%s\"", source.Name(), target.Name())
-
-	sourceIsSnapshot := source.IsSnapshot()
-	poolName := s.getOnDiskPoolName()
-
-	sourceName := source.Name()
-	sourceDataset := ""
-	snapshotSuffix := ""
-
-	targetName := target.Name()
-	targetDataset := fmt.Sprintf("%s/containers/%s", poolName, projectPrefix(target.Project(), targetName))
-	targetSnapshotDataset := ""
-
-	if sourceIsSnapshot {
-		sourceParentName, sourceSnapOnlyName, _ := containerGetParentAndSnapshotName(source.Name())
-		snapshotSuffix = fmt.Sprintf("snapshot-%s", sourceSnapOnlyName)
-		sourceDataset = fmt.Sprintf("%s/containers/%s@%s", poolName, sourceParentName, snapshotSuffix)
-		targetSnapshotDataset = fmt.Sprintf("%s/containers/%s@snapshot-%s", poolName, projectPrefix(target.Project(), targetName), sourceSnapOnlyName)
-	} else {
-		snapshotSuffix = uuid.NewRandom().String()
-		sourceDataset = fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(source.Project(), sourceName), snapshotSuffix)
-		targetSnapshotDataset = fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(target.Project(), targetName), snapshotSuffix)
-
-		fs := fmt.Sprintf("containers/%s", projectPrefix(source.Project(), sourceName))
-		err := zfsPoolVolumeSnapshotCreate(poolName, fs, snapshotSuffix)
-		if err != nil {
-			return err
-		}
-		defer func() {
-			err := zfsPoolVolumeSnapshotDestroy(poolName, fs, snapshotSuffix)
-			if err != nil {
-				logger.Warnf("Failed to delete temporary ZFS snapshot \"%s\", manual cleanup needed", sourceDataset)
-			}
-		}()
-	}
-
-	zfsSendCmd := exec.Command("zfs", "send", sourceDataset)
-
-	zfsRecvCmd := exec.Command("zfs", "receive", targetDataset)
-
-	zfsRecvCmd.Stdin, _ = zfsSendCmd.StdoutPipe()
-	zfsRecvCmd.Stdout = os.Stdout
-	zfsRecvCmd.Stderr = os.Stderr
-
-	err := zfsRecvCmd.Start()
-	if err != nil {
-		return err
-	}
-
-	err = zfsSendCmd.Run()
-	if err != nil {
-		return err
-	}
-
-	err = zfsRecvCmd.Wait()
-	if err != nil {
-		return err
-	}
-
-	msg, err := shared.RunCommand("zfs", "rollback", "-r", "-R", targetSnapshotDataset)
-	if err != nil {
-		logger.Errorf("Failed to rollback ZFS dataset: %s", msg)
-		return err
-	}
-
-	targetContainerMountPoint := getContainerMountPoint(target.Project(), s.pool.Name, targetName)
-	targetfs := fmt.Sprintf("containers/%s", targetName)
-
-	err = zfsPoolVolumeSet(poolName, targetfs, "canmount", "noauto")
-	if err != nil {
-		return err
-	}
-
-	err = zfsPoolVolumeSet(poolName, targetfs, "mountpoint", targetContainerMountPoint)
targetfs, "mountpoint", targetContainerMountPoint) - if err != nil { - return err - } - - err = zfsPoolVolumeSnapshotDestroy(poolName, targetfs, snapshotSuffix) - if err != nil { - return err - } - - ourMount, err := s.ContainerMount(target) - if err != nil { - return err - } - if ourMount { - defer s.ContainerUmount(target, targetContainerMountPoint) - } - - err = createContainerMountpoint(targetContainerMountPoint, target.Path(), target.IsPrivileged()) - if err != nil { - return err - } - - logger.Debugf("Created full ZFS copy \"%s\" to \"%s\"", source.Name(), target.Name()) - return nil -} - -func (s *storageZfs) copyWithSnapshots(target container, source container, parentSnapshot string) error { - sourceName := source.Name() - targetParentName, targetSnapOnlyName, _ := containerGetParentAndSnapshotName(target.Name()) - containersPath := getSnapshotMountPoint(target.Project(), s.pool.Name, targetParentName) - snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots", projectPrefix(target.Project(), targetParentName)) - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(target.Project(), targetParentName)) - err := createSnapshotMountpoint(containersPath, snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) - if err != nil { - return err - } - - poolName := s.getOnDiskPoolName() - sourceParentName, sourceSnapOnlyName, _ := containerGetParentAndSnapshotName(sourceName) - currentSnapshotDataset := fmt.Sprintf("%s/containers/%s at snapshot-%s", poolName, projectPrefix(source.Project(), sourceParentName), sourceSnapOnlyName) - args := []string{"send", currentSnapshotDataset} - if parentSnapshot != "" { - parentName, parentSnaponlyName, _ := containerGetParentAndSnapshotName(parentSnapshot) - parentSnapshotDataset := fmt.Sprintf("%s/containers/%s at snapshot-%s", poolName, projectPrefix(source.Project(), parentName), parentSnaponlyName) - args = append(args, "-i", parentSnapshotDataset) - } - - zfsSendCmd := exec.Command("zfs", args...) 
- targetSnapshotDataset := fmt.Sprintf("%s/containers/%s@snapshot-%s", poolName, projectPrefix(target.Project(), targetParentName), targetSnapOnlyName)
- zfsRecvCmd := exec.Command("zfs", "receive", "-F", targetSnapshotDataset)
-
- zfsRecvCmd.Stdin, _ = zfsSendCmd.StdoutPipe()
- zfsRecvCmd.Stdout = os.Stdout
- zfsRecvCmd.Stderr = os.Stderr
-
- err = zfsRecvCmd.Start()
- if err != nil {
- return err
- }
-
- err = zfsSendCmd.Run()
- if err != nil {
- return err
- }
-
- err = zfsRecvCmd.Wait()
- if err != nil {
- return err
- }
-
- return nil
-}
-
-func (s *storageZfs) doCrossPoolContainerCopy(target container, source container, containerOnly bool, refresh bool, refreshSnapshots []container) error {
- sourcePool, err := source.StoragePool()
- if err != nil {
- return err
- }
-
- // setup storage for the source volume
- srcStorage, err := storagePoolVolumeInit(s.s, "default", sourcePool, source.Name(), storagePoolVolumeTypeContainer)
- if err != nil {
- return err
- }
-
- ourMount, err := srcStorage.StoragePoolMount()
- if err != nil {
- return err
- }
- if ourMount {
- defer srcStorage.StoragePoolUmount()
- }
-
- targetPool, err := target.StoragePool()
- if err != nil {
- return err
- }
-
- var snapshots []container
-
- if refresh {
- snapshots = refreshSnapshots
- } else {
- snapshots, err = source.Snapshots()
- if err != nil {
- return err
- }
-
- // create the main container
- err = s.doContainerCreate(target.Project(), target.Name(), target.IsPrivileged())
- if err != nil {
- return err
- }
- }
-
- _, err = s.doContainerMount(target.Project(), target.Name(), target.IsPrivileged())
- if err != nil {
- return err
- }
- defer s.ContainerUmount(target, shared.VarPath("containers", projectPrefix(target.Project(), target.Name())))
-
- destContainerMntPoint := getContainerMountPoint(target.Project(), targetPool, target.Name())
- bwlimit := s.pool.Config["rsync.bwlimit"]
- if !containerOnly {
- for _, snap := range snapshots {
- srcSnapshotMntPoint := getSnapshotMountPoint(target.Project(), sourcePool, snap.Name())
- _, err = rsyncLocalCopy(srcSnapshotMntPoint, destContainerMntPoint, bwlimit)
- if err != nil {
- logger.Errorf("Failed to rsync into ZFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err)
- return err
- }
-
- // create snapshot
- _, snapOnlyName, _ := containerGetParentAndSnapshotName(snap.Name())
- err = s.doContainerSnapshotCreate(snap.Project(), fmt.Sprintf("%s/%s", target.Name(), snapOnlyName), target.Name())
- if err != nil {
- return err
- }
- }
- }
-
- srcContainerMntPoint := getContainerMountPoint(source.Project(), sourcePool, source.Name())
- _, err = rsyncLocalCopy(srcContainerMntPoint, destContainerMntPoint, bwlimit)
- if err != nil {
- logger.Errorf("Failed to rsync into ZFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err)
- return err
- }
-
- return nil
-}
-
-func (s *storageZfs) ContainerCopy(target container, source container, containerOnly bool) error {
- logger.Debugf("Copying ZFS container storage %s to %s", source.Name(), target.Name())
-
- ourStart, err := source.StorageStart()
- if err != nil {
- return err
- }
- if ourStart {
- defer source.StorageStop()
- }
-
- _, sourcePool, _ := source.Storage().GetContainerPoolInfo()
- _, targetPool, _ := target.Storage().GetContainerPoolInfo()
- if sourcePool != targetPool {
- return s.doCrossPoolContainerCopy(target, source, containerOnly, false, nil)
- }
-
- snapshots, err := source.Snapshots()
- if err != nil {
- return err
- }
-
- if containerOnly || len(snapshots) == 0 {
- if s.pool.Config["zfs.clone_copy"] != "" && !shared.IsTrue(s.pool.Config["zfs.clone_copy"]) {
- err = s.copyWithoutSnapshotFull(target, source)
- if err != nil {
- return err
- }
- } else {
- err = s.copyWithoutSnapshotsSparse(target, source)
- if err != nil {
- return err
- }
- }
- } else {
- targetContainerName := target.Name()
- targetContainerPath := target.Path()
- targetContainerMountPoint := getContainerMountPoint(target.Project(), s.pool.Name, targetContainerName)
- err = createContainerMountpoint(targetContainerMountPoint, targetContainerPath, target.IsPrivileged())
- if err != nil {
- return err
- }
-
- prev := ""
- prevSnapOnlyName := ""
- for i, snap := range snapshots {
- if i > 0 {
- prev = snapshots[i-1].Name()
- }
-
- sourceSnapshot, err := containerLoadByProjectAndName(s.s, source.Project(), snap.Name())
- if err != nil {
- return err
- }
-
- _, snapOnlyName, _ := containerGetParentAndSnapshotName(snap.Name())
- prevSnapOnlyName = snapOnlyName
- newSnapName := fmt.Sprintf("%s/%s", target.Name(), snapOnlyName)
- targetSnapshot, err := containerLoadByProjectAndName(s.s, target.Project(), newSnapName)
- if err != nil {
- return err
- }
-
- err = s.copyWithSnapshots(targetSnapshot, sourceSnapshot, prev)
- if err != nil {
- return err
- }
- }
-
- poolName := s.getOnDiskPoolName()
-
- // send actual container
- tmpSnapshotName := fmt.Sprintf("copy-send-%s", uuid.NewRandom().String())
- err = zfsPoolVolumeSnapshotCreate(poolName, fmt.Sprintf("containers/%s", projectPrefix(source.Project(), source.Name())), tmpSnapshotName)
- if err != nil {
- return err
- }
-
- currentSnapshotDataset := fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(source.Project(), source.Name()), tmpSnapshotName)
- args := []string{"send", currentSnapshotDataset}
- if prevSnapOnlyName != "" {
- parentSnapshotDataset := fmt.Sprintf("%s/containers/%s@snapshot-%s", poolName, projectPrefix(source.Project(), source.Name()), prevSnapOnlyName)
- args = append(args, "-i", parentSnapshotDataset)
- }
-
- zfsSendCmd := exec.Command("zfs", args...)
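ContainerCopy above branches on the pool's "zfs.clone_copy" setting: copies default to cheap clones, and only an explicit false forces the full send/receive path. A compact restatement of that decision, with assumed helper names and a simplified truthiness check:

package example

import "strings"

// isTrue loosely mirrors shared.IsTrue for the common spellings; this is
// an assumption, not the exact upstream implementation.
func isTrue(v string) bool {
	switch strings.ToLower(v) {
	case "true", "1", "yes", "on":
		return true
	}
	return false
}

// useCloneCopy reports whether a same-pool copy may be a sparse clone:
// only a set-and-not-truthy "zfs.clone_copy" forces a full copy.
func useCloneCopy(poolConfig map[string]string) bool {
	v := poolConfig["zfs.clone_copy"]
	return v == "" || isTrue(v)
}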
- targetSnapshotDataset := fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(target.Project(), target.Name()), tmpSnapshotName) - zfsRecvCmd := exec.Command("zfs", "receive", "-F", targetSnapshotDataset) - - zfsRecvCmd.Stdin, _ = zfsSendCmd.StdoutPipe() - zfsRecvCmd.Stdout = os.Stdout - zfsRecvCmd.Stderr = os.Stderr - - err = zfsRecvCmd.Start() - if err != nil { - return err - } - - err = zfsSendCmd.Run() - if err != nil { - return err - } - - err = zfsRecvCmd.Wait() - if err != nil { - return err - } - - zfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("containers/%s", projectPrefix(source.Project(), source.Name())), tmpSnapshotName) - zfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("containers/%s", projectPrefix(target.Project(), target.Name())), tmpSnapshotName) - - fs := fmt.Sprintf("containers/%s", projectPrefix(target.Project(), target.Name())) - err = zfsPoolVolumeSet(poolName, fs, "canmount", "noauto") - if err != nil { - return err - } - - err = zfsPoolVolumeSet(poolName, fs, "mountpoint", targetContainerMountPoint) - if err != nil { - return err - } - } - - logger.Debugf("Copied ZFS container storage %s to %s", source.Name(), target.Name()) - return nil -} - -func (s *storageZfs) ContainerRefresh(target container, source container, snapshots []container) error { - logger.Debugf("Refreshing ZFS container storage for %s from %s", target.Name(), source.Name()) - - ourStart, err := source.StorageStart() - if err != nil { - return err - } - if ourStart { - defer source.StorageStop() - } - - return s.doCrossPoolContainerCopy(target, source, len(snapshots) == 0, true, snapshots) -} - -func (s *storageZfs) ContainerRename(container container, newName string) error { - logger.Debugf("Renaming ZFS storage volume for container \"%s\" from %s to %s", s.volume.Name, s.volume.Name, newName) - - poolName := s.getOnDiskPoolName() - oldName := container.Name() - - // Unmount the dataset. - _, err := s.ContainerUmount(container, "") - if err != nil { - return err - } - - // Rename the dataset. - oldZfsDataset := fmt.Sprintf("containers/%s", oldName) - newZfsDataset := fmt.Sprintf("containers/%s", newName) - err = zfsPoolVolumeRename(poolName, oldZfsDataset, newZfsDataset, false) - if err != nil { - return err - } - revert := true - defer func() { - if !revert { - return - } - s.ContainerRename(container, oldName) - }() - - // Set the new mountpoint for the dataset. - newContainerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, newName) - err = zfsPoolVolumeSet(poolName, newZfsDataset, "mountpoint", newContainerMntPoint) - if err != nil { - return err - } - - // Unmount the dataset. - container.(*containerLXC).name = newName - _, err = s.ContainerUmount(container, "") - if err != nil { - return err - } - - // Create new mountpoint on the storage pool. - oldContainerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, oldName) - oldContainerMntPointSymlink := container.Path() - newContainerMntPointSymlink := shared.VarPath("containers", projectPrefix(container.Project(), newName)) - err = renameContainerMountpoint(oldContainerMntPoint, oldContainerMntPointSymlink, newContainerMntPoint, newContainerMntPointSymlink) - if err != nil { - return err - } - - // Rename the snapshot mountpoint on the storage pool. 
- oldSnapshotMntPoint := getSnapshotMountPoint(container.Project(), s.pool.Name, oldName) - newSnapshotMntPoint := getSnapshotMountPoint(container.Project(), s.pool.Name, newName) - if shared.PathExists(oldSnapshotMntPoint) { - err := os.Rename(oldSnapshotMntPoint, newSnapshotMntPoint) - if err != nil { - return err - } - } - - // Remove old symlink. - oldSnapshotPath := shared.VarPath("snapshots", projectPrefix(container.Project(), oldName)) - if shared.PathExists(oldSnapshotPath) { - err := os.Remove(oldSnapshotPath) - if err != nil { - return err - } - } - - // Create new symlink. - newSnapshotPath := shared.VarPath("snapshots", projectPrefix(container.Project(), newName)) - if shared.PathExists(newSnapshotPath) { - err := os.Symlink(newSnapshotMntPoint, newSnapshotPath) - if err != nil { - return err - } - } - - revert = false - - logger.Debugf("Renamed ZFS storage volume for container \"%s\" from %s to %s", s.volume.Name, s.volume.Name, newName) - return nil -} - -func (s *storageZfs) ContainerRestore(target container, source container) error { - logger.Debugf("Restoring ZFS storage volume for container \"%s\" from %s to %s", s.volume.Name, source.Name(), target.Name()) - - snaps, err := target.Snapshots() - if err != nil { - return err - } - - if snaps[len(snaps)-1].Name() != source.Name() { - if s.pool.Config["volume.zfs.remove_snapshots"] != "" { - zfsRemoveSnapshots = s.pool.Config["volume.zfs.remove_snapshots"] - } - - if s.volume.Config["zfs.remove_snapshots"] != "" { - zfsRemoveSnapshots = s.volume.Config["zfs.remove_snapshots"] - } - - if !shared.IsTrue(zfsRemoveSnapshots) { - return fmt.Errorf("ZFS can only restore from the latest snapshot. Delete newer snapshots or copy the snapshot into a new container instead") - } - } - - // Start storage for source container - ourSourceStart, err := source.StorageStart() - if err != nil { - return err - } - if ourSourceStart { - defer source.StorageStop() - } - - // Start storage for target container - ourTargetStart, err := target.StorageStart() - if err != nil { - return err - } - if ourTargetStart { - defer target.StorageStop() - } - - for i := len(snaps) - 1; i != 0; i-- { - if snaps[i].Name() == source.Name() { - break - } - - err := snaps[i].Delete() - if err != nil { - return err - } - } - - // Restore the snapshot - cName, snapOnlyName, _ := containerGetParentAndSnapshotName(source.Name()) - snapName := fmt.Sprintf("snapshot-%s", snapOnlyName) - - err = zfsPoolVolumeSnapshotRestore(s.getOnDiskPoolName(), fmt.Sprintf("containers/%s", cName), snapName) - if err != nil { - return err - } - - logger.Debugf("Restored ZFS storage volume for container \"%s\" from %s to %s", s.volume.Name, source.Name(), target.Name()) - return nil -} - -func (s *storageZfs) ContainerGetUsage(container container) (int64, error) { - var err error - - fs := fmt.Sprintf("containers/%s", container.Name()) - - property := "used" - - if s.pool.Config["volume.zfs.use_refquota"] != "" { - zfsUseRefquota = s.pool.Config["volume.zfs.use_refquota"] - } - if s.volume.Config["zfs.use_refquota"] != "" { - zfsUseRefquota = s.volume.Config["zfs.use_refquota"] - } - - if shared.IsTrue(zfsUseRefquota) { - property = "referenced" - } - - // Shortcut for refquota - mountpoint := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) - if property == "referenced" && shared.IsMountPoint(mountpoint) { - var stat syscall.Statfs_t - err := syscall.Statfs(mountpoint, &stat) - if err != nil { - return -1, err - } - - return int64(stat.Blocks-stat.Bfree) * 
int64(stat.Bsize), nil - } - - value, err := zfsFilesystemEntityPropertyGet(s.getOnDiskPoolName(), fs, property) - if err != nil { - return -1, err - } - - valueInt, err := strconv.ParseInt(value, 10, 64) - if err != nil { - return -1, err - } - - return valueInt, nil -} - -func (s *storageZfs) doContainerSnapshotCreate(project, targetName string, sourceName string) error { - snapshotContainerName := targetName - logger.Debugf("Creating ZFS storage volume for snapshot \"%s\" on storage pool \"%s\"", snapshotContainerName, s.pool.Name) - - sourceContainerName := sourceName - - cName, snapshotSnapOnlyName, _ := containerGetParentAndSnapshotName(snapshotContainerName) - snapName := fmt.Sprintf("snapshot-%s", snapshotSnapOnlyName) - - sourceZfsDataset := fmt.Sprintf("containers/%s", projectPrefix(project, cName)) - err := zfsPoolVolumeSnapshotCreate(s.getOnDiskPoolName(), sourceZfsDataset, snapName) - if err != nil { - return err - } - - snapshotMntPoint := getSnapshotMountPoint(project, s.pool.Name, snapshotContainerName) - if !shared.PathExists(snapshotMntPoint) { - err := os.MkdirAll(snapshotMntPoint, 0700) - if err != nil { - return err - } - } - - snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots", projectPrefix(project, sourceName)) - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(project, sourceContainerName)) - if !shared.PathExists(snapshotMntPointSymlink) { - err := os.Symlink(snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) - if err != nil { - return err - } - } - - logger.Debugf("Created ZFS storage volume for snapshot \"%s\" on storage pool \"%s\"", snapshotContainerName, s.pool.Name) - return nil -} - -func (s *storageZfs) ContainerSnapshotCreate(snapshotContainer container, sourceContainer container) error { - err := s.doContainerSnapshotCreate(sourceContainer.Project(), snapshotContainer.Name(), sourceContainer.Name()) - if err != nil { - s.ContainerSnapshotDelete(snapshotContainer) - return err - } - return nil -} - -func zfsSnapshotDeleteInternal(project, poolName string, ctName string, onDiskPoolName string) error { - sourceContainerName, sourceContainerSnapOnlyName, _ := containerGetParentAndSnapshotName(ctName) - snapName := fmt.Sprintf("snapshot-%s", sourceContainerSnapOnlyName) - - if zfsFilesystemEntityExists(onDiskPoolName, - fmt.Sprintf("containers/%s@%s", - projectPrefix(project, sourceContainerName), snapName)) { - removable, err := zfsPoolVolumeSnapshotRemovable(onDiskPoolName, - fmt.Sprintf("containers/%s", - projectPrefix(project, sourceContainerName)), - snapName) - if err != nil { - return err - } - - if removable { - err = zfsPoolVolumeSnapshotDestroy(onDiskPoolName, - fmt.Sprintf("containers/%s", - projectPrefix(project, sourceContainerName)), - snapName) - } else { - err = zfsPoolVolumeSnapshotRename(onDiskPoolName, - fmt.Sprintf("containers/%s", - projectPrefix(project, sourceContainerName)), - snapName, - fmt.Sprintf("copy-%s", uuid.NewRandom().String())) - } - if err != nil { - return err - } - } - - // Delete the snapshot on its storage pool: - // ${POOL}/snapshots/ - snapshotContainerMntPoint := getSnapshotMountPoint(project, poolName, ctName) - if shared.PathExists(snapshotContainerMntPoint) { - err := os.RemoveAll(snapshotContainerMntPoint) - if err != nil { - return err - } - } - - // Check if we can remove the snapshot symlink: - // ${LXD_DIR}/snapshots/ to ${POOL}/snapshots/ - // by checking if the directory is empty. 
- snapshotContainerPath := getSnapshotMountPoint(project, poolName, sourceContainerName) - empty, _ := shared.PathIsEmpty(snapshotContainerPath) - if empty == true { - // Remove the snapshot directory for the container: - // ${POOL}/snapshots/ - err := os.Remove(snapshotContainerPath) - if err != nil { - return err - } - - snapshotSymlink := shared.VarPath("snapshots", projectPrefix(project, sourceContainerName)) - if shared.PathExists(snapshotSymlink) { - err := os.Remove(snapshotSymlink) - if err != nil { - return err - } - } - } - - // Legacy - snapPath := shared.VarPath(fmt.Sprintf("snapshots/%s/%s.zfs", projectPrefix(project, sourceContainerName), sourceContainerSnapOnlyName)) - if shared.PathExists(snapPath) { - err := os.Remove(snapPath) - if err != nil { - return err - } - } - - // Legacy - parent := shared.VarPath(fmt.Sprintf("snapshots/%s", projectPrefix(project, sourceContainerName))) - if ok, _ := shared.PathIsEmpty(parent); ok { - err := os.Remove(parent) - if err != nil { - return err - } - } - - return nil -} - -func (s *storageZfs) ContainerSnapshotDelete(snapshotContainer container) error { - logger.Debugf("Deleting ZFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - poolName := s.getOnDiskPoolName() - err := zfsSnapshotDeleteInternal(snapshotContainer.Project(), s.pool.Name, snapshotContainer.Name(), - poolName) - if err != nil { - return err - } - - logger.Debugf("Deleted ZFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageZfs) ContainerSnapshotRename(snapshotContainer container, newName string) error { - logger.Debugf("Renaming ZFS storage volume for snapshot \"%s\" from %s to %s", s.volume.Name, s.volume.Name, newName) - - oldName := snapshotContainer.Name() - - oldcName, oldSnapOnlyName, _ := containerGetParentAndSnapshotName(snapshotContainer.Name()) - oldZfsDatasetName := fmt.Sprintf("snapshot-%s", oldSnapOnlyName) - - _, newSnapOnlyName, _ := containerGetParentAndSnapshotName(newName) - newZfsDatasetName := fmt.Sprintf("snapshot-%s", newSnapOnlyName) - - if oldZfsDatasetName != newZfsDatasetName { - err := zfsPoolVolumeSnapshotRename( - s.getOnDiskPoolName(), fmt.Sprintf("containers/%s", projectPrefix(snapshotContainer.Project(), oldcName)), oldZfsDatasetName, newZfsDatasetName) - if err != nil { - return err - } - } - revert := true - defer func() { - if !revert { - return - } - //s.ContainerSnapshotRename(snapshotContainer, oldName) - }() - - oldStyleSnapshotMntPoint := shared.VarPath(fmt.Sprintf("snapshots/%s/%s.zfs", projectPrefix(snapshotContainer.Project(), oldcName), oldSnapOnlyName)) - if shared.PathExists(oldStyleSnapshotMntPoint) { - err := os.Remove(oldStyleSnapshotMntPoint) - if err != nil { - return err - } - } - - oldSnapshotMntPoint := getSnapshotMountPoint(snapshotContainer.Project(), s.pool.Name, oldName) - if shared.PathExists(oldSnapshotMntPoint) { - err := os.Remove(oldSnapshotMntPoint) - if err != nil { - return err - } - } - - newSnapshotMntPoint := getSnapshotMountPoint(snapshotContainer.Project(), s.pool.Name, newName) - if !shared.PathExists(newSnapshotMntPoint) { - err := os.MkdirAll(newSnapshotMntPoint, 0700) - if err != nil { - return err - } - } - - snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots", projectPrefix(snapshotContainer.Project(), oldcName)) - snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(snapshotContainer.Project(), oldcName)) - if 
!shared.PathExists(snapshotMntPointSymlink) { - err := os.Symlink(snapshotMntPointSymlinkTarget, snapshotMntPointSymlink) - if err != nil { - return err - } - } - - revert = false - - logger.Debugf("Renamed ZFS storage volume for snapshot \"%s\" from %s to %s", s.volume.Name, s.volume.Name, newName) - return nil -} - -func (s *storageZfs) ContainerSnapshotStart(container container) (bool, error) { - logger.Debugf("Initializing ZFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - cName, sName, _ := containerGetParentAndSnapshotName(container.Name()) - sourceFs := fmt.Sprintf("containers/%s", projectPrefix(container.Project(), cName)) - sourceSnap := fmt.Sprintf("snapshot-%s", sName) - destFs := fmt.Sprintf("snapshots/%s/%s", projectPrefix(container.Project(), cName), sName) - - poolName := s.getOnDiskPoolName() - snapshotMntPoint := getSnapshotMountPoint(container.Project(), s.pool.Name, container.Name()) - err := zfsPoolVolumeClone(container.Project(), poolName, sourceFs, sourceSnap, destFs, snapshotMntPoint) - if err != nil { - return false, err - } - - err = zfsMount(poolName, destFs) - if err != nil { - return false, err - } - - logger.Debugf("Initialized ZFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return true, nil -} - -func (s *storageZfs) ContainerSnapshotStop(container container) (bool, error) { - logger.Debugf("Stopping ZFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - cName, sName, _ := containerGetParentAndSnapshotName(container.Name()) - destFs := fmt.Sprintf("snapshots/%s/%s", projectPrefix(container.Project(), cName), sName) - - err := zfsPoolVolumeDestroy(s.getOnDiskPoolName(), destFs) - if err != nil { - return false, err - } - - logger.Debugf("Stopped ZFS storage volume for snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return true, nil -} - -func (s *storageZfs) ContainerSnapshotCreateEmpty(snapshotContainer container) error { - /* don't touch the fs yet, as migration will do that for us */ - return nil -} - -func (s *storageZfs) doContainerOnlyBackup(tmpPath string, backup backup, source container) error { - sourceIsSnapshot := source.IsSnapshot() - poolName := s.getOnDiskPoolName() - - sourceName := source.Name() - sourceDataset := "" - snapshotSuffix := "" - - if sourceIsSnapshot { - sourceParentName, sourceSnapOnlyName, _ := containerGetParentAndSnapshotName(source.Name()) - snapshotSuffix = fmt.Sprintf("backup-%s", sourceSnapOnlyName) - sourceDataset = fmt.Sprintf("%s/containers/%s@%s", poolName, sourceParentName, snapshotSuffix) - } else { - snapshotSuffix = uuid.NewRandom().String() - sourceDataset = fmt.Sprintf("%s/containers/%s@%s", poolName, sourceName, snapshotSuffix) - - fs := fmt.Sprintf("containers/%s", projectPrefix(source.Project(), sourceName)) - err := zfsPoolVolumeSnapshotCreate(poolName, fs, snapshotSuffix) - if err != nil { - return err - } - - defer func() { - err := zfsPoolVolumeSnapshotDestroy(poolName, fs, snapshotSuffix) - if err != nil { - logger.Warnf("Failed to delete temporary ZFS snapshot \"%s\", manual cleanup needed", sourceDataset) - } - }() - } - - // Dump the container to a file - backupFile := fmt.Sprintf("%s/%s", tmpPath, "container.bin") - f, err := os.OpenFile(backupFile, os.O_RDWR|os.O_CREATE, 0644) - if err != nil { - return err - } - defer f.Close() - - zfsSendCmd := exec.Command("zfs", "send", sourceDataset) - zfsSendCmd.Stdout = f - err = zfsSendCmd.Run() - if err != 
nil {
- return err
- }
-
- return nil
-}
-
-func (s *storageZfs) doSnapshotBackup(tmpPath string, backup backup, source container, parentSnapshot string) error {
- sourceName := source.Name()
- snapshotsPath := fmt.Sprintf("%s/snapshots", tmpPath)
-
- // Create backup path for snapshots
- err := os.MkdirAll(snapshotsPath, 0711)
- if err != nil {
- return err
- }
-
- poolName := s.getOnDiskPoolName()
- sourceParentName, sourceSnapOnlyName, _ := containerGetParentAndSnapshotName(sourceName)
- currentSnapshotDataset := fmt.Sprintf("%s/containers/%s@snapshot-%s", poolName, sourceParentName, sourceSnapOnlyName)
- args := []string{"send", currentSnapshotDataset}
- if parentSnapshot != "" {
- parentName, parentSnaponlyName, _ := containerGetParentAndSnapshotName(parentSnapshot)
- parentSnapshotDataset := fmt.Sprintf("%s/containers/%s@snapshot-%s", poolName, parentName, parentSnaponlyName)
- args = append(args, "-i", parentSnapshotDataset)
- }
-
- backupFile := fmt.Sprintf("%s/%s.bin", snapshotsPath, sourceSnapOnlyName)
- f, err := os.OpenFile(backupFile, os.O_RDWR|os.O_CREATE, 0644)
- if err != nil {
- return err
- }
- defer f.Close()
-
- zfsSendCmd := exec.Command("zfs", args...)
- zfsSendCmd.Stdout = f
- return zfsSendCmd.Run()
-}
-
-func (s *storageZfs) doContainerBackupCreateOptimized(tmpPath string, backup backup, source container) error {
- // Handle snapshots
- snapshots, err := source.Snapshots()
- if err != nil {
- return err
- }
-
- if backup.containerOnly || len(snapshots) == 0 {
- err = s.doContainerOnlyBackup(tmpPath, backup, source)
- } else {
- prev := ""
- prevSnapOnlyName := ""
- for i, snap := range snapshots {
- if i > 0 {
- prev = snapshots[i-1].Name()
- }
-
- sourceSnapshot, err := containerLoadByProjectAndName(s.s, source.Project(), snap.Name())
- if err != nil {
- return err
- }
-
- _, snapOnlyName, _ := containerGetParentAndSnapshotName(snap.Name())
- prevSnapOnlyName = snapOnlyName
- err = s.doSnapshotBackup(tmpPath, backup, sourceSnapshot, prev)
- if err != nil {
- return err
- }
- }
-
- // Dump the container to a file
- poolName := s.getOnDiskPoolName()
- tmpSnapshotName := fmt.Sprintf("backup-%s", uuid.NewRandom().String())
- err = zfsPoolVolumeSnapshotCreate(poolName, fmt.Sprintf("containers/%s", projectPrefix(source.Project(), source.Name())), tmpSnapshotName)
- if err != nil {
- return err
- }
-
- currentSnapshotDataset := fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(source.Project(), source.Name()), tmpSnapshotName)
- args := []string{"send", currentSnapshotDataset}
- if prevSnapOnlyName != "" {
- parentSnapshotDataset := fmt.Sprintf("%s/containers/%s@snapshot-%s", poolName, projectPrefix(source.Project(), source.Name()), prevSnapOnlyName)
- args = append(args, "-i", parentSnapshotDataset)
- }
-
- backupFile := fmt.Sprintf("%s/container.bin", tmpPath)
- f, err := os.OpenFile(backupFile, os.O_RDWR|os.O_CREATE, 0644)
- if err != nil {
- return err
- }
- defer f.Close()
-
- zfsSendCmd := exec.Command("zfs", args...)
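The snapshot loop above emits one stream per snapshot, each incremental against its predecessor, and the code here finishes with a delta from the newest snapshot to a temporary snapshot of the live dataset. A small sketch of that ordering (type and function names are illustrative, not from the patch):

package example

// sendStep describes one "zfs send": a full stream when Parent is empty,
// otherwise an incremental "-i Parent" stream.
type sendStep struct {
	Parent  string
	Current string
}

// sendPlan lists the steps oldest to newest, ending with the delta from
// the last snapshot to a temporary snapshot of the running container.
func sendPlan(snapshots []string, tmpSnap string) []sendStep {
	var plan []sendStep
	prev := ""
	for _, snap := range snapshots {
		plan = append(plan, sendStep{Parent: prev, Current: snap})
		prev = snap
	}
	return append(plan, sendStep{Parent: prev, Current: tmpSnap})
}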
- zfsSendCmd.Stdout = f
-
- err = zfsSendCmd.Run()
- if err != nil {
- return err
- }
-
- zfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("containers/%s", source.Name()), tmpSnapshotName)
- }
- if err != nil {
- return err
- }
-
- return nil
-}
-
-func (s *storageZfs) doContainerBackupCreateVanilla(tmpPath string, backup backup, source container) error {
- // Prepare for rsync
- rsync := func(oldPath string, newPath string, bwlimit string) error {
- output, err := rsyncLocalCopy(oldPath, newPath, bwlimit)
- if err != nil {
- return fmt.Errorf("Failed to rsync: %s: %s", string(output), err)
- }
-
- return nil
- }
-
- bwlimit := s.pool.Config["rsync.bwlimit"]
- project := backup.container.Project()
-
- // Handle snapshots
- if !backup.containerOnly {
- snapshotsPath := fmt.Sprintf("%s/snapshots", tmpPath)
-
- // Retrieve the snapshots
- snapshots, err := source.Snapshots()
- if err != nil {
- return errors.Wrap(err, "Retrieve snapshots")
- }
-
- // Create the snapshot path
- if len(snapshots) > 0 {
- err = os.MkdirAll(snapshotsPath, 0711)
- if err != nil {
- return errors.Wrap(err, "Create snapshot path")
- }
- }
-
- for _, snap := range snapshots {
- _, snapName, _ := containerGetParentAndSnapshotName(snap.Name())
-
- // Mount the snapshot to a usable path
- _, err := s.ContainerSnapshotStart(snap)
- if err != nil {
- return errors.Wrap(err, "Mount snapshot")
- }
-
- snapshotMntPoint := getSnapshotMountPoint(project, s.pool.Name, snap.Name())
- target := fmt.Sprintf("%s/%s", snapshotsPath, snapName)
-
- // Copy the snapshot
- err = rsync(snapshotMntPoint, target, bwlimit)
- s.ContainerSnapshotStop(snap)
- if err != nil {
- return errors.Wrap(err, "Copy snapshot")
- }
- }
- }
-
- // Make a temporary copy of the container
- containersPath := getContainerMountPoint("default", s.pool.Name, "")
- tmpContainerMntPoint, err := ioutil.TempDir(containersPath, source.Name())
- if err != nil {
- return errors.Wrap(err, "Create temporary copy dir")
- }
- defer os.RemoveAll(tmpContainerMntPoint)
-
- err = os.Chmod(tmpContainerMntPoint, 0700)
- if err != nil {
- return errors.Wrap(err, "Change temporary mount point permissions")
- }
-
- snapshotSuffix := uuid.NewRandom().String()
- sourceName := source.Name()
- fs := fmt.Sprintf("containers/%s", projectPrefix(project, sourceName))
- sourceZfsDatasetSnapshot := fmt.Sprintf("snapshot-%s", snapshotSuffix)
- poolName := s.getOnDiskPoolName()
- err = zfsPoolVolumeSnapshotCreate(poolName, fs, sourceZfsDatasetSnapshot)
- if err != nil {
- return err
- }
- defer zfsPoolVolumeSnapshotDestroy(poolName, fs, sourceZfsDatasetSnapshot)
-
- targetZfsDataset := fmt.Sprintf("containers/%s", snapshotSuffix)
- err = zfsPoolVolumeClone(source.Project(), poolName, fs, sourceZfsDatasetSnapshot, targetZfsDataset, tmpContainerMntPoint)
- if err != nil {
- return errors.Wrap(err, "Clone volume")
- }
- defer zfsPoolVolumeDestroy(poolName, targetZfsDataset)
-
- // Mount the temporary copy
- if !shared.IsMountPoint(tmpContainerMntPoint) {
- err = zfsMount(poolName, targetZfsDataset)
- if err != nil {
- return errors.Wrap(err, "Mount temporary copy")
- }
- defer zfsUmount(poolName, targetZfsDataset, tmpContainerMntPoint)
- }
-
- // Copy the container
- containerPath := fmt.Sprintf("%s/container", tmpPath)
- err = rsync(tmpContainerMntPoint, containerPath, bwlimit)
- if err != nil {
- return errors.Wrap(err, "Copy container")
- }
-
- return nil
-}
-
-func (s *storageZfs) ContainerBackupCreate(backup backup, source container) error {
- // Start storage
- ourStart, err := source.StorageStart()
- if err != nil {
- return err
- }
- if ourStart {
- defer source.StorageStop()
- }
-
- // Create a temporary path for the backup
- tmpPath, err := ioutil.TempDir(shared.VarPath("backups"), "lxd_backup_")
- if err != nil {
- return err
- }
- defer os.RemoveAll(tmpPath)
-
- // Generate the actual backup
- if backup.optimizedStorage {
- err = s.doContainerBackupCreateOptimized(tmpPath, backup, source)
- if err != nil {
- return errors.Wrap(err, "Optimized backup")
- }
- } else {
- err = s.doContainerBackupCreateVanilla(tmpPath, backup, source)
- if err != nil {
- return errors.Wrap(err, "Vanilla backup")
- }
- }
-
- // Pack the backup
- err = backupCreateTarball(s.s, tmpPath, backup)
- if err != nil {
- return err
- }
-
- return nil
-}
-
-func (s *storageZfs) doContainerBackupLoadOptimized(info backupInfo, data io.ReadSeeker, tarArgs []string) error {
- containerName, _, _ := containerGetParentAndSnapshotName(info.Name)
- containerMntPoint := getContainerMountPoint(info.Project, s.pool.Name, containerName)
- err := createContainerMountpoint(containerMntPoint, containerPath(info.Project, info.Name, false), info.Privileged)
- if err != nil {
- return err
- }
-
- unpackPath := fmt.Sprintf("%s/.backup", containerMntPoint)
- err = os.MkdirAll(unpackPath, 0711)
- if err != nil {
- return err
- }
-
- err = os.Chmod(unpackPath, 0700)
- if err != nil {
- // can't use defer because it needs to run before the mount
- os.RemoveAll(unpackPath)
- return err
- }
-
- // Prepare tar arguments
- args := append(tarArgs, []string{
- "-",
- "--strip-components=1",
- "-C", unpackPath, "backup",
- }...)
-
- // Extract container
- data.Seek(0, 0)
- err = shared.RunCommandWithFds(data, nil, "tar", args...)
- if err != nil {
- // can't use defer because it needs to run before the mount
- os.RemoveAll(unpackPath)
- logger.Errorf("Failed to untar \"%s\" into \"%s\": %s", info.Name, unpackPath, err)
- return err
- }
-
- poolName := s.getOnDiskPoolName()
- for _, snapshotOnlyName := range info.Snapshots {
- snapshotBackup := fmt.Sprintf("%s/snapshots/%s.bin", unpackPath, snapshotOnlyName)
- feeder, err := os.Open(snapshotBackup)
- if err != nil {
- // can't use defer because it needs to run before the mount
- os.RemoveAll(unpackPath)
- return err
- }
-
- snapshotDataset := fmt.Sprintf("%s/containers/%s@snapshot-%s", poolName, containerName, snapshotOnlyName)
- zfsRecvCmd := exec.Command("zfs", "receive", "-F", snapshotDataset)
- zfsRecvCmd.Stdin = feeder
- err = zfsRecvCmd.Run()
- feeder.Close()
- if err != nil {
- // can't use defer because it needs to run before the mount
- os.RemoveAll(unpackPath)
- return err
- }
-
- // create mountpoint
- snapshotMntPoint := getSnapshotMountPoint(info.Project, s.pool.Name, fmt.Sprintf("%s/%s", containerName, snapshotOnlyName))
- snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots", projectPrefix(info.Project, containerName))
- snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(info.Project, containerName))
- err = createSnapshotMountpoint(snapshotMntPoint, snapshotMntPointSymlinkTarget, snapshotMntPointSymlink)
- if err != nil {
- // can't use defer because it needs to run before the mount
- os.RemoveAll(unpackPath)
- return err
- }
- }
-
- containerBackup := fmt.Sprintf("%s/container.bin", unpackPath)
- feeder, err := os.Open(containerBackup)
- if err != nil {
- // can't use defer because it needs to run before the mount
- os.RemoveAll(unpackPath)
- return err
- }
- defer feeder.Close()
-
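The restore code around this point replays saved send streams instead of piping between two live processes: each backup file is handed to "zfs receive" as stdin. Reduced to its essentials (hypothetical helper, trimmed error handling, not code from this patch):

package example

import (
	"os"
	"os/exec"
)

// zfsReceiveFromFile feeds a raw "zfs send" stream stored on disk into
// the given dataset; -F lets the receive overwrite existing state.
func zfsReceiveFromFile(path, dataset string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	cmd := exec.Command("zfs", "receive", "-F", dataset)
	cmd.Stdin = f // the saved stream stands in for the usual pipe
	return cmd.Run()
}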
- containerSnapshotDataset := fmt.Sprintf("%s/containers/%s@backup", poolName, containerName)
- zfsRecvCmd := exec.Command("zfs", "receive", "-F", containerSnapshotDataset)
- zfsRecvCmd.Stdin = feeder
-
- err = zfsRecvCmd.Run()
- os.RemoveAll(unpackPath)
- zfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("containers/%s", containerName), "backup")
- if err != nil {
- return err
- }
-
- fs := fmt.Sprintf("containers/%s", containerName)
- err = zfsPoolVolumeSet(poolName, fs, "canmount", "noauto")
- if err != nil {
- return err
- }
-
- err = zfsPoolVolumeSet(poolName, fs, "mountpoint", containerMntPoint)
- if err != nil {
- return err
- }
-
- _, err = s.doContainerMount("default", containerName, info.Privileged)
- if err != nil {
- return err
- }
-
- return nil
-}
-
-func (s *storageZfs) doContainerBackupLoadVanilla(info backupInfo, data io.ReadSeeker, tarArgs []string) error {
- // create the main container
- err := s.doContainerCreate(info.Project, info.Name, info.Privileged)
- if err != nil {
- s.doContainerDelete(info.Project, info.Name)
- return errors.Wrap(err, "Create container")
- }
-
- _, err = s.doContainerMount(info.Project, info.Name, info.Privileged)
- if err != nil {
- return errors.Wrap(err, "Mount container")
- }
-
- containerMntPoint := getContainerMountPoint(info.Project, s.pool.Name, info.Name)
- // Extract container
- for _, snap := range info.Snapshots {
- // Extract snapshots
- cur := fmt.Sprintf("backup/snapshots/%s", snap)
-
- // Prepare tar arguments
- args := append(tarArgs, []string{
- "-",
- "--recursive-unlink",
- "--strip-components=3",
- "--xattrs-include=*",
- "-C", containerMntPoint, cur,
- }...)
-
- // Unpack
- data.Seek(0, 0)
- err = shared.RunCommandWithFds(data, nil, "tar", args...)
- if err != nil {
- logger.Errorf("Failed to untar \"%s\" into \"%s\": %s", cur, containerMntPoint, err)
- return errors.Wrap(err, "Unpack")
- }
-
- // create snapshot
- err = s.doContainerSnapshotCreate(info.Project, fmt.Sprintf("%s/%s", info.Name, snap), info.Name)
- if err != nil {
- return errors.Wrap(err, "Create snapshot")
- }
- }
-
- // Prepare tar arguments
- args := append(tarArgs, []string{
- "-",
- "--strip-components=2",
- "--xattrs-include=*",
- "-C", containerMntPoint, "backup/container",
- }...)
-
- // Extract container
- data.Seek(0, 0)
- err = shared.RunCommandWithFds(data, nil, "tar", args...)
- if err != nil { - logger.Errorf("Failed to untar \"backup/container\" into \"%s\": %s", containerMntPoint, err) - return errors.Wrap(err, "Extract") - } - - return nil -} - -func (s *storageZfs) ContainerBackupLoad(info backupInfo, data io.ReadSeeker, tarArgs []string) error { - logger.Debugf("Loading ZFS storage volume for backup \"%s\" on storage pool \"%s\"", info.Name, s.pool.Name) - - if info.HasBinaryFormat { - return s.doContainerBackupLoadOptimized(info, data, tarArgs) - } - - return s.doContainerBackupLoadVanilla(info, data, tarArgs) -} - -// - create temporary directory ${LXD_DIR}/images/lxd_images_ -// - create new zfs volume images/ -// - mount the zfs volume on ${LXD_DIR}/images/lxd_images_ -// - unpack the downloaded image in ${LXD_DIR}/images/lxd_images_ -// - mark new zfs volume images/ readonly -// - remove mountpoint property from zfs volume images/ -// - create read-write snapshot from zfs volume images/ -func (s *storageZfs) ImageCreate(fingerprint string, tracker *ioprogress.ProgressTracker) error { - logger.Debugf("Creating ZFS storage volume for image \"%s\" on storage pool \"%s\"", fingerprint, s.pool.Name) - - poolName := s.getOnDiskPoolName() - imageMntPoint := getImageMountPoint(s.pool.Name, fingerprint) - fs := fmt.Sprintf("images/%s", fingerprint) - revert := true - subrevert := true - - err := s.createImageDbPoolVolume(fingerprint) - if err != nil { - return err - } - defer func() { - if !subrevert { - return - } - s.deleteImageDbPoolVolume(fingerprint) - }() - - if zfsFilesystemEntityExists(poolName, fmt.Sprintf("deleted/%s", fs)) { - if err := zfsPoolVolumeRename(poolName, fmt.Sprintf("deleted/%s", fs), fs, true); err != nil { - return err - } - - defer func() { - if !revert { - return - } - s.ImageDelete(fingerprint) - }() - - // In case this is an image from an older lxd instance, wipe the - // mountpoint. - err = zfsPoolVolumeSet(poolName, fs, "mountpoint", "none") - if err != nil { - return err - } - - revert = false - subrevert = false - - return nil - } - - if !shared.PathExists(imageMntPoint) { - err := os.MkdirAll(imageMntPoint, 0700) - if err != nil { - return err - } - defer func() { - if !subrevert { - return - } - os.RemoveAll(imageMntPoint) - }() - } - - // Create temporary mountpoint directory. - tmp := getImageMountPoint(s.pool.Name, "") - tmpImageDir, err := ioutil.TempDir(tmp, "") - if err != nil { - return err - } - defer os.RemoveAll(tmpImageDir) - - imagePath := shared.VarPath("images", fingerprint) - - // Create a new storage volume on the storage pool for the image. - dataset := fmt.Sprintf("%s/%s", poolName, fs) - msg, err := zfsPoolVolumeCreate(dataset, "mountpoint=none") - if err != nil { - logger.Errorf("Failed to create ZFS dataset \"%s\" on storage pool \"%s\": %s", dataset, s.pool.Name, msg) - return err - } - subrevert = false - defer func() { - if !revert { - return - } - s.ImageDelete(fingerprint) - }() - - // Set a temporary mountpoint for the image. - err = zfsPoolVolumeSet(poolName, fs, "mountpoint", tmpImageDir) - if err != nil { - return err - } - - // Make sure that the image actually got mounted. - if !shared.IsMountPoint(tmpImageDir) { - zfsMount(poolName, fs) - } - - // Unpack the image into the temporary mountpoint. - err = unpackImage(imagePath, tmpImageDir, storageTypeZfs, s.s.OS.RunningInUserNS, nil) - if err != nil { - return err - } - - // Mark the new storage volume for the image as readonly. 
- if err = zfsPoolVolumeSet(poolName, fs, "readonly", "on"); err != nil {
- return err
- }
-
- // Remove the temporary mountpoint from the image storage volume.
- if err = zfsPoolVolumeSet(poolName, fs, "mountpoint", "none"); err != nil {
- return err
- }
-
- // Make sure that the image actually got unmounted.
- if shared.IsMountPoint(tmpImageDir) {
- zfsUmount(poolName, fs, tmpImageDir)
- }
-
- // Create a snapshot of that image on the storage pool which we clone for
- // container creation.
- err = zfsPoolVolumeSnapshotCreate(poolName, fs, "readonly")
- if err != nil {
- return err
- }
-
- revert = false
-
- logger.Debugf("Created ZFS storage volume for image \"%s\" on storage pool \"%s\"", fingerprint, s.pool.Name)
- return nil
-}
-
-func (s *storageZfs) ImageDelete(fingerprint string) error {
- logger.Debugf("Deleting ZFS storage volume for image \"%s\" on storage pool \"%s\"", fingerprint, s.pool.Name)
-
- poolName := s.getOnDiskPoolName()
- fs := fmt.Sprintf("images/%s", fingerprint)
-
- if zfsFilesystemEntityExists(poolName, fs) {
- removable, err := zfsPoolVolumeSnapshotRemovable(poolName, fs, "readonly")
- if err != nil && zfsFilesystemEntityExists(poolName, fmt.Sprintf("%s@readonly", fs)) {
- return err
- }
-
- if removable {
- err := zfsPoolVolumeDestroy(poolName, fs)
- if err != nil {
- return err
- }
- } else {
- if err := zfsPoolVolumeSet(poolName, fs, "mountpoint", "none"); err != nil {
- return err
- }
-
- if err := zfsPoolVolumeRename(poolName, fs, fmt.Sprintf("deleted/%s", fs), true); err != nil {
- return err
- }
- }
- }
-
- err := s.deleteImageDbPoolVolume(fingerprint)
- if err != nil {
- return err
- }
-
- imageMntPoint := getImageMountPoint(s.pool.Name, fingerprint)
- if shared.PathExists(imageMntPoint) {
- err := os.RemoveAll(imageMntPoint)
- if err != nil {
- return err
- }
- }
-
- if shared.PathExists(shared.VarPath(fs + ".zfs")) {
- err := os.RemoveAll(shared.VarPath(fs + ".zfs"))
- if err != nil {
- return err
- }
- }
-
- logger.Debugf("Deleted ZFS storage volume for image \"%s\" on storage pool \"%s\"", fingerprint, s.pool.Name)
- return nil
-}
-
-func (s *storageZfs) ImageMount(fingerprint string) (bool, error) {
- return true, nil
-}
-
-func (s *storageZfs) ImageUmount(fingerprint string) (bool, error) {
- return true, nil
-}
-
-type zfsMigrationSourceDriver struct {
- container container
- snapshots []container
- zfsSnapshotNames []string
- zfs *storageZfs
- runningSnapName string
- stoppedSnapName string
- zfsFeatures []string
-}
-
-func (s *zfsMigrationSourceDriver) send(conn *websocket.Conn, zfsName string, zfsParent string, readWrapper func(io.ReadCloser) io.ReadCloser) error {
- sourceParentName, _, _ := containerGetParentAndSnapshotName(s.container.Name())
- poolName := s.zfs.getOnDiskPoolName()
- args := []string{"send"}
-
- // Negotiated options
- if s.zfsFeatures != nil && len(s.zfsFeatures) > 0 {
- if shared.StringInSlice("compress", s.zfsFeatures) {
- args = append(args, "-c")
- args = append(args, "-L")
- }
- }
-
- args = append(args, []string{fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(s.container.Project(), sourceParentName), zfsName)}...)
- if zfsParent != "" {
- args = append(args, "-i", fmt.Sprintf("%s/containers/%s@%s", poolName, projectPrefix(s.container.Project(), s.container.Name()), zfsParent))
- }
-
- cmd := exec.Command("zfs", args...)
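The "Negotiated options" block above only adds send flags that both ends of the migration agreed on: with the compress feature, -c sends blocks as stored on disk (compressed) and -L permits large record sizes. An illustrative restatement of how the argument list is assembled (names assumed, not patch code):

package example

// migrationSendArgs builds a "zfs send" argument list: negotiated
// features first, then the snapshot, then an optional incremental parent.
func migrationSendArgs(dataset, parent string, features []string) []string {
	args := []string{"send"}
	for _, f := range features {
		if f == "compress" {
			args = append(args, "-c", "-L")
			break
		}
	}
	args = append(args, dataset)
	if parent != "" {
		args = append(args, "-i", parent)
	}
	return args
}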
- - stdout, err := cmd.StdoutPipe() - if err != nil { - return err - } - - readPipe := io.ReadCloser(stdout) - if readWrapper != nil { - readPipe = readWrapper(stdout) - } - - stderr, err := cmd.StderrPipe() - if err != nil { - return err - } - - if err := cmd.Start(); err != nil { - return err - } - - <-shared.WebsocketSendStream(conn, readPipe, 4*1024*1024) - - output, err := ioutil.ReadAll(stderr) - if err != nil { - logger.Errorf("Problem reading zfs send stderr: %s", err) - } - - err = cmd.Wait() - if err != nil { - logger.Errorf("Problem with zfs send: %s", string(output)) - } - - return err -} - -func (s *zfsMigrationSourceDriver) SendWhileRunning(conn *websocket.Conn, op *operation, bwlimit string, containerOnly bool) error { - if s.container.IsSnapshot() { - _, snapOnlyName, _ := containerGetParentAndSnapshotName(s.container.Name()) - snapshotName := fmt.Sprintf("snapshot-%s", snapOnlyName) - wrapper := StorageProgressReader(op, "fs_progress", s.container.Name()) - return s.send(conn, snapshotName, "", wrapper) - } - - lastSnap := "" - if !containerOnly { - for i, snap := range s.zfsSnapshotNames { - prev := "" - if i > 0 { - prev = s.zfsSnapshotNames[i-1] - } - - lastSnap = snap - - wrapper := StorageProgressReader(op, "fs_progress", snap) - if err := s.send(conn, snap, prev, wrapper); err != nil { - return err - } - } - } - - s.runningSnapName = fmt.Sprintf("migration-send-%s", uuid.NewRandom().String()) - if err := zfsPoolVolumeSnapshotCreate(s.zfs.getOnDiskPoolName(), fmt.Sprintf("containers/%s", projectPrefix(s.container.Project(), s.container.Name())), s.runningSnapName); err != nil { - return err - } - - wrapper := StorageProgressReader(op, "fs_progress", s.container.Name()) - if err := s.send(conn, s.runningSnapName, lastSnap, wrapper); err != nil { - return err - } - - return nil -} - -func (s *zfsMigrationSourceDriver) SendAfterCheckpoint(conn *websocket.Conn, bwlimit string) error { - s.stoppedSnapName = fmt.Sprintf("migration-send-%s", uuid.NewRandom().String()) - if err := zfsPoolVolumeSnapshotCreate(s.zfs.getOnDiskPoolName(), fmt.Sprintf("containers/%s", projectPrefix(s.container.Project(), s.container.Name())), s.stoppedSnapName); err != nil { - return err - } - - if err := s.send(conn, s.stoppedSnapName, s.runningSnapName, nil); err != nil { - return err - } - - return nil -} - -func (s *zfsMigrationSourceDriver) Cleanup() { - poolName := s.zfs.getOnDiskPoolName() - if s.stoppedSnapName != "" { - zfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("containers/%s", projectPrefix(s.container.Project(), s.container.Name())), s.stoppedSnapName) - } - if s.runningSnapName != "" { - zfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("containers/%s", projectPrefix(s.container.Project(), s.container.Name())), s.runningSnapName) - } -} - -func (s *storageZfs) MigrationType() migration.MigrationFSType { - return migration.MigrationFSType_ZFS -} - -func (s *storageZfs) PreservesInodes() bool { - return true -} - -func (s *storageZfs) MigrationSource(args MigrationSourceArgs) (MigrationStorageSourceDriver, error) { - /* If the container is a snapshot, let's just send that; we don't need - * to send anything else, because that's all the user asked for. 
- */
- if args.Container.IsSnapshot() {
- return &zfsMigrationSourceDriver{container: args.Container, zfs: s, zfsFeatures: args.ZfsFeatures}, nil
- }
-
- driver := zfsMigrationSourceDriver{
- container: args.Container,
- snapshots: []container{},
- zfsSnapshotNames: []string{},
- zfs: s,
- zfsFeatures: args.ZfsFeatures,
- }
-
- if args.ContainerOnly {
- return &driver, nil
- }
-
- /* List all the snapshots in order of reverse creation. The idea here
- * is that we send the oldest to newest snapshot, hopefully saving on
- * xfer costs. Then, after all that, we send the container itself.
- */
- snapshots, err := zfsPoolListSnapshots(s.getOnDiskPoolName(), fmt.Sprintf("containers/%s", projectPrefix(args.Container.Project(), args.Container.Name())))
- if err != nil {
- return nil, err
- }
-
- for _, snap := range snapshots {
- /* In the case of e.g. multiple copies running at the same
- * time, we will have potentially multiple migration-send
- * snapshots. (Or in the case of the test suite, sometimes one
- * will take too long to delete.)
- */
- if !strings.HasPrefix(snap, "snapshot-") {
- continue
- }
-
- lxdName := fmt.Sprintf("%s%s%s", args.Container.Name(), shared.SnapshotDelimiter, snap[len("snapshot-"):])
- snapshot, err := containerLoadByProjectAndName(s.s, args.Container.Project(), lxdName)
- if err != nil {
- return nil, err
- }
-
- driver.snapshots = append(driver.snapshots, snapshot)
- driver.zfsSnapshotNames = append(driver.zfsSnapshotNames, snap)
- }
-
- return &driver, nil
-}
-
-func (s *storageZfs) MigrationSink(conn *websocket.Conn, op *operation, args MigrationSinkArgs) error {
- poolName := s.getOnDiskPoolName()
- zfsRecv := func(zfsName string, writeWrapper func(io.WriteCloser) io.WriteCloser) error {
- zfsFsName := fmt.Sprintf("%s/%s", poolName, zfsName)
- args := []string{"receive", "-F", "-u", zfsFsName}
- cmd := exec.Command("zfs", args...)
-
- stdin, err := cmd.StdinPipe()
- if err != nil {
- return err
- }
-
- stderr, err := cmd.StderrPipe()
- if err != nil {
- return err
- }
-
- if err := cmd.Start(); err != nil {
- return err
- }
-
- writePipe := io.WriteCloser(stdin)
- if writeWrapper != nil {
- writePipe = writeWrapper(stdin)
- }
-
- <-shared.WebsocketRecvStream(writePipe, conn)
-
- output, err := ioutil.ReadAll(stderr)
- if err != nil {
- logger.Debugf("Problem reading zfs recv stderr %s", err)
- }
-
- err = cmd.Wait()
- if err != nil {
- logger.Errorf("Problem with zfs recv: %s", string(output))
- }
- return err
- }
-
- /* In some versions of zfs we can write `zfs recv -F` to mounted
- * filesystems, and in some versions we can't. So, let's always unmount
- * this fs (it's empty anyway) before we zfs recv. N.B. that `zfs recv`
- * of a snapshot also needs the actual fs that it has snapshotted
- * unmounted, so we do this before receiving anything.
- */
- zfsName := fmt.Sprintf("containers/%s", projectPrefix(args.Container.Project(), args.Container.Name()))
- containerMntPoint := getContainerMountPoint(args.Container.Project(), s.pool.Name, args.Container.Name())
- if shared.IsMountPoint(containerMntPoint) {
- err := zfsUmount(poolName, zfsName, containerMntPoint)
- if err != nil {
- return err
- }
- }
-
- if len(args.Snapshots) > 0 {
- snapshotMntPointSymlinkTarget := shared.VarPath("storage-pools", s.pool.Name, "containers-snapshots", projectPrefix(args.Container.Project(), s.volume.Name))
- snapshotMntPointSymlink := shared.VarPath("snapshots", projectPrefix(args.Container.Project(), args.Container.Name()))
- if !shared.PathExists(snapshotMntPointSymlink) {
- err := os.Symlink(snapshotMntPointSymlinkTarget, snapshotMntPointSymlink)
- if err != nil {
- return err
- }
- }
- }
-
- // At this point we have already figured out the parent
- // container's root disk device so we can simply
- // retrieve it from the expanded devices.
- parentStoragePool := ""
- parentExpandedDevices := args.Container.ExpandedDevices()
- parentLocalRootDiskDeviceKey, parentLocalRootDiskDevice, _ := shared.GetRootDiskDevice(parentExpandedDevices)
- if parentLocalRootDiskDeviceKey != "" {
- parentStoragePool = parentLocalRootDiskDevice["pool"]
- }
-
- // A little neuroticism.
- if parentStoragePool == "" {
- return fmt.Errorf("detected that the container's root device is missing the pool property during ZFS migration")
- }
-
- for _, snap := range args.Snapshots {
- ctArgs := snapshotProtobufToContainerArgs(args.Container.Project(), args.Container.Name(), snap)
-
- // Ensure that snapshot and parent container have the
- // same storage pool in their local root disk device.
- // If the root disk device for the snapshot comes from a
- // profile on the new instance as well we don't need to
- // do anything.
- if ctArgs.Devices != nil {
- snapLocalRootDiskDeviceKey, _, _ := shared.GetRootDiskDevice(ctArgs.Devices)
- if snapLocalRootDiskDeviceKey != "" {
- ctArgs.Devices[snapLocalRootDiskDeviceKey]["pool"] = parentStoragePool
- }
- }
- _, err := containerCreateEmptySnapshot(args.Container.DaemonState(), ctArgs)
- if err != nil {
- return err
- }
-
- wrapper := StorageProgressWriter(op, "fs_progress", snap.GetName())
- name := fmt.Sprintf("containers/%s@snapshot-%s", projectPrefix(args.Container.Project(), args.Container.Name()), snap.GetName())
- if err := zfsRecv(name, wrapper); err != nil {
- return err
- }
-
- snapshotMntPoint := getSnapshotMountPoint(args.Container.Project(), poolName, fmt.Sprintf("%s/%s", args.Container.Name(), *snap.Name))
- if !shared.PathExists(snapshotMntPoint) {
- err := os.MkdirAll(snapshotMntPoint, 0700)
- if err != nil {
- return err
- }
- }
- }
-
- defer func() {
- /* clean up our migration-send snapshots that we got from recv.
*/ - zfsSnapshots, err := zfsPoolListSnapshots(poolName, fmt.Sprintf("containers/%s", projectPrefix(args.Container.Project(), args.Container.Name()))) - if err != nil { - logger.Errorf("Failed listing snapshots post migration: %s", err) - return - } - - for _, snap := range zfsSnapshots { - // If we received a bunch of snapshots, remove the migration-send-* ones, if not, wipe any snapshot we got - if args.Snapshots != nil && len(args.Snapshots) > 0 && !strings.HasPrefix(snap, "migration-send") { - continue - } - - zfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("containers/%s", projectPrefix(args.Container.Project(), args.Container.Name())), snap) - } - }() - - /* finally, do the real container */ - wrapper := StorageProgressWriter(op, "fs_progress", args.Container.Name()) - if err := zfsRecv(zfsName, wrapper); err != nil { - return err - } - - if args.Live { - /* and again for the post-running snapshot if this was a live migration */ - wrapper := StorageProgressWriter(op, "fs_progress", args.Container.Name()) - if err := zfsRecv(zfsName, wrapper); err != nil { - return err - } - } - - /* Sometimes, zfs recv mounts this anyway, even if we pass -u - * (https://forums.freebsd.org/threads/zfs-receive-u-shouldnt-mount-received-filesystem-right.36844/) - * but sometimes it doesn't. Let's try to mount, but not complain about - * failure. - */ - zfsMount(poolName, zfsName) - return nil -} - -func (s *storageZfs) StorageEntitySetQuota(volumeType int, size int64, data interface{}) error { - logger.Debugf(`Setting ZFS quota for "%s"`, s.volume.Name) - - if !shared.IntInSlice(volumeType, supportedVolumeTypes) { - return fmt.Errorf("Invalid storage type") - } - - var c container - var fs string - switch volumeType { - case storagePoolVolumeTypeContainer: - c = data.(container) - fs = fmt.Sprintf("containers/%s", projectPrefix(c.Project(), c.Name())) - case storagePoolVolumeTypeCustom: - fs = fmt.Sprintf("custom/%s", s.volume.Name) - } - - property := "quota" - - if s.pool.Config["volume.zfs.use_refquota"] != "" { - zfsUseRefquota = s.pool.Config["volume.zfs.use_refquota"] - } - if s.volume.Config["zfs.use_refquota"] != "" { - zfsUseRefquota = s.volume.Config["zfs.use_refquota"] - } - - if shared.IsTrue(zfsUseRefquota) { - property = "refquota" - } - - poolName := s.getOnDiskPoolName() - var err error - if size > 0 { - err = zfsPoolVolumeSet(poolName, fs, property, fmt.Sprintf("%d", size)) - } else { - err = zfsPoolVolumeSet(poolName, fs, property, "none") - } - - if err != nil { - return err - } - - logger.Debugf(`Set ZFS quota for "%s"`, s.volume.Name) - return nil -} - -func (s *storageZfs) StoragePoolResources() (*api.ResourcesStoragePool, error) { - poolName := s.getOnDiskPoolName() - - totalBuf, err := zfsFilesystemEntityPropertyGet(poolName, "", "available") - if err != nil { - return nil, err - } - - totalStr := string(totalBuf) - totalStr = strings.TrimSpace(totalStr) - total, err := strconv.ParseUint(totalStr, 10, 64) - if err != nil { - return nil, err - } - - usedBuf, err := zfsFilesystemEntityPropertyGet(poolName, "", "used") - if err != nil { - return nil, err - } - - usedStr := string(usedBuf) - usedStr = strings.TrimSpace(usedStr) - used, err := strconv.ParseUint(usedStr, 10, 64) - if err != nil { - return nil, err - } - - res := api.ResourcesStoragePool{} - res.Space.Total = total - res.Space.Used = used - - // Inode allocation is dynamic so no use in reporting them. 
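StorageEntitySetQuota above chooses between the "quota" and "refquota" properties, with the volume-level zfs.use_refquota overriding the pool default, and clears a limit by writing "none". A compact sketch of that decision (simplified truthiness, assumed names, not patch code):

package example

import "fmt"

// quotaProperty picks the ZFS property that carries the limit; refquota
// counts only the dataset itself, excluding snapshots and clones.
func quotaProperty(poolCfg, volCfg map[string]string) string {
	use := poolCfg["volume.zfs.use_refquota"]
	if v := volCfg["zfs.use_refquota"]; v != "" {
		use = v // volume config wins over the pool default
	}
	if use == "true" { // simplified; the real check accepts more spellings
		return "refquota"
	}
	return "quota"
}

// quotaValue renders the property value: a byte count, or "none" to
// drop the limit when the requested size is zero.
func quotaValue(size int64) string {
	if size > 0 {
		return fmt.Sprintf("%d", size)
	}
	return "none"
}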
-
- return &res, nil
-}
-
-func (s *storageZfs) doCrossPoolStorageVolumeCopy(source *api.StorageVolumeSource) error {
- successMsg := fmt.Sprintf("Copied ZFS storage volume \"%s\" on storage pool \"%s\" as \"%s\" to storage pool \"%s\"", source.Name, source.Pool, s.volume.Name, s.pool.Name)
- // setup storage for the source volume
- srcStorage, err := storagePoolVolumeInit(s.s, "default", source.Pool, source.Name, storagePoolVolumeTypeCustom)
- if err != nil {
- logger.Errorf("Failed to initialize ZFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err)
- return err
- }
-
- ourMount, err := srcStorage.StoragePoolMount()
- if err != nil {
- return err
- }
- if ourMount {
- defer srcStorage.StoragePoolUmount()
- }
-
- // Create the main volume
- err = s.StoragePoolVolumeCreate()
- if err != nil {
- logger.Errorf("Failed to create ZFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err)
- return err
- }
-
- ourMount, err = s.StoragePoolVolumeMount()
- if err != nil {
- logger.Errorf("Failed to mount ZFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err)
- return err
- }
- if ourMount {
- defer s.StoragePoolVolumeUmount()
- }
-
- dstMountPoint := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name)
- bwlimit := s.pool.Config["rsync.bwlimit"]
-
- snapshots, err := storagePoolVolumeSnapshotsGet(s.s, source.Pool, source.Name, storagePoolVolumeTypeCustom)
- if err != nil {
- return err
- }
-
- if !source.VolumeOnly {
- for _, snap := range snapshots {
- srcMountPoint := getStoragePoolVolumeSnapshotMountPoint(source.Pool, snap)
-
- _, err = rsyncLocalCopy(srcMountPoint, dstMountPoint, bwlimit)
- if err != nil {
- logger.Errorf("Failed to rsync into ZFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err)
- return err
- }
-
- _, snapOnlyName, _ := containerGetParentAndSnapshotName(source.Name)
-
- s.StoragePoolVolumeSnapshotCreate(&api.StorageVolumeSnapshotsPost{Name: fmt.Sprintf("%s/%s", s.volume.Name, snapOnlyName)})
- }
- }
-
- var srcMountPoint string
-
- if strings.Contains(source.Name, "/") {
- srcMountPoint = getStoragePoolVolumeSnapshotMountPoint(source.Pool, source.Name)
- } else {
- srcMountPoint = getStoragePoolVolumeMountPoint(source.Pool, source.Name)
- }
-
- _, err = rsyncLocalCopy(srcMountPoint, dstMountPoint, bwlimit)
- if err != nil {
- os.RemoveAll(dstMountPoint)
- logger.Errorf("Failed to rsync into ZFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err)
- return err
- }
-
- logger.Infof(successMsg)
- return nil
-}
-
-func (s *storageZfs) copyVolumeWithoutSnapshotsFull(source *api.StorageVolumeSource) error {
- sourceIsSnapshot := shared.IsSnapshot(source.Name)
-
- var snapshotSuffix string
- var sourceDataset string
- var targetDataset string
- var targetSnapshotDataset string
-
- poolName := s.getOnDiskPoolName()
-
- if sourceIsSnapshot {
- sourceVolumeName, sourceSnapOnlyName, _ := containerGetParentAndSnapshotName(source.Name)
- snapshotSuffix = fmt.Sprintf("snapshot-%s", sourceSnapOnlyName)
- sourceDataset = fmt.Sprintf("%s/custom/%s@%s", source.Pool, sourceVolumeName, snapshotSuffix)
- targetSnapshotDataset = fmt.Sprintf("%s/custom/%s@snapshot-%s", poolName, s.volume.Name, sourceSnapOnlyName)
- } else {
- snapshotSuffix = uuid.NewRandom().String()
- sourceDataset = fmt.Sprintf("%s/custom/%s@%s", poolName, source.Name, snapshotSuffix)
- targetSnapshotDataset = fmt.Sprintf("%s/custom/%s@%s", poolName, s.volume.Name, snapshotSuffix)
s.volume.Name, snapshotSuffix) - - fs := fmt.Sprintf("custom/%s", source.Name) - err := zfsPoolVolumeSnapshotCreate(poolName, fs, snapshotSuffix) - if err != nil { - return err - } - defer func() { - err := zfsPoolVolumeSnapshotDestroy(poolName, fs, snapshotSuffix) - if err != nil { - logger.Warnf("Failed to delete temporary ZFS snapshot \"%s\", manual cleanup needed", sourceDataset) - } - }() - } - - zfsSendCmd := exec.Command("zfs", "send", sourceDataset) - - zfsRecvCmd := exec.Command("zfs", "receive", targetDataset) - - zfsRecvCmd.Stdin, _ = zfsSendCmd.StdoutPipe() - zfsRecvCmd.Stdout = os.Stdout - zfsRecvCmd.Stderr = os.Stderr - - err := zfsRecvCmd.Start() - if err != nil { - return err - } - - err = zfsSendCmd.Run() - if err != nil { - return err - } - - err = zfsRecvCmd.Wait() - if err != nil { - return err - } - - msg, err := shared.RunCommand("zfs", "rollback", "-r", "-R", targetSnapshotDataset) - if err != nil { - logger.Errorf("Failed to rollback ZFS dataset: %s", msg) - return err - } - - targetContainerMountPoint := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - targetfs := fmt.Sprintf("custom/%s", s.volume.Name) - - err = zfsPoolVolumeSet(poolName, targetfs, "canmount", "noauto") - if err != nil { - return err - } - - err = zfsPoolVolumeSet(poolName, targetfs, "mountpoint", targetContainerMountPoint) - if err != nil { - return err - } - - err = zfsPoolVolumeSnapshotDestroy(poolName, targetfs, snapshotSuffix) - if err != nil { - return err - } - - return nil -} - -func (s *storageZfs) copyVolumeWithoutSnapshotsSparse(source *api.StorageVolumeSource) error { - poolName := s.getOnDiskPoolName() - - sourceVolumeName := source.Name - sourceVolumePath := getStoragePoolVolumeMountPoint(source.Pool, source.Name) - - targetVolumeName := s.volume.Name - targetVolumePath := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - - sourceZfsDataset := "" - sourceZfsDatasetSnapshot := "" - sourceName, sourceSnapOnlyName, isSnapshotName := containerGetParentAndSnapshotName(sourceVolumeName) - - targetZfsDataset := fmt.Sprintf("custom/%s", targetVolumeName) - - if isSnapshotName { - sourceZfsDatasetSnapshot = sourceSnapOnlyName - } - - revert := true - if sourceZfsDatasetSnapshot == "" { - if zfsFilesystemEntityExists(poolName, fmt.Sprintf("custom/%s", sourceName)) { - sourceZfsDatasetSnapshot = fmt.Sprintf("copy-%s", uuid.NewRandom().String()) - sourceZfsDataset = fmt.Sprintf("custom/%s", sourceName) - - err := zfsPoolVolumeSnapshotCreate(poolName, sourceZfsDataset, sourceZfsDatasetSnapshot) - if err != nil { - return err - } - - defer func() { - if !revert { - return - } - zfsPoolVolumeSnapshotDestroy(poolName, sourceZfsDataset, sourceZfsDatasetSnapshot) - }() - } - } else { - if zfsFilesystemEntityExists(poolName, fmt.Sprintf("custom/%s@snapshot-%s", sourceName, sourceZfsDatasetSnapshot)) { - sourceZfsDataset = fmt.Sprintf("custom/%s", sourceName) - sourceZfsDatasetSnapshot = fmt.Sprintf("snapshot-%s", sourceZfsDatasetSnapshot) - } - } - - if sourceZfsDataset != "" { - err := zfsPoolVolumeClone("default", poolName, sourceZfsDataset, sourceZfsDatasetSnapshot, targetZfsDataset, targetVolumePath) - if err != nil { - return err - } - - defer func() { - if !revert { - return - } - zfsPoolVolumeDestroy(poolName, targetZfsDataset) - }() - } else { - bwlimit := s.pool.Config["rsync.bwlimit"] - - output, err := rsyncLocalCopy(sourceVolumePath, targetVolumePath, bwlimit) - if err != nil { - return fmt.Errorf("rsync failed: %s", string(output)) - } - } - - revert = false - - 
return nil -} - -func (s *storageZfs) StoragePoolVolumeCopy(source *api.StorageVolumeSource) error { - logger.Infof("Copying ZFS storage volume \"%s\" on storage pool \"%s\" as \"%s\" to storage pool \"%s\"", source.Name, source.Pool, s.volume.Name, s.pool.Name) - successMsg := fmt.Sprintf("Copied ZFS storage volume \"%s\" on storage pool \"%s\" as \"%s\" to storage pool \"%s\"", source.Name, source.Pool, s.volume.Name, s.pool.Name) - - if source.Pool != s.pool.Name { - return s.doCrossPoolStorageVolumeCopy(source) - } - - var snapshots []string - - poolName := s.getOnDiskPoolName() - - if !shared.IsSnapshot(source.Name) { - allSnapshots, err := zfsPoolListSnapshots(poolName, fmt.Sprintf("custom/%s", source.Name)) - if err != nil { - return err - } - - for _, snap := range allSnapshots { - if strings.HasPrefix(snap, "snapshot-") { - snapshots = append(snapshots, strings.TrimPrefix(snap, "snapshot-")) - } - } - } - - targetStorageVolumeMountPoint := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) - fs := fmt.Sprintf("custom/%s", s.volume.Name) - - if source.VolumeOnly || len(snapshots) == 0 { - var err error - - if s.pool.Config["zfs.clone_copy"] != "" && !shared.IsTrue(s.pool.Config["zfs.clone_copy"]) { - err = s.copyVolumeWithoutSnapshotsFull(source) - } else { - err = s.copyVolumeWithoutSnapshotsSparse(source) - } - if err != nil { - return err - } - } else { - targetVolumeMountPoint := getStoragePoolVolumeMountPoint(poolName, s.volume.Name) - - err := os.MkdirAll(targetVolumeMountPoint, 0711) - if err != nil { - return err - } - - prev := "" - prevSnapOnlyName := "" - - for i, snap := range snapshots { - if i > 0 { - prev = snapshots[i-1] - } - - sourceDataset := fmt.Sprintf("%s/custom/%s@snapshot-%s", poolName, source.Name, snap) - targetDataset := fmt.Sprintf("%s/custom/%s@snapshot-%s", poolName, s.volume.Name, snap) - - snapshotMntPoint := getStoragePoolVolumeSnapshotMountPoint(poolName, fmt.Sprintf("%s/%s", s.volume.Name, snap)) - - err := os.MkdirAll(snapshotMntPoint, 0700) - if err != nil { - return err - } - - prevSnapOnlyName = snap - - args := []string{"send", sourceDataset} - - if prev != "" { - parentDataset := fmt.Sprintf("%s/custom/%s@snapshot-%s", poolName, source.Name, prev) - args = append(args, "-i", parentDataset) - } - - zfsSendCmd := exec.Command("zfs", args...) - zfsRecvCmd := exec.Command("zfs", "receive", "-F", targetDataset) - - zfsRecvCmd.Stdin, _ = zfsSendCmd.StdoutPipe() - zfsRecvCmd.Stdout = os.Stdout - zfsRecvCmd.Stderr = os.Stderr - - err = zfsRecvCmd.Start() - if err != nil { - return err - } - - err = zfsSendCmd.Run() - if err != nil { - return err - } - - err = zfsRecvCmd.Wait() - if err != nil { - return err - } - } - - tmpSnapshotName := fmt.Sprintf("copy-send-%s", uuid.NewRandom().String()) - - err = zfsPoolVolumeSnapshotCreate(poolName, fmt.Sprintf("custom/%s", source.Name), tmpSnapshotName) - if err != nil { - return err - } - - defer zfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("custom/%s", source.Name), tmpSnapshotName) - - currentSnapshotDataset := fmt.Sprintf("%s/custom/%s@%s", poolName, source.Name, tmpSnapshotName) - - args := []string{"send", currentSnapshotDataset} - - if prevSnapOnlyName != "" { - args = append(args, "-i", fmt.Sprintf("%s/custom/%s@snapshot-%s", poolName, source.Name, prevSnapOnlyName)) - } - - zfsSendCmd := exec.Command("zfs", args...) 
- targetDataset := fmt.Sprintf("%s/custom/%s", poolName, s.volume.Name) - zfsRecvCmd := exec.Command("zfs", "receive", "-F", targetDataset) - - zfsRecvCmd.Stdin, _ = zfsSendCmd.StdoutPipe() - zfsRecvCmd.Stdout = os.Stdout - zfsRecvCmd.Stderr = os.Stderr - - err = zfsRecvCmd.Start() - if err != nil { - return err - } - - err = zfsSendCmd.Run() - if err != nil { - return err - } - - err = zfsRecvCmd.Wait() - if err != nil { - return err - } - - defer zfsPoolVolumeSnapshotDestroy(poolName, fmt.Sprintf("custom/%s", s.volume.Name), tmpSnapshotName) - - err = zfsPoolVolumeSet(poolName, fs, "canmount", "noauto") - if err != nil { - return err - } - - err = zfsPoolVolumeSet(poolName, fs, "mountpoint", targetStorageVolumeMountPoint) - if err != nil { - return err - } - } - - if !shared.IsMountPoint(targetStorageVolumeMountPoint) { - err := zfsMount(poolName, fs) - if err != nil { - return err - } - defer zfsUmount(poolName, fs, targetStorageVolumeMountPoint) - } - - // apply quota - if s.volume.Config["size"] != "" { - size, err := shared.ParseByteSizeString(s.volume.Config["size"]) - if err != nil { - return err - } - - err = s.StorageEntitySetQuota(storagePoolVolumeTypeCustom, size, nil) - if err != nil { - return err - } - } - - logger.Infof(successMsg) - return nil -} - -func (s *zfsMigrationSourceDriver) SendStorageVolume(conn *websocket.Conn, op *operation, bwlimit string, storage storage, volumeOnly bool) error { - msg := fmt.Sprintf("Function not implemented") - logger.Errorf(msg) - return fmt.Errorf(msg) -} - -func (s *storageZfs) StorageMigrationSource(args MigrationSourceArgs) (MigrationStorageSourceDriver, error) { - return rsyncStorageMigrationSource(args) -} - -func (s *storageZfs) StorageMigrationSink(conn *websocket.Conn, op *operation, args MigrationSinkArgs) error { - return rsyncStorageMigrationSink(conn, op, args) -} - -func (s *storageZfs) StoragePoolVolumeSnapshotCreate(target *api.StorageVolumeSnapshotsPost) error { - logger.Infof("Creating ZFS storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - sourceOnlyName, snapshotOnlyName, ok := containerGetParentAndSnapshotName(target.Name) - if !ok { - return fmt.Errorf("Not a snapshot name") - } - - sourceDataset := fmt.Sprintf("custom/%s", sourceOnlyName) - poolName := s.getOnDiskPoolName() - snapName := fmt.Sprintf("snapshot-%s", snapshotOnlyName) - err := zfsPoolVolumeSnapshotCreate(poolName, sourceDataset, snapName) - if err != nil { - return err - } - - snapshotMntPoint := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, target.Name) - if !shared.PathExists(snapshotMntPoint) { - err := os.MkdirAll(snapshotMntPoint, 0700) - if err != nil { - return err - } - } - - logger.Infof("Created ZFS storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageZfs) StoragePoolVolumeSnapshotDelete() error { - logger.Infof("Deleting ZFS storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - sourceName, snapshotOnlyName, _ := containerGetParentAndSnapshotName(s.volume.Name) - snapshotName := fmt.Sprintf("snapshot-%s", snapshotOnlyName) - - onDiskPoolName := s.getOnDiskPoolName() - if zfsFilesystemEntityExists(onDiskPoolName, fmt.Sprintf("custom/%s@%s", sourceName, snapshotName)) { - removable, err := zfsPoolVolumeSnapshotRemovable(onDiskPoolName, fmt.Sprintf("custom/%s", sourceName), snapshotName) - if err != nil { - return err - } - - if removable { - err = zfsPoolVolumeSnapshotDestroy(onDiskPoolName, 
fmt.Sprintf("custom/%s", sourceName), snapshotName) - } else { - err = zfsPoolVolumeSnapshotRename(onDiskPoolName, fmt.Sprintf("custom/%s", sourceName), snapshotName, fmt.Sprintf("copy-%s", uuid.NewRandom().String())) - } - if err != nil { - return err - } - } - - storageVolumePath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) - err := os.RemoveAll(storageVolumePath) - if err != nil && !os.IsNotExist(err) { - return err - } - - storageVolumeSnapshotPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, sourceName) - empty, err := shared.PathIsEmpty(storageVolumeSnapshotPath) - if err == nil && empty { - os.RemoveAll(storageVolumeSnapshotPath) - } - - err = s.s.Cluster.StoragePoolVolumeDelete( - "default", - s.volume.Name, - storagePoolVolumeTypeCustom, - s.poolID) - if err != nil { - logger.Errorf(`Failed to delete database entry for DIR storage volume "%s" on storage pool "%s"`, - s.volume.Name, s.pool.Name) - } - - logger.Infof("Deleted ZFS storage volume snapshot \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageZfs) StoragePoolVolumeSnapshotRename(newName string) error { - logger.Infof("Renaming ZFS storage volume snapshot on storage pool \"%s\" from \"%s\" to \"%s\"", s.pool.Name, s.volume.Name, newName) - - sourceName, snapshotOnlyName, ok := containerGetParentAndSnapshotName(s.volume.Name) - if !ok { - return fmt.Errorf("Not a snapshot name") - } - - oldZfsDatasetName := fmt.Sprintf("snapshot-%s", snapshotOnlyName) - newZfsDatasetName := fmt.Sprintf("snapshot-%s", newName) - err := zfsPoolVolumeSnapshotRename(s.getOnDiskPoolName(), fmt.Sprintf("custom/%s", sourceName), oldZfsDatasetName, newZfsDatasetName) - if err != nil { - return err - } - - logger.Infof("Renamed ZFS storage volume snapshot on storage pool \"%s\" from \"%s\" to \"%s\"", s.pool.Name, s.volume.Name, newName) - - return s.s.Cluster.StoragePoolVolumeRename("default", s.volume.Name, fmt.Sprintf("%s/%s", sourceName, newName), storagePoolVolumeTypeCustom, s.poolID) -} diff --git a/lxd/storage_zfs_utils.go b/lxd/storage_zfs_utils.go deleted file mode 100644 index a5f0eaaef6..0000000000 --- a/lxd/storage_zfs_utils.go +++ /dev/null @@ -1,833 +0,0 @@ -package main - -import ( - "fmt" - "io/ioutil" - "os" - "os/exec" - "path/filepath" - "strings" - "syscall" - "time" - - "github.com/lxc/lxd/shared" - "github.com/lxc/lxd/shared/logger" - - "github.com/pborman/uuid" -) - -// zfsIsEnabled returns whether zfs backend is supported. 
-func zfsIsEnabled() bool { - out, err := exec.LookPath("zfs") - if err != nil || len(out) == 0 { - return false - } - - return true -} - -// zfsToolVersionGet returns the ZFS tools version -func zfsToolVersionGet() (string, error) { - // This function is only really ever relevant on Ubuntu as the only - // distro that ships out of sync tools and kernel modules - out, err := shared.RunCommand("dpkg-query", "--showformat=${Version}", "--show", "zfsutils-linux") - if err != nil { - return "", err - } - - return strings.TrimSpace(string(out)), nil -} - -// zfsModuleVersionGet returns the ZFS module version -func zfsModuleVersionGet() (string, error) { - var zfsVersion string - - if shared.PathExists("/sys/module/zfs/version") { - out, err := ioutil.ReadFile("/sys/module/zfs/version") - if err != nil { - return "", fmt.Errorf("Could not determine ZFS module version") - } - - zfsVersion = string(out) - } else { - out, err := shared.RunCommand("modinfo", "-F", "version", "zfs") - if err != nil { - return "", fmt.Errorf("Could not determine ZFS module version") - } - - zfsVersion = out - } - - return strings.TrimSpace(zfsVersion), nil -} - -// zfsPoolVolumeCreate creates a ZFS dataset with a set of given properties. -func zfsPoolVolumeCreate(dataset string, properties ...string) (string, error) { - cmd := []string{"zfs", "create"} - - for _, prop := range properties { - cmd = append(cmd, []string{"-o", prop}...) - } - - cmd = append(cmd, []string{"-p", dataset}...) - - return shared.RunCommand(cmd[0], cmd[1:]...) -} - -func zfsPoolCheck(pool string) error { - output, err := shared.RunCommand( - "zfs", "get", "-H", "-o", "value", "type", pool) - if err != nil { - return fmt.Errorf(strings.Split(output, "\n")[0]) - } - - poolType := strings.Split(output, "\n")[0] - if poolType != "filesystem" { - return fmt.Errorf("Unsupported pool type: %s", poolType) - } - - return nil -} - -// zfsPoolVolumeExists verifies if a specific ZFS pool or volume exists. 
-func zfsPoolVolumeExists(dataset string) (bool, error) { - output, err := shared.RunCommand( - "zfs", "list", "-Ho", "name") - - if err != nil { - return false, err - } - - for _, name := range strings.Split(output, "\n") { - if name == dataset { - return true, nil - } - } - return false, nil -} - -func zfsPoolCreate(pool string, vdev string) error { - var output string - var err error - - dataset := "" - - if pool == "" { - output, err := shared.RunCommand( - "zfs", "create", "-p", "-o", "mountpoint=none", vdev) - if err != nil { - logger.Errorf("zfs create failed: %s", output) - return fmt.Errorf("Failed to create ZFS filesystem: %s", output) - } - dataset = vdev - } else { - output, err = shared.RunCommand( - "zpool", "create", "-f", "-m", "none", "-O", "compression=on", pool, vdev) - if err != nil { - logger.Errorf("zfs create failed: %s", output) - return fmt.Errorf("Failed to create the ZFS pool: %s", output) - } - - dataset = pool - } - - err = zfsPoolApplyDefaults(dataset) - if err != nil { - return err - } - - return nil -} - -func zfsPoolApplyDefaults(dataset string) error { - err := zfsPoolVolumeSet(dataset, "", "mountpoint", "none") - if err != nil { - return err - } - - err = zfsPoolVolumeSet(dataset, "", "setuid", "on") - if err != nil { - return err - } - - err = zfsPoolVolumeSet(dataset, "", "exec", "on") - if err != nil { - return err - } - - err = zfsPoolVolumeSet(dataset, "", "devices", "on") - if err != nil { - return err - } - - err = zfsPoolVolumeSet(dataset, "", "acltype", "posixacl") - if err != nil { - return err - } - - err = zfsPoolVolumeSet(dataset, "", "xattr", "sa") - if err != nil { - return err - } - - return nil -} - -func zfsPoolVolumeClone(project, pool string, source string, name string, dest string, mountpoint string) error { - output, err := shared.RunCommand( - "zfs", - "clone", - "-p", - "-o", fmt.Sprintf("mountpoint=%s", mountpoint), - "-o", "canmount=noauto", - fmt.Sprintf("%s/%s@%s", pool, source, name), - fmt.Sprintf("%s/%s", pool, dest)) - if err != nil { - logger.Errorf("zfs clone failed: %s", output) - return fmt.Errorf("Failed to clone the filesystem: %s", output) - } - - subvols, err := zfsPoolListSubvolumes(pool, fmt.Sprintf("%s/%s", pool, source)) - if err != nil { - return err - } - - for _, sub := range subvols { - snaps, err := zfsPoolListSnapshots(pool, sub) - if err != nil { - return err - } - - if !shared.StringInSlice(name, snaps) { - continue - } - - destSubvol := dest + strings.TrimPrefix(sub, source) - snapshotMntPoint := getSnapshotMountPoint(project, pool, destSubvol) - - output, err := shared.RunCommand( - "zfs", - "clone", - "-p", - "-o", fmt.Sprintf("mountpoint=%s", snapshotMntPoint), - "-o", "canmount=noauto", - fmt.Sprintf("%s/%s@%s", pool, sub, name), - fmt.Sprintf("%s/%s", pool, destSubvol)) - if err != nil { - logger.Errorf("zfs clone failed: %s", output) - return fmt.Errorf("Failed to clone the sub-volume: %s", output) - } - } - - return nil -} - -func zfsFilesystemEntityDelete(vdev string, pool string) error { - var output string - var err error - if strings.Contains(pool, "/") { - // Command to destroy a zfs dataset. - output, err = shared.RunCommand("zfs", "destroy", "-r", pool) - } else { - // Command to destroy a zfs pool. 
- output, err = shared.RunCommand("zpool", "destroy", "-f", pool) - } - if err != nil { - return fmt.Errorf("Failed to delete the ZFS pool: %s", output) - } - - // Cleanup storage - if filepath.IsAbs(vdev) && !shared.IsBlockdevPath(vdev) { - os.RemoveAll(vdev) - } - - return nil -} - -func zfsPoolVolumeDestroy(pool string, path string) error { - mountpoint, err := zfsFilesystemEntityPropertyGet(pool, path, "mountpoint") - if err != nil { - return err - } - - if mountpoint != "none" && shared.IsMountPoint(mountpoint) { - err := syscall.Unmount(mountpoint, syscall.MNT_DETACH) - if err != nil { - logger.Errorf("umount failed: %s", err) - return err - } - } - - // Due to open fds or kernel refs, this may fail for a bit, give it 10s - output, err := shared.TryRunCommand( - "zfs", - "destroy", - "-r", - fmt.Sprintf("%s/%s", pool, path)) - - if err != nil { - logger.Errorf("zfs destroy failed: %s", output) - return fmt.Errorf("Failed to destroy ZFS filesystem: %s", output) - } - - return nil -} - -func zfsPoolVolumeCleanup(pool string, path string) error { - if strings.HasPrefix(path, "deleted/") { - // Cleanup of filesystems kept for refcount reason - removablePath, err := zfsPoolVolumeSnapshotRemovable(pool, path, "") - if err != nil { - return err - } - - // Confirm that there are no more clones - if removablePath { - if strings.Contains(path, "@") { - // Cleanup snapshots - err = zfsPoolVolumeDestroy(pool, path) - if err != nil { - return err - } - - // Check if the parent can now be deleted - subPath := strings.SplitN(path, "@", 2)[0] - snaps, err := zfsPoolListSnapshots(pool, subPath) - if err != nil { - return err - } - - if len(snaps) == 0 { - err := zfsPoolVolumeCleanup(pool, subPath) - if err != nil { - return err - } - } - } else { - // Cleanup filesystems - origin, err := zfsFilesystemEntityPropertyGet(pool, path, "origin") - if err != nil { - return err - } - origin = strings.TrimPrefix(origin, fmt.Sprintf("%s/", pool)) - - err = zfsPoolVolumeDestroy(pool, path) - if err != nil { - return err - } - - // Attempt to remove its parent - if origin != "-" { - err := zfsPoolVolumeCleanup(pool, origin) - if err != nil { - return err - } - } - } - - return nil - } - } else if strings.HasPrefix(path, "containers") && strings.Contains(path, "@copy-") { - // Just remove the copy- snapshot for copies of active containers - err := zfsPoolVolumeDestroy(pool, path) - if err != nil { - return err - } - } - - return nil -} - -func zfsFilesystemEntityPropertyGet(pool string, path string, key string) (string, error) { - entity := pool - if path != "" { - entity = fmt.Sprintf("%s/%s", pool, path) - } - output, err := shared.RunCommand( - "zfs", - "get", - "-H", - "-p", - "-o", "value", - key, - entity) - if err != nil { - return "", fmt.Errorf("Failed to get ZFS config: %s", output) - } - - return strings.TrimRight(output, "\n"), nil -} - -func zfsPoolVolumeRename(pool string, source string, dest string, ignoreMounts bool) error { - var err error - var output string - - for i := 0; i < 20; i++ { - if ignoreMounts { - output, err = shared.RunCommand( - "/proc/self/exe", - "forkzfs", - "--", - "rename", - "-p", - fmt.Sprintf("%s/%s", pool, source), - fmt.Sprintf("%s/%s", pool, dest)) - } else { - output, err = shared.RunCommand( - "zfs", - "rename", - "-p", - fmt.Sprintf("%s/%s", pool, source), - fmt.Sprintf("%s/%s", pool, dest)) - } - - // Success - if err == nil { - return nil - } - - // zfs rename can fail because of descendants, yet still manage the rename - if !zfsFilesystemEntityExists(pool, source) 
&& zfsFilesystemEntityExists(pool, dest) { - return nil - } - - time.Sleep(500 * time.Millisecond) - } - - // Timeout - logger.Errorf("zfs rename failed: %s", output) - return fmt.Errorf("Failed to rename ZFS filesystem: %s", output) -} - -func zfsPoolVolumeSet(pool string, path string, key string, value string) error { - vdev := pool - if path != "" { - vdev = fmt.Sprintf("%s/%s", pool, path) - } - output, err := shared.RunCommand( - "zfs", - "set", - fmt.Sprintf("%s=%s", key, value), - vdev) - if err != nil { - logger.Errorf("zfs set failed: %s", output) - return fmt.Errorf("Failed to set ZFS config: %s", output) - } - - return nil -} - -func zfsPoolVolumeSnapshotCreate(pool string, path string, name string) error { - output, err := shared.RunCommand( - "zfs", - "snapshot", - "-r", - fmt.Sprintf("%s/%s@%s", pool, path, name)) - if err != nil { - logger.Errorf("zfs snapshot failed: %s", output) - return fmt.Errorf("Failed to create ZFS snapshot: %s", output) - } - - return nil -} - -func zfsPoolVolumeSnapshotDestroy(pool, path string, name string) error { - output, err := shared.RunCommand( - "zfs", - "destroy", - "-r", - fmt.Sprintf("%s/%s@%s", pool, path, name)) - if err != nil { - logger.Errorf("zfs destroy failed: %s", output) - return fmt.Errorf("Failed to destroy ZFS snapshot: %s", output) - } - - return nil -} - -func zfsPoolVolumeSnapshotRestore(pool string, path string, name string) error { - output, err := shared.TryRunCommand( - "zfs", - "rollback", - fmt.Sprintf("%s/%s@%s", pool, path, name)) - if err != nil { - logger.Errorf("zfs rollback failed: %s", output) - return fmt.Errorf("Failed to restore ZFS snapshot: %s", output) - } - - subvols, err := zfsPoolListSubvolumes(pool, fmt.Sprintf("%s/%s", pool, path)) - if err != nil { - return err - } - - for _, sub := range subvols { - snaps, err := zfsPoolListSnapshots(pool, sub) - if err != nil { - return err - } - - if !shared.StringInSlice(name, snaps) { - continue - } - - output, err := shared.TryRunCommand( - "zfs", - "rollback", - fmt.Sprintf("%s/%s@%s", pool, sub, name)) - if err != nil { - logger.Errorf("zfs rollback failed: %s", output) - return fmt.Errorf("Failed to restore ZFS sub-volume snapshot: %s", output) - } - } - - return nil -} - -func zfsPoolVolumeSnapshotRename(pool string, path string, oldName string, newName string) error { - output, err := shared.RunCommand( - "zfs", - "rename", - "-r", - fmt.Sprintf("%s/%s@%s", pool, path, oldName), - fmt.Sprintf("%s/%s@%s", pool, path, newName)) - if err != nil { - logger.Errorf("zfs snapshot rename failed: %s", output) - return fmt.Errorf("Failed to rename ZFS snapshot: %s", output) - } - - return nil -} - -func zfsMount(poolName string, path string) error { - output, err := shared.TryRunCommand( - "zfs", - "mount", - fmt.Sprintf("%s/%s", poolName, path)) - if err != nil { - return fmt.Errorf("Failed to mount ZFS filesystem: %s", output) - } - - return nil -} - -func zfsUmount(poolName string, path string, mountpoint string) error { - output, err := shared.TryRunCommand( - "zfs", - "unmount", - fmt.Sprintf("%s/%s", poolName, path)) - if err != nil { - logger.Warnf("Failed to unmount ZFS filesystem via zfs unmount: %s. 
Trying lazy umount (MNT_DETACH)...", output) - err := tryUnmount(mountpoint, syscall.MNT_DETACH) - if err != nil { - logger.Warnf("Failed to unmount ZFS filesystem via lazy umount (MNT_DETACH)...") - return err - } - } - - return nil -} - -func zfsPoolListSubvolumes(pool string, path string) ([]string, error) { - output, err := shared.RunCommand( - "zfs", - "list", - "-t", "filesystem", - "-o", "name", - "-H", - "-r", path) - if err != nil { - logger.Errorf("zfs list failed: %s", output) - return []string{}, fmt.Errorf("Failed to list ZFS filesystems: %s", output) - } - - children := []string{} - for _, entry := range strings.Split(output, "\n") { - if entry == "" { - continue - } - - if entry == path { - continue - } - - children = append(children, strings.TrimPrefix(entry, fmt.Sprintf("%s/", pool))) - } - - return children, nil -} - -func zfsPoolListSnapshots(pool string, path string) ([]string, error) { - path = strings.TrimRight(path, "/") - fullPath := pool - if path != "" { - fullPath = fmt.Sprintf("%s/%s", pool, path) - } - - output, err := shared.RunCommand( - "zfs", - "list", - "-t", "snapshot", - "-o", "name", - "-H", - "-d", "1", - "-s", "creation", - "-r", fullPath) - if err != nil { - logger.Errorf("zfs list failed: %s", output) - return []string{}, fmt.Errorf("Failed to list ZFS snapshots: %s", output) - } - - children := []string{} - for _, entry := range strings.Split(output, "\n") { - if entry == "" { - continue - } - - if entry == fullPath { - continue - } - - children = append(children, strings.SplitN(entry, "@", 2)[1]) - } - - return children, nil -} - -func zfsPoolVolumeSnapshotRemovable(pool string, path string, name string) (bool, error) { - var snap string - if name == "" { - snap = path - } else { - snap = fmt.Sprintf("%s@%s", path, name) - } - - clones, err := zfsFilesystemEntityPropertyGet(pool, snap, "clones") - if err != nil { - return false, err - } - - if clones == "-" || clones == "" { - return true, nil - } - - return false, nil -} - -func zfsFilesystemEntityExists(pool string, path string) bool { - vdev := pool - if path != "" { - vdev = fmt.Sprintf("%s/%s", pool, path) - } - output, err := shared.RunCommand( - "zfs", - "get", - "-H", - "-o", - "name", - "type", - vdev) - if err != nil { - return false - } - - detectedName := strings.TrimSpace(output) - return detectedName == vdev -} - -func (s *storageZfs) doContainerMount(project, name string, privileged bool) (bool, error) { - logger.Debugf("Mounting ZFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - volumeName := projectPrefix(project, name) - fs := fmt.Sprintf("containers/%s", volumeName) - containerPoolVolumeMntPoint := getContainerMountPoint(project, s.pool.Name, name) - - containerMountLockID := getContainerMountLockID(s.pool.Name, name) - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[containerMountLockID]; ok { - lxdStorageMapLock.Unlock() - if _, ok := <-waitChannel; ok { - logger.Warnf("Received value over semaphore, this should not have happened") - } - // Give the benefit of the doubt and assume that the other - // thread actually succeeded in mounting the storage volume. 
- return false, nil - } - - lxdStorageOngoingOperationMap[containerMountLockID] = make(chan bool) - lxdStorageMapLock.Unlock() - - removeLockFromMap := func() { - lxdStorageMapLock.Lock() - if waitChannel, ok := lxdStorageOngoingOperationMap[containerMountLockID]; ok { - close(waitChannel) - delete(lxdStorageOngoingOperationMap, containerMountLockID) - } - lxdStorageMapLock.Unlock() - } - - defer removeLockFromMap() - - // Since we're using mount() directly zfs will not automatically create - // the mountpoint for us. So let's check and do it if needed. - if !shared.PathExists(containerPoolVolumeMntPoint) { - err := createContainerMountpoint(containerPoolVolumeMntPoint, shared.VarPath(fs), privileged) - if err != nil { - return false, err - } - } - - ourMount := false - if !shared.IsMountPoint(containerPoolVolumeMntPoint) { - source := fmt.Sprintf("%s/%s", s.getOnDiskPoolName(), fs) - zfsMountOptions := fmt.Sprintf("rw,zfsutil,mntpoint=%s", containerPoolVolumeMntPoint) - mounterr := tryMount(source, containerPoolVolumeMntPoint, "zfs", 0, zfsMountOptions) - if mounterr != nil { - if mounterr != syscall.EBUSY { - logger.Errorf("Failed to mount ZFS dataset \"%s\" onto \"%s\": %v", source, containerPoolVolumeMntPoint, mounterr) - return false, mounterr - } - // EBUSY error in zfs are related to a bug we're - // tracking. So ignore them for now, report back that - // the mount isn't ours and proceed. - logger.Warnf("ZFS returned EBUSY while \"%s\" is actually not a mountpoint", containerPoolVolumeMntPoint) - return false, mounterr - } - ourMount = true - } - - logger.Debugf("Mounted ZFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return ourMount, nil -} - -func (s *storageZfs) doContainerDelete(project, name string) error { - logger.Debugf("Deleting ZFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - poolName := s.getOnDiskPoolName() - containerName := name - fs := fmt.Sprintf("containers/%s", projectPrefix(project, containerName)) - containerPoolVolumeMntPoint := getContainerMountPoint(project, s.pool.Name, containerName) - - if zfsFilesystemEntityExists(poolName, fs) { - removable := true - snaps, err := zfsPoolListSnapshots(poolName, fs) - if err != nil { - return err - } - - for _, snap := range snaps { - var err error - removable, err = zfsPoolVolumeSnapshotRemovable(poolName, fs, snap) - if err != nil { - return err - } - - if !removable { - break - } - } - - if removable { - origin, err := zfsFilesystemEntityPropertyGet(poolName, fs, "origin") - if err != nil { - return err - } - poolName := s.getOnDiskPoolName() - origin = strings.TrimPrefix(origin, fmt.Sprintf("%s/", poolName)) - - err = zfsPoolVolumeDestroy(poolName, fs) - if err != nil { - return err - } - - err = zfsPoolVolumeCleanup(poolName, origin) - if err != nil { - return err - } - } else { - err := zfsPoolVolumeSet(poolName, fs, "mountpoint", "none") - if err != nil { - return err - } - - err = zfsPoolVolumeRename(poolName, fs, fmt.Sprintf("deleted/containers/%s", uuid.NewRandom().String()), true) - if err != nil { - return err - } - } - } - - err := deleteContainerMountpoint(containerPoolVolumeMntPoint, shared.VarPath("containers", projectPrefix(project, name)), s.GetStorageTypeName()) - if err != nil { - return err - } - - snapshotZfsDataset := fmt.Sprintf("snapshots/%s", containerName) - zfsPoolVolumeDestroy(poolName, snapshotZfsDataset) - - // Delete potential leftover snapshot mountpoints. 
- snapshotMntPoint := getSnapshotMountPoint(project, s.pool.Name, containerName) - if shared.PathExists(snapshotMntPoint) { - err := os.RemoveAll(snapshotMntPoint) - if err != nil { - return err - } - } - - // Delete potential leftover snapshot symlinks: - // ${LXD_DIR}/snapshots/ to ${POOL}/snapshots/ - snapshotSymlink := shared.VarPath("snapshots", projectPrefix(project, containerName)) - if shared.PathExists(snapshotSymlink) { - err := os.Remove(snapshotSymlink) - if err != nil { - return err - } - } - - logger.Debugf("Deleted ZFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func (s *storageZfs) doContainerCreate(project, name string, privileged bool) error { - logger.Debugf("Creating empty ZFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - - containerPath := shared.VarPath("containers", projectPrefix(project, name)) - containerName := name - fs := fmt.Sprintf("containers/%s", projectPrefix(project, containerName)) - poolName := s.getOnDiskPoolName() - dataset := fmt.Sprintf("%s/%s", poolName, fs) - containerPoolVolumeMntPoint := getContainerMountPoint(project, s.pool.Name, containerName) - - // Create volume. - msg, err := zfsPoolVolumeCreate(dataset, "mountpoint=none", "canmount=noauto") - if err != nil { - logger.Errorf("Failed to create ZFS storage volume for container \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, msg) - return err - } - - // Set mountpoint. - err = zfsPoolVolumeSet(poolName, fs, "mountpoint", containerPoolVolumeMntPoint) - if err != nil { - return err - } - - err = createContainerMountpoint(containerPoolVolumeMntPoint, containerPath, privileged) - if err != nil { - return err - } - - logger.Debugf("Created empty ZFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) - return nil -} - -func zfsIdmapSetSkipper(dir string, absPath string, fi os.FileInfo) bool { - strippedPath := absPath - if dir != "" { - strippedPath = absPath[len(dir):] - } - - if fi.IsDir() && strippedPath == "/.zfs/snapshot" { - return true - } - - return false -} From lxc-bot at linuxcontainers.org Thu May 2 15:17:38 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Thu, 02 May 2019 08:17:38 -0700 (PDT) Subject: [lxc-devel] [lxc/master] seccomp: send process memory fd Message-ID: <5ccb0a12.1c69fb81.ee1a9.b883SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 584 bytes Desc: not available URL: -------------- next part -------------- From 5ed06d3ad6a80a7a8efd10a9c01e90e0e7981306 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Thu, 2 May 2019 17:06:00 +0200 Subject: [PATCH] seccomp: send process memory fd There's an inherent race when reading a process's memory. The easiest way is to have liblxc get an fd, check that the race was won, and send it to the caller (they are free to ignore it if they don't use recvmsg()). 
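For callers that do want the memory fd, the receiving side comes down to a recvmsg() call plus SCM_RIGHTS parsing. A minimal sketch of what a Go proxy's receive path could look like (illustrative only: the function name, socket fd and message size are assumptions, not part of this patch; assumes golang.org/x/sys/unix imported as unix):

    // recvProxyMsg reads one proxy message from the seccomp notify socket and,
    // when one is attached, the /proc/<pid>/mem fd passed via SCM_RIGHTS.
    // Callers that use plain read()/recv() instead of recvmsg() simply never
    // see the descriptor.
    func recvProxyMsg(sockFd int, msgSize int) ([]byte, int, error) {
            buf := make([]byte, msgSize)           // sizeof(struct seccomp_notify_proxy_msg)
            oob := make([]byte, unix.CmsgSpace(4)) // room for one 32-bit file descriptor

            n, oobn, _, _, err := unix.Recvmsg(sockFd, buf, oob, 0)
            if err != nil {
                    return nil, -1, err
            }

            memFd := -1
            if oobn > 0 {
                    scms, err := unix.ParseSocketControlMessage(oob[:oobn])
                    if err == nil && len(scms) > 0 {
                            fds, err := unix.ParseUnixRights(&scms[0])
                            if err == nil && len(fds) > 0 {
                                    memFd = fds[0]
                            }
                    }
            }

            return buf[:n], memFd, nil
    }

A plain read() on the same socket still returns just the message struct, which is what makes the fd optional for existing proxies.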
Signed-off-by: Christian Brauner --- src/lxc/af_unix.c | 6 ++++++ src/lxc/af_unix.h | 2 ++ src/lxc/seccomp.c | 27 +++++++++++++++++++++++++-- 3 files changed, 33 insertions(+), 2 deletions(-) diff --git a/src/lxc/af_unix.c b/src/lxc/af_unix.c index 7f0711ed22..9e2f8587c8 100644 --- a/src/lxc/af_unix.c +++ b/src/lxc/af_unix.c @@ -199,6 +199,12 @@ int lxc_abstract_unix_send_fds(int fd, int *sendfds, int num_sendfds, return ret; } +int lxc_unix_send_fds(int fd, int *sendfds, int num_sendfds, void *data, + size_t size) +{ + return lxc_abstract_unix_send_fds(fd, sendfds, num_sendfds, data, size); +} + int lxc_abstract_unix_recv_fds(int fd, int *recvfds, int num_recvfds, void *data, size_t size) { diff --git a/src/lxc/af_unix.h b/src/lxc/af_unix.h index 3ae5954983..8a068d920f 100644 --- a/src/lxc/af_unix.h +++ b/src/lxc/af_unix.h @@ -35,6 +35,8 @@ extern void lxc_abstract_unix_close(int fd); extern int lxc_abstract_unix_connect(const char *path); extern int lxc_abstract_unix_send_fds(int fd, int *sendfds, int num_sendfds, void *data, size_t size); +extern int lxc_unix_send_fds(int fd, int *sendfds, int num_sendfds, void *data, + size_t size); extern int lxc_abstract_unix_recv_fds(int fd, int *recvfds, int num_recvfds, void *data, size_t size); extern int lxc_abstract_unix_send_credential(int fd, void *data, size_t size); diff --git a/src/lxc/seccomp.c b/src/lxc/seccomp.c index f326c8d307..bfbc19ac53 100644 --- a/src/lxc/seccomp.c +++ b/src/lxc/seccomp.c @@ -1335,8 +1335,10 @@ int seccomp_notify_handler(int fd, uint32_t events, void *data, { #if HAVE_DECL_SECCOMP_NOTIF_GET_FD + __do_close_prot_errno int fd_mem = -EBADF; int reconnect_count, ret; ssize_t bytes; + char mem_path[6 + 21 + 5]; struct lxc_handler *hdlr = data; struct lxc_conf *conf = hdlr->conf; struct seccomp_notif *req = conf->seccomp.notifier.req_buf; @@ -1355,14 +1357,33 @@ int seccomp_notify_handler(int fd, uint32_t events, void *data, goto out; } + snprintf(mem_path, sizeof(mem_path), "/proc/%d/mem", req->pid); + fd_mem = open(mem_path, O_RDONLY | O_CLOEXEC); + if (fd_mem < 0) { + (void)seccomp_notify_default_answer(fd, req, resp, hdlr); + SYSERROR("Failed to open process memory for seccomp notify request"); + goto out; + } + + /* + * Make sure that the fd for /proc//mem we just opened still + * refers to the correct process's memory. 
+ */ + ret = seccomp_notif_id_valid(fd, req->id); + if (ret < 0) { + (void)seccomp_notify_default_answer(fd, req, resp, hdlr); + SYSERROR("Invalid seccomp notify request id"); + goto out; + } + memcpy(&msg.req, req, sizeof(msg.req)); msg.monitor_pid = hdlr->monitor_pid; msg.init_pid = hdlr->pid; reconnect_count = 0; do { - bytes = lxc_send_nointr(listener_proxy_fd, &msg, sizeof(msg), - MSG_NOSIGNAL); + bytes = lxc_unix_send_fds(listener_proxy_fd, &fd_mem, 1, &msg, + sizeof(msg)); if (bytes != (ssize_t)sizeof(msg)) { SYSERROR("Failed to forward message to seccomp proxy"); if (seccomp_notify_default_answer(fd, req, resp, hdlr)) @@ -1370,6 +1391,8 @@ int seccomp_notify_handler(int fd, uint32_t events, void *data, } } while (reconnect_count++); + close_prot_errno_disarm(fd_mem); + reconnect_count = 0; do { bytes = lxc_recv_nointr(listener_proxy_fd, &msg, sizeof(msg), 0); From lxc-bot at linuxcontainers.org Thu May 2 19:46:42 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Thu, 02 May 2019 12:46:42 -0700 (PDT) Subject: [lxc-devel] [lxd/master] seccomp: add support for SECCOMP_RET_NOTIF_USER Message-ID: <5ccb4922.1c69fb81.d4927.91fcSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 364 bytes Desc: not available URL: -------------- next part -------------- From f3d0ee28cd95c53e788e781b8a9fc4e1a601a4ac Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Mon, 29 Apr 2019 17:13:24 -0400 Subject: [PATCH 1/2] lxd/seccomp: Minimal seccomp server MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/daemon.go | 11 +++++++ lxd/seccomp.go | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 91 insertions(+) diff --git a/lxd/daemon.go b/lxd/daemon.go index e7a1b4eba6..3f9d572a7d 100644 --- a/lxd/daemon.go +++ b/lxd/daemon.go @@ -68,6 +68,7 @@ type Daemon struct { config *DaemonConfig endpoints *endpoints.Endpoints gateway *cluster.Gateway + seccomp *SeccompServer proxy func(req *http.Request) (*url.URL, error) @@ -857,6 +858,16 @@ func (d *Daemon) init() error { deviceInotifyDirRescan(d.State()) go deviceInotifyHandler(d.State()) + // Setup seccomp handler + if d.os.SeccompListener { + seccompServer, err := NewSeccompServer(d, shared.VarPath("seccomp.socket")) + if err != nil { + return err + } + d.seccomp = seccompServer + logger.Info("Started seccomp handler", log.Ctx{"path": shared.VarPath("seccomp.socket")}) + } + // Read the trusted certificates readSavedClientCAList(d) diff --git a/lxd/seccomp.go b/lxd/seccomp.go index 391d8b1097..90e5ad4296 100644 --- a/lxd/seccomp.go +++ b/lxd/seccomp.go @@ -3,10 +3,12 @@ package main import ( "fmt" "io/ioutil" + "net" "os" "path" "github.com/lxc/lxd/shared" + "github.com/lxc/lxd/shared/logger" "github.com/lxc/lxd/shared/osarch" ) @@ -166,3 +168,81 @@ func SeccompDeleteProfile(c container) { */ os.Remove(SeccompProfilePath(c)) } + +type SeccompServer struct { + d *Daemon + path string + l net.Listener +} + +func NewSeccompServer(d *Daemon, path string) (*SeccompServer, error) { + // Cleanup existing sockets + if shared.PathExists(path) { + err := os.Remove(path) + if err != nil { + return nil, err + } + } + + // Bind new socket + l, err := net.Listen("unix", path) + if err != nil { + return nil, err + } + + // Restrict access + err = os.Chmod(path, 0700) + if err != nil { + return nil, err + } + + // Start the server + s := SeccompServer{ + d: d, + path: path, + 
l: l, + } + + go func() { + for { + c, err := l.Accept() + if err != nil { + return + } + + go func() { + ucred, err := getCred(c.(*net.UnixConn)) + if err != nil { + logger.Errorf("Unable to get ucred from seccomp socket client: %v", err) + return + } + logger.Debugf("Connected to seccomp socket: pid=%v", ucred.pid) + + for { + buf := make([]byte, 4096) + _, err := c.Read(buf) + if err != nil { + logger.Debugf("Disconnected from seccomp socket: pid=%v", ucred.pid) + c.Close() + return + } + + // Unpack the struct here and pass unpacked struct to handler + go s.Handler(c, ucred, buf) + } + }() + } + }() + + return &s, nil +} + +func (s *SeccompServer) Handler(c net.Conn, ucred *ucred, buf []byte) error { + logger.Debugf("Handling seccomp notification from: %v", ucred.pid) + return nil +} + +func (s *SeccompServer) Stop() error { + os.Remove(s.path) + return s.l.Close() +} From b5600e43fa1991087cb0b2b65c33d792de509b75 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Wed, 1 May 2019 18:26:41 +0200 Subject: [PATCH 2/2] seccomp: implement notifier structure unpacking and notifier responses Signed-off-by: Christian Brauner --- lxd/container_lxc.go | 7 ++ lxd/main.go | 4 + lxd/main_forkmknod.go | 191 +++++++++++++++++++++++++++++++++++++++ lxd/main_nsexec.go | 3 + lxd/seccomp.go | 132 +++++++++++++++++++++++++-- shared/util_linux_cgo.go | 53 ++++++++--- 6 files changed, 371 insertions(+), 19 deletions(-) create mode 100644 lxd/main_forkmknod.go diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 68b238e9c6..11f4118015 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -1811,6 +1811,13 @@ func (c *containerLXC) initLXC(config bool) error { return err } + if lxc.HasApiExtension("seccomp_notify") && c.DaemonState().OS.SeccompListener { + err = lxcSetConfigItem(cc, "lxc.seccomp.notify.proxy", fmt.Sprintf("unix:%s", shared.VarPath("seccomp.socket"))) + if err != nil { + return err + } + } + // Apply raw.lxc if lxcConfig, ok := c.expandedConfig["raw.lxc"]; ok { f, err := ioutil.TempFile("", "lxd_config_") diff --git a/lxd/main.go b/lxd/main.go index 79d94fbe0e..66e25e7455 100644 --- a/lxd/main.go +++ b/lxd/main.go @@ -108,6 +108,10 @@ func main() { forkmigrateCmd := cmdForkmigrate{global: &globalCmd} app.AddCommand(forkmigrateCmd.Command()) + // forkmknod sub-command + forkmknodCmd := cmdForkmknod{global: &globalCmd} + app.AddCommand(forkmknodCmd.Command()) + // forkmount sub-command forkmountCmd := cmdForkmount{global: &globalCmd} app.AddCommand(forkmountCmd.Command()) diff --git a/lxd/main_forkmknod.go b/lxd/main_forkmknod.go new file mode 100644 index 0000000000..22cd4ef5ad --- /dev/null +++ b/lxd/main_forkmknod.go @@ -0,0 +1,191 @@ +package main + +import ( + "fmt" + + "github.com/spf13/cobra" +) + +/* +#ifndef _GNU_SOURCE +#define _GNU_SOURCE 1 +#endif +#include +#include +#include +#include +#include +#include +#include + +#include "include/memory_utils.h" + +extern char* advance_arg(bool required); +extern int dosetns(int pid, char *nstype); + +static uid_t get_root_uid(pid_t pid) +{ + char *line = NULL; + size_t sz = 0; + uid_t nsid, hostid, range; + FILE *f; + char path[256]; + + snprintf(path, sizeof(path), "/proc/%d/uid_map", pid); + f = fopen(path, "re"); + if (!f) + return -1; + + while (getline(&line, &sz, f) != -1) { + if (sscanf(line, "%u %u %u", &nsid, &hostid, &range) != 3) + continue; + + if (nsid == 0) + return hostid; + } + + nsid = -1; + +found: + fclose(f); + free(line); + return nsid; +} + +static gid_t get_root_gid(pid_t pid) +{ + char *line = 
NULL; + size_t sz = 0; + gid_t nsid, hostid, range; + FILE *f; + char path[256]; + + snprintf(path, sizeof(path), "/proc/%d/gid_map", pid); + f = fopen(path, "re"); + if (!f) + return -1; + + while (getline(&line, &sz, f) != -1) { + if (sscanf(line, "%u %u %u", &nsid, &hostid, &range) != 3) + continue; + + if (nsid == 0) + return hostid; + } + + nsid = -1; + +found: + fclose(f); + free(line); + return nsid; +} + +// Expects command line to be in the form: +// +void forkmknod() +{ + ssize_t bytes = 0; + char *cur = NULL; + char *path = NULL; + mode_t mode = 0; + dev_t dev = 0; + pid_t pid = 0; + uid_t uid = -1; + gid_t gid = -1; + char cwd[256], cwd_path[PATH_MAX]; + + // Get the subcommand + cur = advance_arg(false); + if (!cur || + (strcmp(cur, "--help") == 0 || + strcmp(cur, "--version") == 0 || strcmp(cur, "-h") == 0)) + return; + + // Check that we're root + if (geteuid() != 0) { + fprintf(stderr, "Error: forkmknod requires root privileges\n"); + _exit(EXIT_FAILURE); + } + + // Get the container PID + pid = atoi(cur); + + // path to create + path = advance_arg(true); + if (!path) + _exit(EXIT_FAILURE); + + mode = atoi(advance_arg(true)); + dev = atoi(advance_arg(true)); + + snprintf(cwd, sizeof(cwd), "/proc/%d/cwd", pid); + bytes = readlink(cwd, cwd_path, sizeof(cwd_path)); + if (bytes < 0 || bytes >= sizeof(cwd_path)) { + fprintf(stderr, "Failed to retrieve cwd of target process: %s\n", + strerror(errno)); + _exit(EXIT_FAILURE); + } + cwd_path[bytes] = '\0'; + + uid = get_root_uid(pid); + if (uid < 0) + fprintf(stderr, "No root uid found (%d)\n", uid); + + gid = get_root_gid(pid); + if (gid < 0) + fprintf(stderr, "No root gid found (%d)\n", gid); + + snprintf(cwd, sizeof(cwd), "/proc/%d/root", pid); + if (chroot(cwd)) { + fprintf(stderr, "Failed to chroot to container rootfs: %s\n", + strerror(errno)); + _exit(EXIT_FAILURE); + } + + if (chdir(cwd_path)) { + fprintf(stderr, "Failed to change to target process cwd: %s\n", + strerror(errno)); + _exit(EXIT_FAILURE); + } + + if (mknod(path, mode, dev)) { + fprintf(stderr, "Failed to create device %s\n", strerror(errno)); + _exit(EXIT_FAILURE); + } + + if (chown(path, uid, gid)) { + fprintf(stderr, "Failed to chown device to container root %s\n", + strerror(errno)); + _exit(EXIT_FAILURE); + } + + _exit(EXIT_SUCCESS); +} +*/ +import "C" + +type cmdForkmknod struct { + global *cmdGlobal +} + +func (c *cmdForkmknod) Command() *cobra.Command { + // Main subcommand + cmd := &cobra.Command{} + cmd.Use = "forkmknod " + cmd.Short = "Perform mknod operations" + cmd.Long = `Description: + Perform mknod operations + + This set of internal commands is used for all seccomp-based container mknod + operations. 
+` + cmd.RunE = c.Run + cmd.Hidden = true + + return cmd +} + +func (c *cmdForkmknod) Run(cmd *cobra.Command, args []string) error { + return fmt.Errorf("This command should have been intercepted in cgo") +} diff --git a/lxd/main_nsexec.go b/lxd/main_nsexec.go index 18e38d1f33..8fa6db708a 100644 --- a/lxd/main_nsexec.go +++ b/lxd/main_nsexec.go @@ -38,6 +38,7 @@ package main // External functions extern void checkfeature(); extern void forkfile(); +extern void forkmknod(); extern void forkmount(); extern void forknet(); extern void forkproxy(); @@ -265,6 +266,8 @@ __attribute__((constructor)) void init(void) { // Intercepts some subcommands if (strcmp(cmdline_cur, "forkfile") == 0) forkfile(); + else if (strcmp(cmdline_cur, "forkmknod") == 0) + forkmknod(); else if (strcmp(cmdline_cur, "forkmount") == 0) forkmount(); else if (strcmp(cmdline_cur, "forknet") == 0) diff --git a/lxd/seccomp.go b/lxd/seccomp.go index 90e5ad4296..f37e701041 100644 --- a/lxd/seccomp.go +++ b/lxd/seccomp.go @@ -1,17 +1,88 @@ +// +build cgo package main import ( + "bytes" "fmt" + "io" "io/ioutil" "net" "os" "path" + "unsafe" + "golang.org/x/sys/unix" + + "github.com/lxc/lxd/lxd/util" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/logger" "github.com/lxc/lxd/shared/osarch" ) +/* +#ifndef _GNU_SOURCE +#define _GNU_SOURCE 1 +#endif +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +struct seccomp_notify_proxy_msg { + uint32_t version; + struct seccomp_notif req; + struct seccomp_notif_resp resp; + pid_t monitor_pid; + pid_t init_pid; +}; + +#define SECCOMP_PROXY_MSG_SIZE (sizeof(struct seccomp_notify_proxy_msg)) + +static int device_allowed(dev_t dev, mode_t mode) +{ + if ((dev == makedev(5, 1)) && (mode & S_IFCHR)) // /dev/console + return 0; + + return -EPERM; +} + +static int seccomp_notify_mknod_set_response(struct seccomp_notify_proxy_msg *msg) +{ + struct seccomp_notif *req = &msg->req; + struct seccomp_notif_resp *resp = &msg->resp; + int ret; + + resp->id = req->id; + resp->flags = req->flags; + resp->val = 0; + + if (req->data.nr != __NR_mknod) { + resp->error = -ENOSYS; + return -1; + } + + resp->error = device_allowed(req->data.args[2], req->data.args[1]); + if (resp->error) + return -1; + + return 0; +} +*/ +// #cgo CFLAGS: -std=gnu11 -Wvla +// #cgo LDFLAGS: -lseccomp +import "C" + const SECCOMP_HEADER = `2 ` @@ -22,6 +93,7 @@ open_by_handle_at errno 38 init_module errno 38 finit_module errno 38 delete_module errno 38 +mknod notify ` const COMPAT_BLOCKING_POLICY = `[%s] compat_sys_rt_sigaction errno 38 @@ -216,19 +288,24 @@ func NewSeccompServer(d *Daemon, path string) (*SeccompServer, error) { logger.Errorf("Unable to get ucred from seccomp socket client: %v", err) return } + logger.Debugf("Connected to seccomp socket: pid=%v", ucred.pid) + unixFile, err := c.(*net.UnixConn).File() + if err != nil { + return + } + for { - buf := make([]byte, 4096) - _, err := c.Read(buf) - if err != nil { - logger.Debugf("Disconnected from seccomp socket: pid=%v", ucred.pid) + buf := make([]byte, C.SECCOMP_PROXY_MSG_SIZE) + fdMem, err := shared.AbstractUnixReceiveFdData(int(unixFile.Fd()), buf) + if err != nil || err == io.EOF { + logger.Debugf("Disconnected from seccomp socket after receive: pid=%v", ucred.pid) c.Close() return } - // Unpack the struct here and pass unpacked struct to handler - go s.Handler(c, ucred, buf) + go s.Handler(c, ucred, buf, fdMem) } }() } @@ -237,8 +314,49 @@ func NewSeccompServer(d 
*Daemon, path string) (*SeccompServer, error) { return &s, nil } -func (s *SeccompServer) Handler(c net.Conn, ucred *ucred, buf []byte) error { +func (s *SeccompServer) Handler(c net.Conn, ucred *ucred, buf []byte, fdMem int) error { logger.Debugf("Handling seccomp notification from: %v", ucred.pid) + pathBuf := make([]byte, unix.PathMax) + + defer unix.Close(fdMem) + var msg C.struct_seccomp_notify_proxy_msg + C.memcpy(unsafe.Pointer(&msg), unsafe.Pointer(&buf[0]), C.SECCOMP_PROXY_MSG_SIZE) + + // We're ignoring the return value for now but we'll need it later. + ret := C.seccomp_notify_mknod_set_response(&msg) + if ret == 0 { + _, err := unix.Pread(fdMem, pathBuf, int64(msg.req.data.args[0])) + if err != nil { + goto out + } + + idx := bytes.IndexRune(pathBuf, 0) + path := string(pathBuf[:idx]) + mode := int32(msg.req.data.args[1]) + dev := uint32(msg.req.data.args[2]) + // Expects command line to be in the form: + _, err = shared.RunCommand(util.GetExecPath(), + "forkmknod", + fmt.Sprintf("%d", msg.req.pid), + path, + fmt.Sprintf("%d", mode), + fmt.Sprintf("%d", dev)) + if err != nil { + logger.Errorf("Failed to create device node: %s", err) + msg.resp.error = -C.EPERM + } + } + + C.memcpy(unsafe.Pointer(&buf[0]), unsafe.Pointer(&msg), C.SECCOMP_PROXY_MSG_SIZE) + +out: + _, err := c.Write(buf) + if err != nil { + logger.Debugf("Disconnected from seccomp socket after write: pid=%v", ucred.pid) + return err + } + + logger.Debugf("Handled seccomp notification from: %v", ucred.pid) return nil } diff --git a/shared/util_linux_cgo.go b/shared/util_linux_cgo.go index faf37d260e..acd8f9b218 100644 --- a/shared/util_linux_cgo.go +++ b/shared/util_linux_cgo.go @@ -191,14 +191,17 @@ int lxc_abstract_unix_recv_fds(int fd, int *recvfds, int num_recvfds, struct iovec iov; struct cmsghdr *cmsg = NULL; char buf[1] = {0}; - size_t cmsgbufsize = CMSG_SPACE(num_recvfds * sizeof(int)); + size_t cmsgbufsize = CMSG_SPACE(sizeof(struct ucred)) + + CMSG_SPACE(num_recvfds * sizeof(int)); memset(&msg, 0, sizeof(msg)); memset(&iov, 0, sizeof(iov)); cmsgbuf = malloc(cmsgbufsize); - if (!cmsgbuf) + if (!cmsgbuf) { + errno = ENOMEM; return -1; + } msg.msg_control = cmsgbuf; msg.msg_controllen = cmsgbufsize; @@ -208,20 +211,31 @@ int lxc_abstract_unix_recv_fds(int fd, int *recvfds, int num_recvfds, msg.msg_iov = &iov; msg.msg_iovlen = 1; +again: ret = recvmsg(fd, &msg, 0); - if (ret <= 0) { - fprintf(stderr, "%s - Failed to receive file descriptor\n", strerror(errno)); - return ret; - } - - cmsg = CMSG_FIRSTHDR(&msg); + if (ret < 0) { + if (errno == EINTR) + goto again; - memset(recvfds, -1, num_recvfds * sizeof(int)); - if (cmsg && cmsg->cmsg_len == CMSG_LEN(num_recvfds * sizeof(int)) && - cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) { - memcpy(recvfds, CMSG_DATA(cmsg), num_recvfds * sizeof(int)); + goto out; + } + if (ret == 0) + goto out; + + // If SO_PASSCRED is set we will always get a ucred message. 
+ for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) { + if (cmsg->cmsg_type != SCM_RIGHTS) + continue; + + memset(recvfds, -1, num_recvfds * sizeof(int)); + if (cmsg && + cmsg->cmsg_len == CMSG_LEN(num_recvfds * sizeof(int)) && + cmsg->cmsg_level == SOL_SOCKET) + memcpy(recvfds, CMSG_DATA(cmsg), num_recvfds * sizeof(int)); + break; + } +out: return ret; } */ @@ -273,6 +287,21 @@ func AbstractUnixReceiveFd(sockFD int) (*os.File, error) { return file, nil } +func AbstractUnixReceiveFdData(sockFD int, buf []byte) (int, error) { + fd := C.int(-1) + sk_fd := C.int(sockFD) + ret := C.lxc_abstract_unix_recv_fds(sk_fd, &fd, C.int(1), unsafe.Pointer(&buf[0]), C.size_t(len(buf))) + if ret < 0 { + return int(-C.EBADF), fmt.Errorf("Failed to receive file descriptor via abstract unix socket") + } + + if ret == 0 { + return int(-C.EBADF), io.EOF + } + + return int(fd), nil +} + func OpenPty(uid, gid int64) (master *os.File, slave *os.File, err error) { fd_master := C.int(-1) fd_slave := C.int(-1) From noreply at github.com Thu May 2 20:55:54 2019 From: noreply at github.com (Christian Brauner) Date: Thu, 02 May 2019 13:55:54 -0700 Subject: [lxc-devel] [lxc/lxc] 39e6fd: namespaces: allow a pathname to a nsfd for namespa... Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 39e6fd369d7fbefb7471727e3b58e175e5ea8732 https://github.com/lxc/lxc/commit/39e6fd369d7fbefb7471727e3b58e175e5ea8732 Author: Serge Hallyn Date: 2019-05-02 (Thu, 02 May 2019) Changed paths: M src/lxc/confile_utils.c Log Message: ----------- namespaces: allow a pathname to a nsfd for namespace to share Signed-off-by: Serge Hallyn Commit: 99b68bdb48d2ce95bd481740e01e6a282aa20a3b https://github.com/lxc/lxc/commit/99b68bdb48d2ce95bd481740e01e6a282aa20a3b Author: Christian Brauner Date: 2019-05-02 (Thu, 02 May 2019) Changed paths: M src/lxc/confile_utils.c Log Message: ----------- Merge pull request #2971 from hallyn/2019-05-01/nsshare.2 namespaces: allow a pathname to a nsfd for namespace to share Compare: https://github.com/lxc/lxc/compare/0b5afd323e47...99b68bdb48d2 From noreply at github.com Thu May 2 20:56:11 2019 From: noreply at github.com (=?UTF-8?B?U3TDqXBoYW5lIEdyYWJlcg==?=) Date: Thu, 02 May 2019 13:56:11 -0700 Subject: [lxc-devel] [lxc/lxc] 5ed06d: seccomp: send process memory fd Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 5ed06d3ad6a80a7a8efd10a9c01e90e0e7981306 https://github.com/lxc/lxc/commit/5ed06d3ad6a80a7a8efd10a9c01e90e0e7981306 Author: Christian Brauner Date: 2019-05-02 (Thu, 02 May 2019) Changed paths: M src/lxc/af_unix.c M src/lxc/af_unix.h M src/lxc/seccomp.c Log Message: ----------- seccomp: send process memory fd There's an inherent race when reading a process's memory. The easiest way is to have liblxc get an fd, check that the race was won, and send it to the caller (they are free to ignore it if they don't use recvmsg()). 
Signed-off-by: Christian Brauner Commit: 9e1accb9d28a579d28eac635767382b2f07dfc1b https://github.com/lxc/lxc/commit/9e1accb9d28a579d28eac635767382b2f07dfc1b Author: Stéphane Graber Date: 2019-05-02 (Thu, 02 May 2019) Changed paths: M src/lxc/af_unix.c M src/lxc/af_unix.h M src/lxc/seccomp.c Log Message: ----------- Merge pull request #2972 from brauner/2019-05-02/seccomp_notify_mem_fd seccomp: send process memory fd Compare: https://github.com/lxc/lxc/compare/99b68bdb48d2...9e1accb9d28a From lxc-bot at linuxcontainers.org Thu May 2 23:26:53 2019 From: lxc-bot at linuxcontainers.org (joelhockey on Github) Date: Thu, 02 May 2019 16:26:53 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/storage/quota: Build on older systems Message-ID: <5ccb7cbd.1c69fb81.1613e.bb88SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 366 bytes Desc: not available URL: -------------- next part -------------- From 0e32aa23f9cd7349cb6c6fd04ce87970bff6b553 Mon Sep 17 00:00:00 2001 From: Joel Hockey Date: Thu, 2 May 2019 16:24:18 -0700 Subject: [PATCH] lxd/storage/quota: Build on older systems See #5659 Signed-off-by: Joel Hockey --- lxd/storage/quota/projectquota.go | 1 + 1 file changed, 1 insertion(+) diff --git a/lxd/storage/quota/projectquota.go b/lxd/storage/quota/projectquota.go index 994921f753..37e4480665 100644 --- a/lxd/storage/quota/projectquota.go +++ b/lxd/storage/quota/projectquota.go @@ -14,6 +14,7 @@ import ( /* #include #include +#include #include #include #include From lxc-bot at linuxcontainers.org Fri May 3 02:08:10 2019 From: lxc-bot at linuxcontainers.org (mikemccracken on Github) Date: Thu, 02 May 2019 19:08:10 -0700 (PDT) Subject: [lxc-devel] [crio-lxc/master] handle namespaces Message-ID: <5ccba28a.1c69fb81.6536b.2240SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 530 bytes Desc: not available URL: -------------- next part -------------- From 71f195b3d423e1abf8b82d4357cfa1122511090d Mon Sep 17 00:00:00 2001 From: Michael McCracken Date: Wed, 1 May 2019 18:27:57 -0700 Subject: [PATCH 1/5] create: handle namespaces in spec Signed-off-by: Michael McCracken --- cmd/create.go | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) diff --git a/cmd/create.go b/cmd/create.go index 2d53b6c..c3edad9 100644 --- a/cmd/create.go +++ b/cmd/create.go @@ -9,6 +9,7 @@ import ( "os/exec" "path" "path/filepath" + "regexp" "strings" "time" @@ -42,6 +43,17 @@ var createCmd = cli.Command{ }, } +// maps from CRIO namespace names to LXC names +var NamespaceMap = map[string]string{ + "cgroup": "cgroup", + "ipc": "ipc", + "mount": "mnt", + "network": "net", + "pid": "pid", + "user": "user", + "uts": "uts", +} + func ensureShell(rootfs string) { shPath := filepath.Join(rootfs, "bin/sh") if exists, _ := pathExists(shPath); exists { @@ -202,6 +214,46 @@ func configureContainer(ctx *cli.Context, c *lxc.Container, spec *specs.Spec) er return errors.Wrap(err, "failed to set hook version") } + procPidPathRE := regexp.MustCompile(`/proc/(\d+)/ns`) + + var nsToClone []string + var configVal string + seenNamespaceTypes := map[specs.LinuxNamespaceType]bool{} + for _, ns := range spec.Linux.Namespaces { + if _, ok := seenNamespaceTypes[ns.Type]; ok == true { + return fmt.Errorf("duplicate namespace type %s", ns.Type) + } + seenNamespaceTypes[ns.Type] = true + if ns.Path == "" { + nsToClone = append(nsToClone, NamespaceMap[string(ns.Type)]) + } else { + configKey := fmt.Sprintf("lxc.namespace.share.%s", NamespaceMap[string(ns.Type)]) + + matches := procPidPathRE.FindStringSubmatch(ns.Path) + switch len(matches) { + case 0: + configVal = ns.Path + case 1: + return fmt.Errorf("error parsing namespace path. expected /proc/(\\d+)/ns/*, got '%s'", ns.Path) + case 2: + configVal = matches[1] + default: + return fmt.Errorf("error parsing namespace path. expected /proc/(\\d+)/ns/*, got '%s'", ns.Path) + } + + if err := c.SetConfigItem(configKey, configVal); err != nil { + return errors.Wrapf(err, "failed to set namespace config: '%s'='%s'", configKey, configVal) + } + } + } + + if len(nsToClone) > 0 { + configVal = strings.Join(nsToClone, " ") + if err := c.SetConfigItem("lxc.namespace.clone", configVal); err != nil { + return errors.Wrapf(err, "failed to set lxc.namespace.clone=%s", configVal) + } + } + // capabilities? 
// if !spec.Process.Terminal { From 8fbba421bedf68439fdc0b72d38d2a6cd4335411 Mon Sep 17 00:00:00 2001 From: Michael McCracken Date: Thu, 2 May 2019 12:10:24 -0700 Subject: [PATCH 2/5] helpers: fix var reference in crictl func want to substitute, not run CRICTLDEBUG Signed-off-by: Michael McCracken --- test/helpers.bash | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/test/helpers.bash b/test/helpers.bash index 9b906db..1741ea1 100644 --- a/test/helpers.bash +++ b/test/helpers.bash @@ -54,7 +54,7 @@ function crictl { # watch out for: https://github.com/kubernetes-sigs/cri-tools/issues/460 # If you need more debug output, set CRICTLDEBUG to -D CRICTLDEBUG="" - $(which crictl) $(CRICTLDEBUG) --runtime-endpoint "$TEMP_DIR/crio.sock" $@ + $(which crictl) ${CRICTLDEBUG} --runtime-endpoint "$TEMP_DIR/crio.sock" $@ echo "$output" } From d7ed2812ea42801e86d999a6e9e13cfcece4a86c Mon Sep 17 00:00:00 2001 From: Michael McCracken Date: Thu, 2 May 2019 16:38:08 -0700 Subject: [PATCH 3/5] test: clean up created containers Signed-off-by: Michael McCracken --- test/basic.bats | 2 ++ test/manual.bats | 2 ++ 2 files changed, 4 insertions(+) diff --git a/test/basic.bats b/test/basic.bats index b9a8c6d..70e9317 100644 --- a/test/basic.bats +++ b/test/basic.bats @@ -15,4 +15,6 @@ function teardown() { podid=$(crictl pods | grep nginx-sandbox | awk '{ print $1 }') crictl create $podid test/basic-container-config.json test/basic-pod-config.json crictl ps -a | grep busybox + crictl stopp $podid + crictl rmp $podid } diff --git a/test/manual.bats b/test/manual.bats index 7dc4ecf..ec8246c 100644 --- a/test/manual.bats +++ b/test/manual.bats @@ -14,4 +14,6 @@ function teardown() { @test "manual invocation" { crio-lxc --debug --log-level trace --log-file "$TEMP_DIR/log" create --bundle "$TEMP_DIR/dest" alpine crio-lxc --debug --log-level trace --log-file "$TEMP_DIR/log" start alpine + crio-lxc --debug --log-level trace --log-file "$TEMP_DIR/log" kill alpine + crio-lxc --debug --log-level trace --log-file "$TEMP_DIR/log" delete alpine } From c7d98002c17563dc55f8483a02b633b07f13dcee Mon Sep 17 00:00:00 2001 From: Michael McCracken Date: Thu, 2 May 2019 16:38:27 -0700 Subject: [PATCH 4/5] test: manual: replace shell with sleep so container stays running so we can test killing and deleting Signed-off-by: Michael McCracken --- test/manual.bats | 1 + 1 file changed, 1 insertion(+) diff --git a/test/manual.bats b/test/manual.bats index ec8246c..1ec28ed 100644 --- a/test/manual.bats +++ b/test/manual.bats @@ -5,6 +5,7 @@ function setup() { skopeo --insecure-policy copy docker://alpine:latest oci:$ROOT_DIR/test/oci-cache:alpine umoci unpack --image "$ROOT_DIR/test/oci-cache:alpine" "$TEMP_DIR/dest" sed -i -e "s?rootfs?$TEMP_DIR/dest/rootfs?" "$TEMP_DIR/dest/config.json" + sed -i -e "s?\"/bin/sh\"?\"/bin/sleep\",\n\"60\"?" 
"$TEMP_DIR/dest/config.json" } function teardown() { From ae8352f59b8980ad4e799c1e1f393db5835e8c23 Mon Sep 17 00:00:00 2001 From: Michael McCracken Date: Thu, 2 May 2019 18:52:03 -0700 Subject: [PATCH 5/5] test: check that container correctly shares a namespace Signed-off-by: Michael McCracken --- test/manual.bats | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/test/manual.bats b/test/manual.bats index 1ec28ed..265e1fb 100644 --- a/test/manual.bats +++ b/test/manual.bats @@ -5,7 +5,9 @@ function setup() { skopeo --insecure-policy copy docker://alpine:latest oci:$ROOT_DIR/test/oci-cache:alpine umoci unpack --image "$ROOT_DIR/test/oci-cache:alpine" "$TEMP_DIR/dest" sed -i -e "s?rootfs?$TEMP_DIR/dest/rootfs?" "$TEMP_DIR/dest/config.json" - sed -i -e "s?\"/bin/sh\"?\"/bin/sleep\",\n\"60\"?" "$TEMP_DIR/dest/config.json" + sed -i -e "s?\"/bin/sh\"?\"/bin/sleep\",\n\"10\"?" "$TEMP_DIR/dest/config.json" + sed -i -e "s?\"type\": \"ipc\"?\"type\": \"ipc\",\n\"path\": \"/proc/1/ns/ipc\"?" "$TEMP_DIR/dest/config.json" + } function teardown() { @@ -13,8 +15,12 @@ function teardown() { } @test "manual invocation" { - crio-lxc --debug --log-level trace --log-file "$TEMP_DIR/log" create --bundle "$TEMP_DIR/dest" alpine + crio-lxc --debug --log-level trace --log-file "$TEMP_DIR/log" create --bundle "$TEMP_DIR/dest" --pid-file "$TEMP_DIR/pid" alpine crio-lxc --debug --log-level trace --log-file "$TEMP_DIR/log" start alpine + pid1ipcnsinode=$(stat -L -c%i /proc/1/ns/ipc) + mypid=$(<"$TEMP_DIR/pid") + mypidipcnsinode=$(stat -L -c%i "/proc/$mypid/ns/ipc") + [ $pid1ipcnsinode = $mypidipcnsinode ] crio-lxc --debug --log-level trace --log-file "$TEMP_DIR/log" kill alpine crio-lxc --debug --log-level trace --log-file "$TEMP_DIR/log" delete alpine } From lxc-bot at linuxcontainers.org Fri May 3 09:24:16 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Fri, 03 May 2019 02:24:16 -0700 (PDT) Subject: [lxc-devel] [lxc/master] network: Adds gateway device route mode Message-ID: <5ccc08c0.1c69fb81.31d4.82d6SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 474 bytes Desc: not available URL: -------------- next part -------------- From 9bb8418878be31796ee1dca7cd3e7f07de45e05d Mon Sep 17 00:00:00 2001 From: tomponline Date: Fri, 3 May 2019 10:21:45 +0100 Subject: [PATCH] network: Adds gateway device route mode Adds ability to specify "dev" as the gateway value, which will cause a device route to be set as default gateway. Signed-off-by: tomponline --- doc/api-extensions.md | 7 +++ src/lxc/api_extensions.h | 1 + src/lxc/confile.c | 16 +++++- src/lxc/confile_utils.c | 6 +++ src/lxc/network.c | 98 ++++++++++++++++++++++------------- src/lxc/network.h | 4 ++ src/tests/parse_config_file.c | 40 ++++++++++++++ 7 files changed, 133 insertions(+), 39 deletions(-) diff --git a/doc/api-extensions.md b/doc/api-extensions.md index d5a0a3af74..6ec6e5a181 100644 --- a/doc/api-extensions.md +++ b/doc/api-extensions.md @@ -65,3 +65,10 @@ lxc.net[i].ipvlan.isolation=[bridge|private|vepa] (defaults to bridge) lxc.net[i].link=eth0 lxc.net[i].flags=up ``` + +## network\_gateway\_device\_route + +This introduces the ability to specify `lxc.net.[i].veth.ipv4.gateway` and/or +`lxc.net.[i].veth.ipv6.gateway` with a value of `dev` which will cause the default gateway +inside the container to be created as a device route without destination gateway IP needed. 
+This is primarily intended for use with layer 3 networking devices, such as IPVLAN. diff --git a/src/lxc/api_extensions.h b/src/lxc/api_extensions.h index 55d5e9c96e..086977c987 100644 --- a/src/lxc/api_extensions.h +++ b/src/lxc/api_extensions.h @@ -46,6 +46,7 @@ static char *api_extensions[] = { "seccomp_notify", "network_veth_routes", "network_ipvlan", + "network_gateway_device_route", }; static size_t nr_api_extensions = sizeof(api_extensions) / sizeof(*api_extensions); diff --git a/src/lxc/confile.c b/src/lxc/confile.c index ac7e78eb1b..857fd0c154 100644 --- a/src/lxc/confile.c +++ b/src/lxc/confile.c @@ -655,9 +655,13 @@ static int set_config_net_ipv4_gateway(const char *key, const char *value, free(netdev->ipv4_gateway); - if (!strcmp(value, "auto")) { + if (strcmp(value, "auto") == 0) { netdev->ipv4_gateway = NULL; netdev->ipv4_gateway_auto = true; + } else if (strcmp(value, "dev") == 0) { + netdev->ipv4_gateway = NULL; + netdev->ipv4_gateway_auto = false; + netdev->ipv4_gateway_dev = true; } else { int ret; struct in_addr *gw; @@ -822,9 +826,13 @@ static int set_config_net_ipv6_gateway(const char *key, const char *value, free(netdev->ipv6_gateway); - if (!strcmp(value, "auto")) { + if (strcmp(value, "auto") == 0) { netdev->ipv6_gateway = NULL; netdev->ipv6_gateway_auto = true; + } else if (strcmp(value, "dev") == 0) { + netdev->ipv6_gateway = NULL; + netdev->ipv6_gateway_auto = false; + netdev->ipv6_gateway_dev = true; } else { int ret; struct in6_addr *gw; @@ -5574,6 +5582,8 @@ static int get_config_net_ipv4_gateway(const char *key, char *retv, int inlen, if (netdev->ipv4_gateway_auto) { strprint(retv, inlen, "auto"); + } else if (netdev->ipv4_gateway_dev) { + strprint(retv, inlen, "dev"); } else if (netdev->ipv4_gateway) { inet_ntop(AF_INET, netdev->ipv4_gateway, buf, sizeof(buf)); strprint(retv, inlen, "%s", buf); @@ -5663,6 +5673,8 @@ static int get_config_net_ipv6_gateway(const char *key, char *retv, int inlen, if (netdev->ipv6_gateway_auto) { strprint(retv, inlen, "auto"); + } else if (netdev->ipv6_gateway_dev) { + strprint(retv, inlen, "dev"); } else if (netdev->ipv6_gateway) { inet_ntop(AF_INET6, netdev->ipv6_gateway, buf, sizeof(buf)); strprint(retv, inlen, "%s", buf); diff --git a/src/lxc/confile_utils.c b/src/lxc/confile_utils.c index a43b165ba1..b7d006df3e 100644 --- a/src/lxc/confile_utils.c +++ b/src/lxc/confile_utils.c @@ -357,6 +357,9 @@ void lxc_log_configured_netdevs(const struct lxc_conf *conf) TRACE("ipv4 gateway auto: %s", netdev->ipv4_gateway_auto ? "true" : "false"); + TRACE("ipv4 gateway dev: %s", + netdev->ipv4_gateway_dev ? "true" : "false"); + if (netdev->ipv4_gateway) { inet_ntop(AF_INET, netdev->ipv4_gateway, bufinet4, sizeof(bufinet4)); @@ -373,6 +376,9 @@ void lxc_log_configured_netdevs(const struct lxc_conf *conf) TRACE("ipv6 gateway auto: %s", netdev->ipv6_gateway_auto ? "true" : "false"); + TRACE("ipv6 gateway dev: %s", + netdev->ipv6_gateway_dev ? 
"true" : "false"); + if (netdev->ipv6_gateway) { inet_ntop(AF_INET6, netdev->ipv6_gateway, bufinet6, sizeof(bufinet6)); diff --git a/src/lxc/network.c b/src/lxc/network.c index def484613d..0ac7695d30 100644 --- a/src/lxc/network.c +++ b/src/lxc/network.c @@ -2027,8 +2027,12 @@ static int ip_gateway_add(int family, int ifindex, void *gw) rt->rtm_dst_len = 0; err = -EINVAL; - if (nla_put_buffer(nlmsg, RTA_GATEWAY, gw, addrlen)) - goto out; + + /* If no gateway address is supplied, setup a device route instead */ + if (gw != NULL) { + if (nla_put_buffer(nlmsg, RTA_GATEWAY, gw, addrlen)) + goto out; + } /* Adding the interface index enables the use of link-local * addresses for the gateway. @@ -3182,12 +3186,12 @@ static int lxc_setup_netdev_in_child_namespaces(struct lxc_netdev *netdev) } /* We can only set up the default routes after bringing - * up the interface, sine bringing up the interface adds + * up the interface, since bringing up the interface adds * the link-local routes and we can't add a default * route if the gateway is not reachable. */ /* setup ipv4 gateway on the interface */ - if (netdev->ipv4_gateway) { + if (netdev->ipv4_gateway || netdev->ipv4_gateway_dev) { if (!(netdev->flags & IFF_UP)) { ERROR("Cannot add ipv4 gateway for network device " "\"%s\" when not bringing up the interface", ifname); @@ -3200,33 +3204,43 @@ static int lxc_setup_netdev_in_child_namespaces(struct lxc_netdev *netdev) return -1; } - err = lxc_ipv4_gateway_add(netdev->ifindex, netdev->ipv4_gateway); - if (err) { - err = lxc_ipv4_dest_add(netdev->ifindex, netdev->ipv4_gateway, 32); - if (err) { - errno = -err; - SYSERROR("Failed to add ipv4 dest for network device \"%s\"", + /* Setup device route if ipv4_gateway_dev is enabled */ + if (netdev->ipv4_gateway_dev) { + err = lxc_ipv4_gateway_add(netdev->ifindex, NULL); + if (err < 0) { + SYSERROR("Failed to setup ipv4 gateway to network device \"%s\"", ifname); + return minus_one_set_errno(-err); } - + } else { err = lxc_ipv4_gateway_add(netdev->ifindex, netdev->ipv4_gateway); if (err) { - errno = -err; - SYSERROR("Failed to setup ipv4 gateway for network device \"%s\"", - ifname); + err = lxc_ipv4_dest_add(netdev->ifindex, netdev->ipv4_gateway, 32); + if (err) { + errno = -err; + SYSERROR("Failed to add ipv4 dest for network device \"%s\"", + ifname); + } + + err = lxc_ipv4_gateway_add(netdev->ifindex, netdev->ipv4_gateway); + if (err) { + errno = -err; + SYSERROR("Failed to setup ipv4 gateway for network device \"%s\"", + ifname); - if (netdev->ipv4_gateway_auto) { - char buf[INET_ADDRSTRLEN]; - inet_ntop(AF_INET, netdev->ipv4_gateway, buf, sizeof(buf)); - ERROR("Tried to set autodetected ipv4 gateway \"%s\"", buf); + if (netdev->ipv4_gateway_auto) { + char buf[INET_ADDRSTRLEN]; + inet_ntop(AF_INET, netdev->ipv4_gateway, buf, sizeof(buf)); + ERROR("Tried to set autodetected ipv4 gateway \"%s\"", buf); + } + return -1; } - return -1; } } } /* setup ipv6 gateway on the interface */ - if (netdev->ipv6_gateway) { + if (netdev->ipv6_gateway || netdev->ipv6_gateway_dev) { if (!(netdev->flags & IFF_UP)) { ERROR("Cannot add ipv6 gateway for network device " "\"%s\" when not bringing up the interface", ifname); @@ -3239,29 +3253,39 @@ static int lxc_setup_netdev_in_child_namespaces(struct lxc_netdev *netdev) return -1; } - err = lxc_ipv6_gateway_add(netdev->ifindex, netdev->ipv6_gateway); - if (err) { - err = lxc_ipv6_dest_add(netdev->ifindex, netdev->ipv6_gateway, 128); - if (err) { - errno = -err; - SYSERROR("Failed to add ipv6 dest for network device 
\"%s\"", + /* Setup device route if ipv6_gateway_dev is enabled */ + if (netdev->ipv6_gateway_dev) { + err = lxc_ipv6_gateway_add(netdev->ifindex, NULL); + if (err < 0) { + SYSERROR("Failed to setup ipv6 gateway to network device \"%s\"", ifname); + return minus_one_set_errno(-err); } - + } else { err = lxc_ipv6_gateway_add(netdev->ifindex, netdev->ipv6_gateway); if (err) { - errno = -err; - SYSERROR("Failed to setup ipv6 gateway for network device \"%s\"", - ifname); + err = lxc_ipv6_dest_add(netdev->ifindex, netdev->ipv6_gateway, 128); + if (err) { + errno = -err; + SYSERROR("Failed to add ipv6 dest for network device \"%s\"", + ifname); + } - if (netdev->ipv6_gateway_auto) { - char buf[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, netdev->ipv6_gateway, buf, sizeof(buf)); - ERROR("Tried to set autodetected ipv6 " - "gateway for network device " - "\"%s\"", buf); + err = lxc_ipv6_gateway_add(netdev->ifindex, netdev->ipv6_gateway); + if (err) { + errno = -err; + SYSERROR("Failed to setup ipv6 gateway for network device \"%s\"", + ifname); + + if (netdev->ipv6_gateway_auto) { + char buf[INET6_ADDRSTRLEN]; + inet_ntop(AF_INET6, netdev->ipv6_gateway, buf, sizeof(buf)); + ERROR("Tried to set autodetected ipv6 " + "gateway for network device " + "\"%s\"", buf); + } + return -1; } - return -1; } } } diff --git a/src/lxc/network.h b/src/lxc/network.h index fa80404bc2..e60d30d191 100644 --- a/src/lxc/network.h +++ b/src/lxc/network.h @@ -156,9 +156,11 @@ union netdev_p { * @ipv6 : a list of ipv6 addresses to be set on the network device * @ipv4_gateway_auto : whether the ipv4 gateway is to be automatically gathered * from the associated @link + * @ipv4_gateway_dev : whether the ipv4 gateway is to be set as a device route * @ipv4_gateway : ipv4 gateway * @ipv6_gateway_auto : whether the ipv6 gateway is to be automatically gathered * from the associated @link + * @ipv6_gateway_dev : whether the ipv6 gateway is to be set as a device route * @ipv6_gateway : ipv6 gateway * @upscript : a script filename to be executed during interface * configuration @@ -178,8 +180,10 @@ struct lxc_netdev { struct lxc_list ipv4; struct lxc_list ipv6; bool ipv4_gateway_auto; + bool ipv4_gateway_dev; struct in_addr *ipv4_gateway; bool ipv6_gateway_auto; + bool ipv6_gateway_dev; struct in6_addr *ipv6_gateway; char *upscript; char *downscript; diff --git a/src/tests/parse_config_file.c b/src/tests/parse_config_file.c index ad17867b43..bc68ae24cc 100644 --- a/src/tests/parse_config_file.c +++ b/src/tests/parse_config_file.c @@ -108,6 +108,16 @@ static int set_and_clear_complete_netdev(struct lxc_container *c) return -1; } + if (!c->set_config_item(c, "lxc.net.1.ipv4.gateway", "auto")) { + lxc_error("%s\n", "lxc.net.1.ipv4.gateway"); + return -1; + } + + if (!c->set_config_item(c, "lxc.net.1.ipv4.gateway", "dev")) { + lxc_error("%s\n", "lxc.net.1.ipv4.gateway"); + return -1; + } + if (!c->set_config_item(c, "lxc.net.1.ipv6.address", "2003:db8:1:0:214:1234:fe0b:3596/64")) { lxc_error("%s\n", "lxc.net.1.ipv6.address"); @@ -120,6 +130,16 @@ static int set_and_clear_complete_netdev(struct lxc_container *c) return -1; } + if (!c->set_config_item(c, "lxc.net.1.ipv6.gateway", "auto")) { + lxc_error("%s\n", "lxc.net.1.ipv6.gateway"); + return -1; + } + + if (!c->set_config_item(c, "lxc.net.1.ipv6.gateway", "dev")) { + lxc_error("%s\n", "lxc.net.1.ipv6.gateway"); + return -1; + } + if (!c->set_config_item(c, "lxc.net.1.flags", "up")) { lxc_error("%s\n", "lxc.net.1.flags"); return -1; @@ -781,11 +801,31 @@ int main(int argc, char *argv[]) 
goto non_test_error; } + if (set_get_compare_clear_save_load(c, "lxc.net.0.ipv4.gateway", "auto", tmpf, true)) { + lxc_error("%s\n", "lxc.net.0.ipv4.gateway"); + goto non_test_error; + } + + if (set_get_compare_clear_save_load(c, "lxc.net.0.ipv4.gateway", "dev", tmpf, true)) { + lxc_error("%s\n", "lxc.net.0.ipv4.gateway"); + goto non_test_error; + } + if (set_get_compare_clear_save_load(c, "lxc.net.0.ipv6.gateway", "2003:db8:1::1", tmpf, true)) { lxc_error("%s\n", "lxc.net.0.ipv6.gateway"); goto non_test_error; } + if (set_get_compare_clear_save_load(c, "lxc.net.0.ipv6.gateway", "auto", tmpf, true)) { + lxc_error("%s\n", "lxc.net.0.ipv6.gateway"); + goto non_test_error; + } + + if (set_get_compare_clear_save_load(c, "lxc.net.0.ipv6.gateway", "dev", tmpf, true)) { + lxc_error("%s\n", "lxc.net.0.ipv6.gateway"); + goto non_test_error; + } + if (set_get_compare_clear_save_load(c, "lxc.net.0.ipv4.address", "10.0.2.3/24", tmpf, true)) { lxc_error("%s\n", "lxc.net.0.ipv4.address"); goto non_test_error; From noreply at github.com Fri May 3 10:36:02 2019 From: noreply at github.com (Christian Brauner) Date: Fri, 03 May 2019 03:36:02 -0700 Subject: [lxc-devel] [lxc/lxc] 650915: network: Adds layer 2 (ARP/NDP) proxy mode Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 6509154de18b96f0e6f1eb5fb066a9d2bfff7b91 https://github.com/lxc/lxc/commit/6509154de18b96f0e6f1eb5fb066a9d2bfff7b91 Author: tomponline Date: 2019-05-02 (Thu, 02 May 2019) Changed paths: M doc/api-extensions.md M doc/lxc.container.conf.sgml.in M src/lxc/api_extensions.h M src/lxc/confile.c M src/lxc/confile_utils.c M src/lxc/file_utils.c M src/lxc/file_utils.h M src/lxc/network.c M src/lxc/network.h Log Message: ----------- network: Adds layer 2 (ARP/NDP) proxy mode Adds the lxc.net.[i].l2proxy flag that can be either 0 or 1. Defaults to 0. This, when used with lxc.net.[i].link, will add IP neighbour proxy entries on the linked device for any IPv4 and IPv6 addresses on the container's network device. Additionally, for IPv6 addresses it will check the following sysctl values and fail with an error if not set: net.ipv6.conf.[link].proxy_ndp=1 net.ipv6.conf.[link].forwarding=1 Signed-off-by: tomponline Commit: 5b94d538dd1e69fd690f7994036a070c2857ffc2 https://github.com/lxc/lxc/commit/5b94d538dd1e69fd690f7994036a070c2857ffc2 Author: Christian Brauner Date: 2019-05-03 (Fri, 03 May 2019) Changed paths: M doc/api-extensions.md M doc/lxc.container.conf.sgml.in M src/lxc/api_extensions.h M src/lxc/confile.c M src/lxc/confile_utils.c M src/lxc/file_utils.c M src/lxc/file_utils.h M src/lxc/network.c M src/lxc/network.h Log Message: ----------- Merge pull request #2964 from tomponline/tp-l2proxy network: Adds layer 2 (ARP/NDP) proxy mode Compare: https://github.com/lxc/lxc/compare/9e8c3ebeb50c...5b94d538dd1e From lxc-bot at linuxcontainers.org Fri May 3 14:12:43 2019 From: lxc-bot at linuxcontainers.org (ajkavanagh on Github) Date: Fri, 03 May 2019 07:12:43 -0700 (PDT) Subject: [lxc-devel] [pylxd/master] Add PYLXD_WARNINGS env variable to be able to suppress warnings Message-ID: <5ccc4c5b.1c69fb81.470a6.0304SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed...
Name: not available Type: text/x-mailbox Size: 858 bytes Desc: not available URL: -------------- next part -------------- From 6ba85dc5ca1db19cef7daaab08938888c008c77b Mon Sep 17 00:00:00 2001 From: Alex Kavanagh Date: Fri, 3 May 2019 15:10:16 +0100 Subject: [PATCH] Add PYLXD_WARNINGS env variable to be able to suppress warnings If the LXD server that pylxd is connected to supports attributes on objects that pylxd doesn't yet support, then a warning is issued using the `warnings` module. This can fill logs with annoying warnings. So this patch adds the ability to set an env variable PYLXD_WARNINGS to 'none' to suppress all the warnings, or to 'always' to get the existing behaviour. The new behavior is to issue a warning once for each instance of an attribute that isn't known for each object. Closes: #301 Signed-off-by: Alex Kavanagh --- doc/source/usage.rst | 16 ++++++++++++++ pylxd/models/_model.py | 21 ++++++++++++++++++ pylxd/models/operation.py | 22 ++++++++++++++++++- pylxd/tests/models/test_model.py | 33 ++++++++++++++++++++++++++++ pylxd/tests/models/test_operation.py | 31 ++++++++++++++++++++++++++ tox.ini | 1 + 6 files changed, 123 insertions(+), 1 deletion(-) diff --git a/doc/source/usage.rst b/doc/source/usage.rst index 109071d4..1d2b938b 100644 --- a/doc/source/usage.rst +++ b/doc/source/usage.rst @@ -101,3 +101,19 @@ Some changes to LXD will return immediately, but actually occur in the background after the http response returns. All operations that happen this way will also take an optional `wait` parameter that, when `True`, will not return until the operation is completed. + +UserWarning: Attempted to set unknown attribute "x" on instance of "y" +---------------------------------------------------------------------- + +The LXD server changes frequently, particularly if it is snap installed. In +this case it is possible that the LXD server may send back objects with +attributes that this version of pylxd is not aware of, and in that situation, +the pylxd library issues the warning above. + +The default behaviour is that *one* warning is issued for each unknown +attribute on *each* object class that is unknown. Further warnings are then +suppressed. The environment variable ``PYLXD_WARNINGS`` can be set to control +the warnings further: + + - if set to ``none`` then *all* warnings are suppressed all the time. + - if set to ``always`` then warnings are always issued for each instance returned from the server. diff --git a/pylxd/models/_model.py b/pylxd/models/_model.py index 83fd0673..62887446 100644 --- a/pylxd/models/_model.py +++ b/pylxd/models/_model.py @@ -11,6 +11,7 @@ # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License. +import os import warnings import six @@ -83,6 +84,11 @@ def __new__(cls, name, bases, attrs): return super(ModelType, cls).__new__(cls, name, bases, attrs) +# Global used to record which warnings have been issued already for unknown +# attributes. +_seen_attribute_warnings = set() + + @six.add_metaclass(ModelType) class Model(object): """A Base LXD object model. @@ -98,6 +104,13 @@ class Model(object): un-initialized attributes are read. When attributes are modified, the instance is marked as dirty. `save` will save the changes to the server. + + If the LXD server sends attributes that this version of pylxd is unaware of + then a warning is printed.
By default the warning is issued ONCE and then + suppressed for every subsequent attempted setting. The warnings can be + completely suppressed by setting the environment variable PYLXD_WARNINGS to + 'none', or always displayed by setting the PYLXD_WARNINGS variable to + 'always'. """ NotFound = exceptions.NotFound __slots__ = ['client', '__dirty__'] @@ -110,6 +123,14 @@ def __init__(self, client, **kwargs): try: setattr(self, key, val) except AttributeError: + global _seen_attribute_warnings + env = os.environ.get('PYLXD_WARNINGS', '').lower() + item = "{}.{}".format(self.__class__.__name__, key) + if env != 'always' and item in _seen_attribute_warnings: + continue + _seen_attribute_warnings.add(item) + if env == 'none': + continue warnings.warn( 'Attempted to set unknown attribute "{}" ' 'on instance of "{}"'.format( diff --git a/pylxd/models/operation.py b/pylxd/models/operation.py index a36df32c..ac094f50 100644 --- a/pylxd/models/operation.py +++ b/pylxd/models/operation.py @@ -19,8 +19,21 @@ from six.moves.urllib import parse +# Global used to record which warnings have been issued already for unknown +# attributes. +_seen_attribute_warnings = set() + + class Operation(object): - """A LXD operation.""" + """An LXD operation. + + If the LXD server sends attributes that this version of pylxd is unaware of + then a warning is printed. By default the warning is issued ONCE and then + suppressed for every subsequent attempted setting. The warnings can be + completely suppressed by setting the environment variable PYLXD_WARNINGS to + 'none', or always displayed by setting the PYLXD_WARNINGS variable to + 'always'. + """ __slots__ = [ '_client', @@ -53,6 +66,13 @@ def __init__(self, **kwargs): except AttributeError: # ignore attributes we don't know about -- prevent breakage # in the future if new attributes are added. + global _seen_attribute_warnings + env = os.environ.get('PYLXD_WARNINGS', '').lower() + if env != 'always' and key in _seen_attribute_warnings: + continue + _seen_attribute_warnings.add(key) + if env == 'none': + continue warnings.warn( 'Attempted to set unknown attribute "{}" ' 'on instance of "{}"' diff --git a/pylxd/tests/models/test_model.py b/pylxd/tests/models/test_model.py index 9aa1a4f3..244df0d3 100644 --- a/pylxd/tests/models/test_model.py +++ b/pylxd/tests/models/test_model.py @@ -11,6 +11,8 @@ # WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the # License for the specific language governing permissions and limitations # under the License.
+import mock + from pylxd.models import _model as model from pylxd.tests import testing @@ -76,6 +78,37 @@ def test_init(self): self.assertEqual(self.client, item.client) self.assertEqual('an-item', item.name) + @mock.patch.dict('os.environ', {'PYLXD_WARNINGS': ''}) + @mock.patch('warnings.warn') + def test_init_warnings_once(self, mock_warn): + with mock.patch('pylxd.models._model._seen_attribute_warnings', + new=set()): + Item(self.client, unknown='some_value') + mock_warn.assert_called_once_with(mock.ANY) + Item(self.client, unknown='some_value_as_well') + mock_warn.assert_called_once_with(mock.ANY) + Item(self.client, unknown2="some_2nd_value") + self.assertEqual(len(mock_warn.call_args_list), 2) + + @mock.patch.dict('os.environ', {'PYLXD_WARNINGS': 'none'}) + @mock.patch('warnings.warn') + def test_init_warnings_none(self, mock_warn): + with mock.patch('pylxd.models._model._seen_attribute_warnings', + new=set()): + Item(self.client, unknown='some_value') + mock_warn.assert_not_called() + + @mock.patch.dict('os.environ', {'PYLXD_WARNINGS': 'always'}) + @mock.patch('warnings.warn') + def test_init_warnings_always(self, mock_warn): + with mock.patch('pylxd.models._model._seen_attribute_warnings', + new=set()): + Item(self.client, unknown='some_value') + mock_warn.assert_called_once_with(mock.ANY) + Item(self.client, unknown='some_value_as_well') + self.assertEqual(len(mock_warn.call_args_list), 2) + + @mock.patch.dict('os.environ', {'PYLXD_WARNINGS': 'none'}) def test_init_unknown_attribute(self): """Unknown attributes aren't set.""" item = Item(self.client, name='an-item', nonexistent='SRSLY') diff --git a/pylxd/tests/models/test_operation.py b/pylxd/tests/models/test_operation.py index 1e70e869..83790b06 100644 --- a/pylxd/tests/models/test_operation.py +++ b/pylxd/tests/models/test_operation.py @@ -13,6 +13,7 @@ # under the License. 
import json +import mock from pylxd import exceptions, models from pylxd.tests import testing @@ -21,6 +22,36 @@ class TestOperation(testing.PyLXDTestCase): """Tests for pylxd.models.Operation.""" + @mock.patch.dict('os.environ', {'PYLXD_WARNINGS': ''}) + @mock.patch('warnings.warn') + def test_init_warnings_once(self, mock_warn): + with mock.patch('pylxd.models.operation._seen_attribute_warnings', + new=set()): + models.Operation(unknown='some_value') + mock_warn.assert_called_once_with(mock.ANY) + models.Operation(unknown='some_value_as_well') + mock_warn.assert_called_once_with(mock.ANY) + models.Operation(unknown2="some_2nd_value") + self.assertEqual(len(mock_warn.call_args_list), 2) + + @mock.patch.dict('os.environ', {'PYLXD_WARNINGS': 'none'}) + @mock.patch('warnings.warn') + def test_init_warnings_none(self, mock_warn): + with mock.patch('pylxd.models.operation._seen_attribute_warnings', + new=set()): + models.Operation(unknown='some_value') + mock_warn.assert_not_called() + + @mock.patch.dict('os.environ', {'PYLXD_WARNINGS': 'always'}) + @mock.patch('warnings.warn') + def test_init_warnings_always(self, mock_warn): + with mock.patch('pylxd.models.operation._seen_attribute_warnings', + new=set()): + models.Operation(unknown='some_value') + mock_warn.assert_called_once_with(mock.ANY) + models.Operation(unknown='some_value_as_well') + self.assertEqual(len(mock_warn.call_args_list), 2) + def test_get(self): """Return an operation.""" name = 'operation-abc' diff --git a/tox.ini b/tox.ini index 7f460636..3dbff000 100644 --- a/tox.ini +++ b/tox.ini @@ -8,6 +8,7 @@ usedevelop = True install_command = pip install -U {opts} {packages} setenv = VIRTUAL_ENV={envdir} + PYLXD_WARNINGS=none deps = -r{toxinidir}/requirements.txt -r{toxinidir}/test-requirements.txt commands = nosetests --with-coverage --cover-package=pylxd pylxd From lxc-bot at linuxcontainers.org Fri May 3 18:35:45 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Fri, 03 May 2019 11:35:45 -0700 (PDT) Subject: [lxc-devel] [lxc/master] tree-wide: make socket SOCK_CLOEXEC Message-ID: <5ccc8a01.1c69fb81.c1e00.ac5bSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 364 bytes Desc: not available URL: -------------- next part -------------- From ad9429e52927b22ae74a3d8bd25943a9a833b71e Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Fri, 3 May 2019 20:35:02 +0200 Subject: [PATCH] tree-wide: make socket SOCK_CLOEXEC Signed-off-by: Christian Brauner --- src/lxc/af_unix.c | 6 +++--- src/lxc/network.c | 6 +++--- src/lxc/nl.c | 2 +- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/src/lxc/af_unix.c b/src/lxc/af_unix.c index 9e2f8587c8..c688a8746f 100644 --- a/src/lxc/af_unix.c +++ b/src/lxc/af_unix.c @@ -81,7 +81,7 @@ int lxc_abstract_unix_open(const char *path, int type, int flags) ssize_t len; struct sockaddr_un addr; - fd = socket(PF_UNIX, type, 0); + fd = socket(PF_UNIX, type | SOCK_CLOEXEC, 0); if (fd < 0) return -1; @@ -129,7 +129,7 @@ int lxc_abstract_unix_connect(const char *path) ssize_t len; struct sockaddr_un addr; - fd = socket(PF_UNIX, SOCK_STREAM, 0); + fd = socket(PF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0); if (fd < 0) return -1; @@ -371,7 +371,7 @@ int lxc_unix_connect(struct sockaddr_un *addr) int ret; ssize_t len; - fd = socket(AF_UNIX, SOCK_STREAM, 0); + fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0); if (fd < 0) { SYSERROR("Failed to open new AF_UNIX socket"); return -1; diff --git a/src/lxc/network.c b/src/lxc/network.c index a71eb5ddff..12666e4873 100644 --- a/src/lxc/network.c +++ b/src/lxc/network.c @@ -2187,7 +2187,7 @@ int lxc_bridge_attach(const char *bridge, const char *ifname) if (is_ovs_bridge(bridge)) return lxc_ovs_attach_bridge(bridge, ifname); - fd = socket(AF_INET, SOCK_STREAM, 0); + fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0); if (fd < 0) return -errno; @@ -2292,7 +2292,7 @@ int setup_private_host_hw_addr(char *veth1) int err, sockfd; struct ifreq ifr; - sockfd = socket(AF_INET, SOCK_DGRAM, 0); + sockfd = socket(AF_INET, SOCK_DGRAM | SOCK_CLOEXEC, 0); if (sockfd < 0) return -errno; @@ -3191,7 +3191,7 @@ static int setup_hw_addr(char *hwaddr, const char *ifname) ifr.ifr_name[IFNAMSIZ-1] = '\0'; memcpy((char *) &ifr.ifr_hwaddr, (char *) &sockaddr, sizeof(sockaddr)); - fd = socket(AF_INET, SOCK_DGRAM, 0); + fd = socket(AF_INET, SOCK_DGRAM | SOCK_CLOEXEC, 0); if (fd < 0) return -1; diff --git a/src/lxc/nl.c b/src/lxc/nl.c index eb4535a731..15beec2a0e 100644 --- a/src/lxc/nl.c +++ b/src/lxc/nl.c @@ -295,7 +295,7 @@ extern int netlink_open(struct nl_handler *handler, int protocol) memset(handler, 0, sizeof(*handler)); - handler->fd = socket(AF_NETLINK, SOCK_RAW, protocol); + handler->fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, protocol); if (handler->fd < 0) return -errno; From noreply at github.com Fri May 3 19:09:39 2019 From: noreply at github.com (Christian Brauner) Date: Fri, 03 May 2019 19:09:39 +0000 (UTC) Subject: [lxc-devel] [lxc/lxc] b67001: network: Adds ipvlan static routes for l2proxy mode Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: b670016ac9e5f26cd7984a9cbfc21b8b9878feee https://github.com/lxc/lxc/commit/b670016ac9e5f26cd7984a9cbfc21b8b9878feee Author: tomponline Date: 2019-05-03 (Fri, 03 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- network: Adds ipvlan static routes for l2proxy mode Signed-off-by: tomponline Commit: 9e8c3ebeb50c338b6e754abc47ee114e52a3b2d8 https://github.com/lxc/lxc/commit/9e8c3ebeb50c338b6e754abc47ee114e52a3b2d8 Author: Christian Brauner Date: 2019-05-03 (Fri, 03 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- Merge pull request 
#2968 from tomponline/tp-ipvlan-l2proxy network: Static routes for IPVLAN with L2PROXY Compare: https://github.com/lxc/lxc/compare/5b94d538dd1e...9e8c3ebeb50c From noreply at github.com Sat May 4 10:56:47 2019 From: noreply at github.com (Christian Brauner) Date: Sat, 04 May 2019 03:56:47 -0700 Subject: [lxc-devel] [lxc/lxc] a2f9a6: network: Adds gateway device route mode Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: a2f9a6706d6363faa1fb6a091679799a1d41f374 https://github.com/lxc/lxc/commit/a2f9a6706d6363faa1fb6a091679799a1d41f374 Author: tomponline Date: 2019-05-03 (Fri, 03 May 2019) Changed paths: M doc/api-extensions.md M doc/lxc.container.conf.sgml.in M src/lxc/api_extensions.h M src/lxc/confile.c M src/lxc/confile_utils.c M src/lxc/network.c M src/lxc/network.h M src/tests/parse_config_file.c Log Message: ----------- network: Adds gateway device route mode Adds ability to specify "dev" as the gateway value, which will cause a device route to be set as default gateway. Signed-off-by: tomponline Commit: 0854538f134b6ff59a12370bdd76b59228168a25 https://github.com/lxc/lxc/commit/0854538f134b6ff59a12370bdd76b59228168a25 Author: Christian Brauner Date: 2019-05-04 (Sat, 04 May 2019) Changed paths: M doc/api-extensions.md M doc/lxc.container.conf.sgml.in M src/lxc/api_extensions.h M src/lxc/confile.c M src/lxc/confile_utils.c M src/lxc/network.c M src/lxc/network.h M src/tests/parse_config_file.c Log Message: ----------- Merge pull request #2973 from tomponline/tp-gw-dev network: Adds gateway device route mode Compare: https://github.com/lxc/lxc/compare/9e8c3ebeb50c...0854538f134b From lxc-bot at linuxcontainers.org Sat May 4 11:39:32 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Sat, 04 May 2019 04:39:32 -0700 (PDT) Subject: [lxc-devel] [lxc/master] compiler: add __returns_twice attribute Message-ID: <5ccd79f4.1c69fb81.8e1fe.a61fSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 816 bytes Desc: not available URL: -------------- next part -------------- From 633cb8bee31b0ce075adbe8a143f88f533605552 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Sat, 4 May 2019 13:35:51 +0200 Subject: [PATCH] compiler: add __returns_twice attribute The returns_twice attribute tells the compiler that a function may return more than one time. The compiler will ensure that all registers are dead before calling such a function and will emit a warning about the variables that may be clobbered after the second return from the function. Examples of such functions are setjmp and vfork. The longjmp-like counterpart of such function, if any, might need to be marked with the noreturn attribute. 
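To make the clobbering concrete, here is a short sketch (illustrative, not lxc source) of why a clone-style wrapper wants the annotation: without returns_twice the compiler may cache locals in registers across the call, and the second return, in the child, would then observe stale values, exactly as with vfork() or setjmp().

#include <sys/types.h>

#define __returns_twice __attribute__((returns_twice))

/* Hypothetical raw clone wrapper with fork() semantics: returns 0 in
 * the child and the child's pid in the parent. */
__returns_twice pid_t my_raw_clone(unsigned long flags);

int spawn_child(unsigned long flags)
{
	/* Because my_raw_clone() is marked returns_twice, the compiler
	 * must assume all registers are dead at the call, so this
	 * variable holds the right value on both returns. */
	pid_t pid = my_raw_clone(flags);

	if (pid < 0)
		return -1; /* error: single return, parent only */
	if (pid == 0)
		return 0;  /* second return: child */
	return 1;          /* first return: parent */
}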
Signed-off-by: Christian Brauner --- src/lxc/compiler.h | 4 ++++ src/lxc/raw_syscalls.c | 3 ++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/src/lxc/compiler.h b/src/lxc/compiler.h index 65457cb31b..9b0de394a5 100644 --- a/src/lxc/compiler.h +++ b/src/lxc/compiler.h @@ -59,6 +59,10 @@ # define __hot __attribute__((hot)) #endif +#ifndef __returns_twice +#define __returns_twice __attribute__((returns_twice)) +#endif + #define __cgfsng_ops #endif /* __LXC_COMPILER_H */ diff --git a/src/lxc/raw_syscalls.c b/src/lxc/raw_syscalls.c index a4db306919..2e15575870 100644 --- a/src/lxc/raw_syscalls.c +++ b/src/lxc/raw_syscalls.c @@ -9,6 +9,7 @@ #include #include +#include "compiler.h" #include "config.h" #include "macro.h" #include "raw_syscalls.h" @@ -32,7 +33,7 @@ int lxc_raw_execveat(int dirfd, const char *pathname, char *const argv[], * The nice thing about this is that we get fork() behavior. That is * lxc_raw_clone() returns 0 in the child and the child pid in the parent. */ -pid_t lxc_raw_clone(unsigned long flags) +__returns_twice pid_t lxc_raw_clone(unsigned long flags) { /* * These flags don't interest at all so we don't jump through any hoops From lxc-bot at linuxcontainers.org Sun May 5 04:17:02 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Sat, 04 May 2019 21:17:02 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/storage/btrfs: Don't make ro snapshots when unpriv Message-ID: <5cce63be.1c69fb81.2d19f.e816SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 514 bytes Desc: not available URL: -------------- next part -------------- From 0e246fc917100a0b71bf4fe0f82928a168f9c2b5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Sun, 5 May 2019 00:16:18 -0400 Subject: [PATCH] lxd/storage/btrfs: Don't make ro snapshots when unpriv MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/patches.go | 2 +- lxd/storage_btrfs.go | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/lxd/patches.go b/lxd/patches.go index d6cf113466..084bd11133 100644 --- a/lxd/patches.go +++ b/lxd/patches.go @@ -684,7 +684,7 @@ func upgradeFromStorageTypeBtrfs(name string, d *Daemon, defaultPoolName string, oldSnapshotMntPoint := shared.VarPath("snapshots", cs) newSnapshotMntPoint := getSnapshotMountPoint("default", defaultPoolName, cs) if shared.PathExists(oldSnapshotMntPoint) && !shared.PathExists(newSnapshotMntPoint) { - err = btrfsSnapshot(oldSnapshotMntPoint, newSnapshotMntPoint, true) + err = btrfsSnapshot(d.State(), oldSnapshotMntPoint, newSnapshotMntPoint, true) if err != nil { err := btrfsSubVolumeCreate(newSnapshotMntPoint) if err != nil { diff --git a/lxd/storage_btrfs.go b/lxd/storage_btrfs.go index 3d44e04fff..fc1bc2347f 100644 --- a/lxd/storage_btrfs.go +++ b/lxd/storage_btrfs.go @@ -2267,10 +2267,10 @@ func btrfsSubVolumesDelete(subvol string) error { * btrfsSnapshot creates a snapshot of "source" to "dest" * the result will be readonly if "readonly" is True. 
*/ -func btrfsSnapshot(source string, dest string, readonly bool) error { +func btrfsSnapshot(s *state.State, source string, dest string, readonly bool) error { var output string var err error - if readonly { + if readonly && !s.OS.RunningInUserNS { output, err = shared.RunCommand( "btrfs", "subvolume", @@ -2299,7 +2299,7 @@ func btrfsSnapshot(source string, dest string, readonly bool) error { } func (s *storageBtrfs) btrfsPoolVolumeSnapshot(source string, dest string, readonly bool) error { - return btrfsSnapshot(source, dest, readonly) + return btrfsSnapshot(s.s, source, dest, readonly) } func (s *storageBtrfs) btrfsPoolVolumesSnapshot(source string, dest string, readonly bool, recursive bool) error { From noreply at github.com Sun May 5 04:19:53 2019 From: noreply at github.com (=?UTF-8?B?U3TDqXBoYW5lIEdyYWJlcg==?=) Date: Sat, 04 May 2019 21:19:53 -0700 Subject: [lxc-devel] [lxc/lxc] 633cb8: compiler: add __returns_twice attribute Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 633cb8bee31b0ce075adbe8a143f88f533605552 https://github.com/lxc/lxc/commit/633cb8bee31b0ce075adbe8a143f88f533605552 Author: Christian Brauner Date: 2019-05-04 (Sat, 04 May 2019) Changed paths: M src/lxc/compiler.h M src/lxc/raw_syscalls.c Log Message: ----------- compiler: add __returns_twice attribute The returns_twice attribute tells the compiler that a function may return more than one time. The compiler will ensure that all registers are dead before calling such a function and will emit a warning about the variables that may be clobbered after the second return from the function. Examples of such functions are setjmp and vfork. The longjmp-like counterpart of such function, if any, might need to be marked with the noreturn attribute. Signed-off-by: Christian Brauner Commit: 3ade816713022598f916acc8089cf567b4fa1f16 https://github.com/lxc/lxc/commit/3ade816713022598f916acc8089cf567b4fa1f16 Author: Stéphane Graber Date: 2019-05-05 (Sun, 05 May 2019) Changed paths: M src/lxc/compiler.h M src/lxc/raw_syscalls.c Log Message: ----------- Merge pull request #2975 from brauner/2019-05-04/returns_twice compiler: add __returns_twice attribute Compare: https://github.com/lxc/lxc/compare/0854538f134b...3ade81671302 From noreply at github.com Sun May 5 04:20:07 2019 From: noreply at github.com (=?UTF-8?B?U3TDqXBoYW5lIEdyYWJlcg==?=) Date: Sun, 05 May 2019 04:20:07 +0000 (UTC) Subject: [lxc-devel] [lxc/lxc] ad9429: tree-wide: make socket SOCK_CLOEXEC Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: ad9429e52927b22ae74a3d8bd25943a9a833b71e https://github.com/lxc/lxc/commit/ad9429e52927b22ae74a3d8bd25943a9a833b71e Author: Christian Brauner Date: 2019-05-03 (Fri, 03 May 2019) Changed paths: M src/lxc/af_unix.c M src/lxc/network.c M src/lxc/nl.c Log Message: ----------- tree-wide: make socket SOCK_CLOEXEC Signed-off-by: Christian Brauner Commit: 192023dd5ac8ac914a50e65133254a7067c0bfbc https://github.com/lxc/lxc/commit/192023dd5ac8ac914a50e65133254a7067c0bfbc Author: Stéphane Graber Date: 2019-05-05 (Sun, 05 May 2019) Changed paths: M src/lxc/af_unix.c M src/lxc/network.c M src/lxc/nl.c Log Message: ----------- Merge pull request #2974 from brauner/master tree-wide: make socket SOCK_CLOEXEC Compare: https://github.com/lxc/lxc/compare/3ade81671302...192023dd5ac8 From lxc-bot at linuxcontainers.org Mon May 6 00:51:16 2019 From: lxc-bot at linuxcontainers.org (joelhockey on Github) Date: Sun, 05 May 2019 17:51:16 -0700 (PDT) Subject: [lxc-devel] [lxd/master] 
lxd/storage/quota: guard quota defs for compiling Message-ID: <5ccf8504.1c69fb81.f6cd5.f705SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 353 bytes Desc: not available URL: -------------- next part -------------- From 4f9c0c8334b88e8051caf506600f884691f8a1a0 Mon Sep 17 00:00:00 2001 From: Joel Hockey Date: Sun, 5 May 2019 16:53:20 -0700 Subject: [PATCH] lxd/storage/quota: guard quota defs for compiling Signed-off-by: Joel Hockey --- lxd/storage/quota/projectquota.go | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/lxd/storage/quota/projectquota.go b/lxd/storage/quota/projectquota.go index 37e4480665..e7e44946ae 100644 --- a/lxd/storage/quota/projectquota.go +++ b/lxd/storage/quota/projectquota.go @@ -30,7 +30,10 @@ struct fsxattr { __u32 fsx_projid; unsigned char fsx_pad[12]; }; +#define FS_XFLAG_PROJINHERIT 0x00000200 +#endif +#ifndef QIF_DQBLKSIZE_BITS struct if_dqinfo { __u64 dqi_bgrace; __u64 dqi_igrace; @@ -49,7 +52,7 @@ struct if_dqblk { __u64 dqb_itime; __u32 dqb_valid; }; -#define FS_XFLAG_PROJINHERIT 0x00000200 +#define QIF_DQBLKSIZE_BITS 10 #endif #ifndef FS_IOC_FSGETXATTR From lxc-bot at linuxcontainers.org Mon May 6 07:40:25 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Mon, 06 May 2019 00:40:25 -0700 (PDT) Subject: [lxc-devel] [lxc/master] seccomp: document path calculation Message-ID: <5ccfe4e9.1c69fb81.9205a.aa0cSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 364 bytes Desc: not available URL: -------------- next part -------------- From 18847d37dda145539a28c1dea291af03ec810163 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Mon, 6 May 2019 09:39:40 +0200 Subject: [PATCH] seccomp: document path calculation Signed-off-by: Christian Brauner --- src/lxc/seccomp.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/lxc/seccomp.c b/src/lxc/seccomp.c index bfbc19ac53..96ad03ff0b 100644 --- a/src/lxc/seccomp.c +++ b/src/lxc/seccomp.c @@ -1338,7 +1338,10 @@ int seccomp_notify_handler(int fd, uint32_t events, void *data, __do_close_prot_errno int fd_mem = -EBADF; int reconnect_count, ret; ssize_t bytes; - char mem_path[6 + 21 + 5]; + char mem_path[6 /* /proc/ */ + + INTTYPE_TO_STRLEN(int64_t) + + 3 /* mem */ + + 1 /* \0 */]; struct lxc_handler *hdlr = data; struct lxc_conf *conf = hdlr->conf; struct seccomp_notif *req = conf->seccomp.notifier.req_buf; From lxc-bot at linuxcontainers.org Mon May 6 08:51:16 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Mon, 06 May 2019 01:51:16 -0700 (PDT) Subject: [lxc-devel] [lxc/master] raw_syscalls: add initial support for pidfd_send_signal() Message-ID: <5ccff584.1c69fb81.7fcaa.fb3dSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 418 bytes Desc: not available URL: -------------- next part -------------- From d9bb2fbab6c0f9a502408a4aa9bfbd9951c1b568 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Mon, 6 May 2019 10:49:31 +0200 Subject: [PATCH] raw_syscalls: add initial support for pidfd_send_signal() Well, I added this syscall so we better use it. 
:) Signed-off-by: Christian Brauner --- src/lxc/raw_syscalls.c | 11 +++++++++++ src/lxc/raw_syscalls.h | 4 ++++ src/lxc/start.c | 44 +++++++++++++++++++++++++++++++++++++++--- src/lxc/start.h | 6 ++++++ 4 files changed, 62 insertions(+), 3 deletions(-) diff --git a/src/lxc/raw_syscalls.c b/src/lxc/raw_syscalls.c index 2e15575870..a16f6edf76 100644 --- a/src/lxc/raw_syscalls.c +++ b/src/lxc/raw_syscalls.c @@ -108,3 +108,14 @@ pid_t lxc_raw_clone_cb(int (*fn)(void *), void *args, unsigned long flags) return pid; } + +int lxc_raw_pidfd_send_signal(int pidfd, int sig, siginfo_t *info, + unsigned int flags) +{ +#ifdef __NR_pidfd_send_signal + return syscall(__NR_pidfd_send_signal, pidfd, sig, info, flags); +#else + errno = ENOSYS; + return -1; +#endif +} diff --git a/src/lxc/raw_syscalls.h b/src/lxc/raw_syscalls.h index 224cf92fca..6c27f26a0b 100644 --- a/src/lxc/raw_syscalls.h +++ b/src/lxc/raw_syscalls.h @@ -26,6 +26,7 @@ #include #include #include +#include #include #include @@ -92,4 +93,7 @@ static inline pid_t lxc_raw_gettid(void) #endif } +extern int lxc_raw_pidfd_send_signal(int pidfd, int sig, siginfo_t *info, + unsigned int flags); + #endif /* __LXC_RAW_SYSCALL_H */ diff --git a/src/lxc/start.c b/src/lxc/start.c index 5209af3586..651511dbe3 100644 --- a/src/lxc/start.c +++ b/src/lxc/start.c @@ -406,14 +406,21 @@ static int signal_handler(int fd, uint32_t events, void *data, } if (siginfo.ssi_signo == SIGHUP) { - kill(hdlr->pid, SIGTERM); + if (hdlr->proc_pidfd >= 0) + lxc_raw_pidfd_send_signal(hdlr->proc_pidfd, SIGTERM, NULL, 0); + else + kill(hdlr->pid, SIGTERM); INFO("Killing %d since terminal hung up", hdlr->pid); return hdlr->init_died ? LXC_MAINLOOP_CLOSE : LXC_MAINLOOP_CONTINUE; } if (siginfo.ssi_signo != SIGCHLD) { - kill(hdlr->pid, siginfo.ssi_signo); + if (hdlr->proc_pidfd >= 0) + lxc_raw_pidfd_send_signal(hdlr->proc_pidfd, + siginfo.ssi_signo, NULL, 0); + else + kill(hdlr->pid, siginfo.ssi_signo); INFO("Forwarded signal %d to pid %d", siginfo.ssi_signo, hdlr->pid); return hdlr->init_died ?
LXC_MAINLOOP_CLOSE : LXC_MAINLOOP_CONTINUE; @@ -658,6 +665,8 @@ void lxc_zero_handler(struct lxc_handler *handler) handler->pinfd = -1; + handler->proc_pidfd = -EBADF; + handler->sigfd = -1; for (i = 0; i < LXC_NS_MAX; i++) @@ -678,6 +687,9 @@ void lxc_free_handler(struct lxc_handler *handler) if (handler->pinfd >= 0) close(handler->pinfd); + if (handler->proc_pidfd >= 0) + close(handler->proc_pidfd); + if (handler->sigfd >= 0) close(handler->sigfd); @@ -722,6 +734,7 @@ struct lxc_handler *lxc_init_handler(const char *name, struct lxc_conf *conf, handler->conf = conf; handler->lxcpath = lxcpath; handler->pinfd = -1; + handler->proc_pidfd = -EBADF; handler->sigfd = -EBADF; handler->init_died = false; handler->state_socket_pair[0] = handler->state_socket_pair[1] = -1; @@ -1088,7 +1101,7 @@ void lxc_abort(const char *name, struct lxc_handler *handler) lxc_set_state(name, handler, ABORTING); if (handler->pid > 0) { - ret = kill(handler->pid, SIGKILL); + ret = lxc_raw_pidfd_send_signal(handler->proc_pidfd, SIGKILL, NULL, 0); if (ret < 0) SYSERROR("Failed to send SIGKILL to %d", handler->pid); } @@ -1595,6 +1608,27 @@ static inline int do_share_ns(void *arg) return 0; } +static int proc_pidfd_open(pid_t pid) +{ + __do_close_prot_errno int proc_pidfd = -EBADF; + char path[100]; + + snprintf(path, sizeof(path), "/proc/%d", pid); + proc_pidfd = open(path, O_DIRECTORY | O_RDONLY | O_CLOEXEC); + if (proc_pidfd < 0) { + SYSERROR("Failed to open %s", path); + return -1; + } + + /* Test whether we can send signals. */ + if (lxc_raw_pidfd_send_signal(proc_pidfd, 0, NULL, 0)) { + SYSERROR("Failed to send signal through pidfd"); + return -1; + } + + return move_fd(proc_pidfd); +} + /* lxc_spawn() performs crucial setup tasks and clone()s the new process which * exec()s the requested container binary. * Note that lxc_spawn() runs in the parent namespaces. Any operations performed @@ -1722,6 +1756,10 @@ static int lxc_spawn(struct lxc_handler *handler) } TRACE("Cloned child process %d", handler->pid); + handler->proc_pidfd = proc_pidfd_open(handler->pid); + if (handler->proc_pidfd < 0 && (errno != ENOSYS)) + goto out_delete_net; + for (i = 0; i < LXC_NS_MAX; i++) if (handler->ns_on_clone_flags & ns_info[i].clone_flag) INFO("Cloned %s", ns_info[i].flag_name); diff --git a/src/lxc/start.h b/src/lxc/start.h index 60607ccc12..305782f272 100644 --- a/src/lxc/start.h +++ b/src/lxc/start.h @@ -102,6 +102,12 @@ struct lxc_handler { /* The child's pid. */ pid_t pid; + /* + * File descriptor for the /proc/ directory of the container's + * init process. + */ + int proc_pidfd; + /* The monitor's pid. */ pid_t monitor_pid; From noreply at github.com Mon May 6 18:36:53 2019 From: noreply at github.com (=?UTF-8?B?U3TDqXBoYW5lIEdyYWJlcg==?=) Date: Mon, 06 May 2019 11:36:53 -0700 Subject: [lxc-devel] [lxc/lxc] d9bb2f: raw_syscalls: add initial support for pidfd_send_s... Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: d9bb2fbab6c0f9a502408a4aa9bfbd9951c1b568 https://github.com/lxc/lxc/commit/d9bb2fbab6c0f9a502408a4aa9bfbd9951c1b568 Author: Christian Brauner Date: 2019-05-06 (Mon, 06 May 2019) Changed paths: M src/lxc/raw_syscalls.c M src/lxc/raw_syscalls.h M src/lxc/start.c M src/lxc/start.h Log Message: ----------- raw_syscalls: add initial support for pidfd_send_signal() Well, I added this syscall so we better use it. 
:) Signed-off-by: Christian Brauner Commit: 7e30d659c314da29dece4e4c226f3884f7d80c5e https://github.com/lxc/lxc/commit/7e30d659c314da29dece4e4c226f3884f7d80c5e Author: Stéphane Graber Date: 2019-05-06 (Mon, 06 May 2019) Changed paths: M src/lxc/raw_syscalls.c M src/lxc/raw_syscalls.h M src/lxc/start.c M src/lxc/start.h Log Message: ----------- Merge pull request #2977 from brauner/2019-05-06/pidfd_send_signal raw_syscalls: add initial support for pidfd_send_signal() Compare: https://github.com/lxc/lxc/compare/192023dd5ac8...7e30d659c314 From noreply at github.com Mon May 6 19:10:22 2019 From: noreply at github.com (=?UTF-8?B?U3TDqXBoYW5lIEdyYWJlcg==?=) Date: Mon, 06 May 2019 12:10:22 -0700 Subject: [lxc-devel] [lxc/lxc] 18847d: seccomp: document path calculation Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 18847d37dda145539a28c1dea291af03ec810163 https://github.com/lxc/lxc/commit/18847d37dda145539a28c1dea291af03ec810163 Author: Christian Brauner Date: 2019-05-06 (Mon, 06 May 2019) Changed paths: M src/lxc/seccomp.c Log Message: ----------- seccomp: document path calculation Signed-off-by: Christian Brauner Commit: 19a503200d1bed116e08a9ea17101487e86157cd https://github.com/lxc/lxc/commit/19a503200d1bed116e08a9ea17101487e86157cd Author: Stéphane Graber Date: 2019-05-06 (Mon, 06 May 2019) Changed paths: M src/lxc/seccomp.c Log Message: ----------- Merge pull request #2976 from brauner/2019-05-06/bugfixes seccomp: document path calculation Compare: https://github.com/lxc/lxc/compare/7e30d659c314...19a503200d1b From lxc-bot at linuxcontainers.org Mon May 6 22:42:35 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Mon, 06 May 2019 15:42:35 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/containers: Fix bad operation type Message-ID: <5cd0b85b.1c69fb81.f072f.a5a2SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 354 bytes Desc: not available URL: -------------- next part -------------- From 22366b9bcbf124b6f36bab7055e05efc47f2a8e8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Mon, 6 May 2019 18:42:06 -0400 Subject: [PATCH] lxd/containers: Fix bad operation type MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/container_put.go | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lxd/container_put.go b/lxd/container_put.go index 90a12b7d5e..388413746f 100644 --- a/lxd/container_put.go +++ b/lxd/container_put.go @@ -80,7 +80,7 @@ func containerPut(d *Daemon, r *http.Request) Response { return nil } - opType = db.OperationSnapshotUpdate + opType = db.OperationContainerUpdate } else { // Snapshot Restore do = func(op *operation) error { From lxc-bot at linuxcontainers.org Tue May 7 06:25:35 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Mon, 06 May 2019 23:25:35 -0700 (PDT) Subject: [lxc-devel] [lxd/master] Fix snapshots on CEPH Message-ID: <5cd124df.1c69fb81.77f10.ae0bSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 301 bytes Desc: not available URL: -------------- next part -------------- Fix snapshots on CEPH by stgraber · Pull Request #5729 · lxc/lxd · GitHub
Fix snapshots on CEPH #5729

Open · wants to merge 3 commits into base: master

Conversation

1 participant
@stgraber
Member

commented May 7, 2019

No description provided.

stgraber added some commits May 7, 2019

lxd/storage/ceph: Fix snapshot of running xfs/btrfs
Signed-off-by: Stéphane Graber <stgraber at ubuntu.com>
lxd/containers: Don't needlessly mount snapshots
Signed-off-by: Stéphane Graber <stgraber at ubuntu.com>
lxd/containers: Avoid costly storage calls during snapshot
Signed-off-by: Stéphane Graber <stgraber at ubuntu.com>
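
[Editor's note: the diffs for the three commits above were scrubbed from this digest. The first commit title points at a classic problem: taking a block-level RBD snapshot of an xfs or btrfs filesystem that is still mounted read-write by a running container. The usual remedy is to freeze the filesystem for the duration of the snapshot. Below is a minimal Go sketch of that generic freeze/snapshot/thaw pattern, assuming the FIFREEZE/FITHAW ioctls exposed by golang.org/x/sys/unix; the takeSnapshot callback is a hypothetical stand-in for whatever creates the RBD snapshot, and none of this is LXD's actual implementation.]

package main

import (
	"golang.org/x/sys/unix"
)

// freezeAndSnapshot freezes the filesystem mounted at mountpoint, runs the
// caller-supplied takeSnapshot (for example an "rbd snap create" invocation),
// then thaws the filesystem. Freezing flushes dirty data and blocks new
// writes, so the block-level snapshot captures a consistent xfs/btrfs image.
func freezeAndSnapshot(mountpoint string, takeSnapshot func() error) error {
	fd, err := unix.Open(mountpoint, unix.O_RDONLY|unix.O_DIRECTORY, 0)
	if err != nil {
		return err
	}
	defer unix.Close(fd)

	// FIFREEZE returns once all in-flight writes are on stable storage.
	if err := unix.IoctlSetInt(fd, unix.FIFREEZE, 0); err != nil {
		return err
	}
	// Thaw unconditionally; a filesystem left frozen blocks every later write.
	defer unix.IoctlSetInt(fd, unix.FITHAW, 0)

	return takeSnapshot()
}

Thawing in a defer matters here: if the snapshot step fails and the filesystem stays frozen, every subsequent write in the container hangs.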
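
[Editor's note on the pidfd_send_signal() support merged earlier in this digest (commit d9bb2f): the proc_pidfd_open() helper in that patch opens /proc/<pid> as a stable handle to the container's init process and probes it with signal 0 before relying on it, which avoids SIGKILLing a recycled PID. The Go sketch below reproduces just that probe; the syscall number 424 is the x86_64 value for pidfd_send_signal (new in Linux 5.1) and is an assumption here, as is the lack of a ready-made wrapper in golang.org/x/sys/unix at the time.]

package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// pidfdSendSignalNR is the pidfd_send_signal(2) syscall number on x86_64;
// other architectures use different numbers.
const pidfdSendSignalNR = 424

func main() {
	// Open /proc/<pid> as a process handle, mirroring proc_pidfd_open() in
	// the patch: unlike a bare PID, the open directory cannot silently start
	// referring to a different, recycled process.
	fd, err := unix.Open(fmt.Sprintf("/proc/%d", os.Getpid()),
		unix.O_DIRECTORY|unix.O_RDONLY|unix.O_CLOEXEC, 0)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer unix.Close(fd)

	// Signal 0 runs all delivery checks without sending anything, so this
	// verifies both kernel support and permission in a single call.
	_, _, errno := unix.Syscall6(pidfdSendSignalNR, uintptr(fd), 0, 0, 0, 0, 0)
	if errno != 0 {
		fmt.Fprintln(os.Stderr, "pidfd_send_signal not usable:", errno)
		os.Exit(1)
	}

	fmt.Println("pidfd_send_signal available")
}

On kernels older than 5.1 the probe fails with ENOSYS, which is exactly the case the lxc_spawn() hunk tolerates (only errno != ENOSYS triggers the hard failure path).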
From lxc-bot at linuxcontainers.org Tue May 7 10:43:31 2019 From: lxc-bot at linuxcontainers.org (monstermunchkin on Github) Date: Tue, 07 May 2019 03:43:31 -0700 (PDT) Subject: [lxc-devel] [distrobuilder/master] sources: Use fallback openSUSE image if needed Message-ID: <5cd16153.1c69fb81.aaebd.466eSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 364 bytes Desc: not available URL: -------------- next part -------------- From 2dd2a9a84697fec25e8bbdc9b7100a5fa45fcdb5 Mon Sep 17 00:00:00 2001 From: Thomas Hipp Date: Tue, 7 May 2019 12:18:30 +0200 Subject: [PATCH] sources: Use fallback openSUSE image if needed Signed-off-by: Thomas Hipp --- sources/opensuse-http.go | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/sources/opensuse-http.go b/sources/opensuse-http.go index bac65aa..0889ef2 100644 --- a/sources/opensuse-http.go +++ b/sources/opensuse-http.go @@ -11,6 +11,7 @@ import ( "path" "path/filepath" "regexp" + "sort" "strings" lxd "github.com/lxc/lxd/shared" @@ -166,14 +167,27 @@ func (s *OpenSUSEHTTP) getTarballName(u *url.URL, release, arch string) string { nodes := htmlquery.Find(doc, `//a/@href`) re := regexp.MustCompile(fmt.Sprintf("^opensuse-%s-image.*%s.*\\.tar.xz$", release, arch)) + var builds []string + for _, n := range nodes { text := htmlquery.InnerText(n) - if !re.MatchString(text) || strings.Contains(text, "Build") { + if !re.MatchString(text) { continue } - return text + if strings.Contains(text, "Build") { + builds = append(builds, text) + } else { + return text + } + } + + if len(builds) > 0 { + // Unfortunately, the link to the latest build is missing, hence we need + // to manually select the latest build. + sort.Strings(builds) + return builds[len(builds)-1] } return "" From lxc-bot at linuxcontainers.org Tue May 7 11:15:35 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Tue, 07 May 2019 04:15:35 -0700 (PDT) Subject: [lxc-devel] [lxc/master] network: Adds custom mtu support for ipvlan interfaces Message-ID: <5cd168d7.1c69fb81.f7a40.cdfaSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 357 bytes Desc: not available URL: -------------- next part -------------- From 006e135e225847ec29eb816c62ac6c22668de4d8 Mon Sep 17 00:00:00 2001 From: tomponline Date: Tue, 7 May 2019 12:13:46 +0100 Subject: [PATCH] network: Adds custom mtu support for ipvlan interfaces Signed-off-by: tomponline --- src/lxc/network.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/src/lxc/network.c b/src/lxc/network.c index 5edd822b48..7214a09b35 100644 --- a/src/lxc/network.c +++ b/src/lxc/network.c @@ -478,6 +478,7 @@ static int instantiate_ipvlan(struct lxc_handler *handler, struct lxc_netdev *ne { char peerbuf[IFNAMSIZ], *peer; int err; + unsigned int mtu = 0; if (netdev->link[0] == '\0') { ERROR("No link for ipvlan network device specified"); @@ -504,6 +505,22 @@ static int instantiate_ipvlan(struct lxc_handler *handler, struct lxc_netdev *ne goto on_error; } + if (netdev->mtu) { + err = lxc_safe_uint(netdev->mtu, &mtu); + if (err < 0) { + errno = -err; + SYSERROR("Failed to parse mtu \"%s\" for interface \"%s\"", netdev->mtu, peer); + goto on_error; + } + + err = lxc_netdev_set_mtu(peer, mtu); + if (err < 0) { + errno = -err; + SYSERROR("Failed to set mtu \"%s\" for interface \"%s\"", netdev->mtu, peer); + goto on_error; + } + } + if (netdev->upscript) { char *argv[] = { "ipvlan", From lxc-bot at linuxcontainers.org Tue May 7 11:36:37 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Tue, 07 May 2019 04:36:37 -0700 (PDT) Subject: [lxc-devel] [lxc/master] network: Makes vlan network interfaces set mtu before upscript called Message-ID: <5cd16dc5.1c69fb81.59ad0.f918SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 551 bytes Desc: not available URL: -------------- next part -------------- From 3e2a7b083b9eb472f95ba8ed45313745f5a10f0f Mon Sep 17 00:00:00 2001 From: tomponline Date: Tue, 7 May 2019 12:34:34 +0100 Subject: [PATCH] network: Makes vlan network interfaces set mtu before upscript called This is consistent with veth and ipvlan types. Also makes the debug message for success occur after up script has run. Also makes device clean up on error more thorough and consistent. 
Signed-off-by: tomponline --- src/lxc/network.c | 43 ++++++++++++++++++++++--------------------- 1 file changed, 22 insertions(+), 21 deletions(-) diff --git a/src/lxc/network.c b/src/lxc/network.c index 5edd822b48..8a0a0439eb 100644 --- a/src/lxc/network.c +++ b/src/lxc/network.c @@ -555,8 +555,23 @@ static int instantiate_vlan(struct lxc_handler *handler, struct lxc_netdev *netd netdev->ifindex = if_nametoindex(peer); if (!netdev->ifindex) { ERROR("Failed to retrieve ifindex for \"%s\"", peer); - lxc_netdev_delete_by_name(peer); - return -1; + goto on_error; + } + + if (netdev->mtu) { + err = lxc_safe_uint(netdev->mtu, &mtu); + if (err < 0) { + errno = -err; + SYSERROR("Failed to parse mtu \"%s\" for interface \"%s\"", netdev->mtu, peer); + goto on_error; + } + + err = lxc_netdev_set_mtu(peer, mtu); + if (err) { + errno = -err; + SYSERROR("Failed to set mtu \"%s\" for interface \"%s\"", netdev->mtu, peer); + goto on_error; + } } if (netdev->upscript) { @@ -570,32 +585,18 @@ static int instantiate_vlan(struct lxc_handler *handler, struct lxc_netdev *netd handler->conf->hooks_version, "net", netdev->upscript, "up", argv); if (err < 0) { - lxc_netdev_delete_by_name(peer); - return -1; + goto on_error; } } DEBUG("Instantiated vlan \"%s\" with ifindex is \"%d\" (vlan1000)", peer, netdev->ifindex); - if (netdev->mtu) { - if (lxc_safe_uint(netdev->mtu, &mtu) < 0) { - ERROR("Failed to retrieve mtu from \"%d\"/\"%s\".", - netdev->ifindex, - netdev->name[0] != '\0' ? netdev->name : "(null)"); - return -1; - } - - err = lxc_netdev_set_mtu(peer, mtu); - if (err) { - errno = -err; - SYSERROR("Failed to set mtu \"%s\" for \"%s\"", - netdev->mtu, peer); - lxc_netdev_delete_by_name(peer); - return -1; - } - } return 0; + +on_error: + lxc_netdev_delete_by_name(peer); + return -1; } static int instantiate_phys(struct lxc_handler *handler, struct lxc_netdev *netdev) From noreply at github.com Tue May 7 11:37:40 2019 From: noreply at github.com (Christian Brauner) Date: Tue, 07 May 2019 04:37:40 -0700 Subject: [lxc-devel] [lxc/lxc] 006e13: network: Adds custom mtu support for ipvlan interf... Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 006e135e225847ec29eb816c62ac6c22668de4d8 https://github.com/lxc/lxc/commit/006e135e225847ec29eb816c62ac6c22668de4d8 Author: tomponline Date: 2019-05-07 (Tue, 07 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- network: Adds custom mtu support for ipvlan interfaces Signed-off-by: tomponline Commit: 2c07c966f9e5c0dcd011cb006757068c967981ec https://github.com/lxc/lxc/commit/2c07c966f9e5c0dcd011cb006757068c967981ec Author: Christian Brauner Date: 2019-05-07 (Tue, 07 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- Merge pull request #2978 from tomponline/tp-ipvlan-mtu network: Adds custom mtu support for ipvlan interfaces Compare: https://github.com/lxc/lxc/compare/19a503200d1b...2c07c966f9e5 From noreply at github.com Tue May 7 12:03:54 2019 From: noreply at github.com (Christian Brauner) Date: Tue, 07 May 2019 05:03:54 -0700 Subject: [lxc-devel] [lxc/lxc] 3e2a7b: network: Makes vlan network interfaces set mtu bef... 
Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 3e2a7b083b9eb472f95ba8ed45313745f5a10f0f https://github.com/lxc/lxc/commit/3e2a7b083b9eb472f95ba8ed45313745f5a10f0f Author: tomponline Date: 2019-05-07 (Tue, 07 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- network: Makes vlan network interfaces set mtu before upscript called This is consistent with veth and ipvlan types. Also makes the debug message for success occur after up script has run. Also makes device clean up on error more thorough and consistent. Signed-off-by: tomponline Commit: 1732294cabec49ebc494bd8805f56e6a8fa2f75f https://github.com/lxc/lxc/commit/1732294cabec49ebc494bd8805f56e6a8fa2f75f Author: Christian Brauner Date: 2019-05-07 (Tue, 07 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- Merge pull request #2979 from tomponline/tp-vlan-mtu network: Makes vlan network interfaces set mtu before upscript called Compare: https://github.com/lxc/lxc/compare/2c07c966f9e5...1732294cabec From lxc-bot at linuxcontainers.org Tue May 7 13:24:39 2019 From: lxc-bot at linuxcontainers.org (Rachid-Koucha on Github) Date: Tue, 07 May 2019 06:24:39 -0700 (PDT) Subject: [lxc-devel] [lxc/master] Devices created in rootfs instead of rootfs/dev Message-ID: <5cd18717.1c69fb81.7df97.6714SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 301 bytes Desc: not available URL: -------------- next part -------------- From 738cd316e32084a3adb47b47a5d972150fc91fa3 Mon Sep 17 00:00:00 2001 From: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: Tue, 7 May 2019 15:23:26 +0200 Subject: [PATCH] devices created in rootfs instead of rootfs/dev --- templates/lxc-busybox.in | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/templates/lxc-busybox.in b/templates/lxc-busybox.in index 3782687a54..7f93ee4077 100644 --- a/templates/lxc-busybox.in +++ b/templates/lxc-busybox.in @@ -89,15 +89,15 @@ ${rootfs}/usr/lib64" echo "lxc.mount.entry = /dev/${dev} dev/${dev} none bind,optional,create=file 0 0" >> "${path}/config" done else - mknod -m 666 "${rootfs}/tty" c 5 0 || res=1 - mknod -m 666 "${rootfs}/console" c 5 1 || res=1 - mknod -m 666 "${rootfs}/tty0" c 4 0 || res=1 - mknod -m 666 "${rootfs}/tty1" c 4 0 || res=1 - mknod -m 666 "${rootfs}/tty5" c 4 0 || res=1 - mknod -m 600 "${rootfs}/ram0" b 1 0 || res=1 - mknod -m 666 "${rootfs}/null" c 1 3 || res=1 - mknod -m 666 "${rootfs}/zero" c 1 5 || res=1 - mknod -m 666 "${rootfs}/urandom" c 1 9 || res=1 + mknod -m 666 "${rootfs}/dev/tty" c 5 0 || res=1 + mknod -m 666 "${rootfs}/dev/console" c 5 1 || res=1 + mknod -m 666 "${rootfs}/dev/tty0" c 4 0 || res=1 + mknod -m 666 "${rootfs}/dev/tty1" c 4 0 || res=1 + mknod -m 666 "${rootfs}/dev/tty5" c 4 0 || res=1 + mknod -m 600 "${rootfs}/dev/ram0" b 1 0 || res=1 + mknod -m 666 "${rootfs}/dev/null" c 1 3 || res=1 + mknod -m 666 "${rootfs}/dev/zero" c 1 5 || res=1 + mknod -m 666 "${rootfs}/dev/urandom" c 1 9 || res=1 fi # root user defined From lxc-bot at linuxcontainers.org Tue May 7 13:24:53 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Tue, 07 May 2019 06:24:53 -0700 (PDT) Subject: [lxc-devel] [lxc/master] network: Re-works veth gateway logic Message-ID: <5cd18725.1c69fb81.286e2.03c5SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 413 bytes Desc: not available URL: -------------- next part -------------- From 906f0e216e15ec6ea8437e139de1c6f7fcd3b09c Mon Sep 17 00:00:00 2001 From: tomponline Date: Tue, 7 May 2019 14:23:24 +0100 Subject: [PATCH] network: Re-works veth gateway logic Handles more errors and gives better error messages. Signed-off-by: tomponline --- src/lxc/network.c | 70 +++++++++++++++++++++++++---------------------- 1 file changed, 37 insertions(+), 33 deletions(-) diff --git a/src/lxc/network.c b/src/lxc/network.c index 09240b15c1..7d78530fa1 100644 --- a/src/lxc/network.c +++ b/src/lxc/network.c @@ -3377,6 +3377,7 @@ static int lxc_setup_netdev_in_child_namespaces(struct lxc_netdev *netdev) int err; const char *net_type_name; char *current_ifname = ifname; + char bufinet4[INET_ADDRSTRLEN], bufinet6[INET6_ADDRSTRLEN]; /* empty network namespace */ if (!netdev->ifindex) { @@ -3501,11 +3502,6 @@ static int lxc_setup_netdev_in_child_namespaces(struct lxc_netdev *netdev) } } - /* We can only set up the default routes after bringing - * up the interface, since bringing up the interface adds - * the link-local routes and we can't add a default - * route if the gateway is not reachable. */ - /* setup ipv4 gateway on the interface */ if (netdev->ipv4_gateway || netdev->ipv4_gateway_dev) { if (!(netdev->flags & IFF_UP)) { @@ -3529,26 +3525,31 @@ static int lxc_setup_netdev_in_child_namespaces(struct lxc_netdev *netdev) return minus_one_set_errno(-err); } } else { + /* Check the gateway address is valid */ + if (!inet_ntop(AF_INET, netdev->ipv4_gateway, bufinet4, sizeof(bufinet4))) + return minus_one_set_errno(-errno); + + /* Try adding a default route to the gateway address */ err = lxc_ipv4_gateway_add(netdev->ifindex, netdev->ipv4_gateway); - if (err) { + if (err < 0) { + /* If adding the default route fails, this could be because the + * gateway address is in a different subnet to the container's address. + * To work around this, we try adding a static device route to the + * gateway address first, and then try again. + */ err = lxc_ipv4_dest_add(netdev->ifindex, netdev->ipv4_gateway, 32); - if (err) { + if (err < 0) { errno = -err; - SYSERROR("Failed to add ipv4 dest for network device \"%s\"", - ifname); + SYSERROR("Failed to add ipv4 dest \"%s\" for network device \"%s\"", + bufinet4, ifname); + return -1; } err = lxc_ipv4_gateway_add(netdev->ifindex, netdev->ipv4_gateway); - if (err) { + if (err < 0) { errno = -err; - SYSERROR("Failed to setup ipv4 gateway for network device \"%s\"", - ifname); - - if (netdev->ipv4_gateway_auto) { - char buf[INET_ADDRSTRLEN]; - inet_ntop(AF_INET, netdev->ipv4_gateway, buf, sizeof(buf)); - ERROR("Tried to set autodetected ipv4 gateway \"%s\"", buf); - } + SYSERROR("Failed to setup ipv4 gateway \"%s\" for network device \"%s\"", + bufinet4, ifname); return -1; } } @@ -3578,28 +3579,31 @@ static int lxc_setup_netdev_in_child_namespaces(struct lxc_netdev *netdev) return minus_one_set_errno(-err); } } else { + /* Check the gateway address is valid */ + if (!inet_ntop(AF_INET6, netdev->ipv6_gateway, bufinet6, sizeof(bufinet6))) + return minus_one_set_errno(-errno); + + /* Try adding a default route to the gateway address */ err = lxc_ipv6_gateway_add(netdev->ifindex, netdev->ipv6_gateway); - if (err) { + if (err < 0) { + /* If adding the default route fails, this could be because the + * gateway address is in a different subnet to the container's address. 
+ * To work around this, we try adding a static device route to the + * gateway address first, and then try again. + */ err = lxc_ipv6_dest_add(netdev->ifindex, netdev->ipv6_gateway, 128); - if (err) { + if (err < 0) { errno = -err; - SYSERROR("Failed to add ipv6 dest for network device \"%s\"", - ifname); + SYSERROR("Failed to add ipv6 dest \"%s\" for network device \"%s\"", + bufinet6, ifname); + return -1; } err = lxc_ipv6_gateway_add(netdev->ifindex, netdev->ipv6_gateway); - if (err) { + if (err < 0) { errno = -err; - SYSERROR("Failed to setup ipv6 gateway for network device \"%s\"", - ifname); - - if (netdev->ipv6_gateway_auto) { - char buf[INET6_ADDRSTRLEN]; - inet_ntop(AF_INET6, netdev->ipv6_gateway, buf, sizeof(buf)); - ERROR("Tried to set autodetected ipv6 " - "gateway for network device " - "\"%s\"", buf); - } + SYSERROR("Failed to setup ipv6 gateway \"%s\" for network device \"%s\"", + bufinet6, ifname); return -1; } } From noreply at github.com Tue May 7 13:50:46 2019 From: noreply at github.com (Christian Brauner) Date: Tue, 07 May 2019 13:50:46 +0000 (UTC) Subject: [lxc-devel] [lxc/lxc] 009d61: network: Re-works veth gateway logic Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 009d6127485ff454543b333e05ef46910a5573f8 https://github.com/lxc/lxc/commit/009d6127485ff454543b333e05ef46910a5573f8 Author: tomponline Date: 2019-05-07 (Tue, 07 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- network: Re-works veth gateway logic Handles more errors and gives better error messages. Signed-off-by: tomponline Commit: 668084bb25fdf611b6f927510426500148fadff1 https://github.com/lxc/lxc/commit/668084bb25fdf611b6f927510426500148fadff1 Author: Christian Brauner Date: 2019-05-07 (Tue, 07 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- Merge pull request #2981 from tomponline/tp-veth-gateway network: Re-works veth gateway logic Compare: https://github.com/lxc/lxc/compare/1732294cabec...668084bb25fd From lxc-bot at linuxcontainers.org Tue May 7 14:03:15 2019 From: lxc-bot at linuxcontainers.org (Rachid-Koucha on Github) Date: Tue, 07 May 2019 07:03:15 -0700 (PDT) Subject: [lxc-devel] [lxc/master] Devices created in rootfs instead of rootfs/dev Message-ID: <5cd19023.1c69fb81.7c0f4.6f1fSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 392 bytes Desc: not available URL: -------------- next part -------------- From 28eb86bd4391b5cebe0b92ceea70eb5e57c92285 Mon Sep 17 00:00:00 2001 From: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: Tue, 7 May 2019 16:03:02 +0200 Subject: [PATCH] Devices created in rootfs instead of rootfs/dev Added /dev in the mknod commands. 
Signed-off-by: Rachid Koucha --- templates/lxc-busybox.in | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/templates/lxc-busybox.in b/templates/lxc-busybox.in index 3782687a54..7f93ee4077 100644 --- a/templates/lxc-busybox.in +++ b/templates/lxc-busybox.in @@ -89,15 +89,15 @@ ${rootfs}/usr/lib64" echo "lxc.mount.entry = /dev/${dev} dev/${dev} none bind,optional,create=file 0 0" >> "${path}/config" done else - mknod -m 666 "${rootfs}/tty" c 5 0 || res=1 - mknod -m 666 "${rootfs}/console" c 5 1 || res=1 - mknod -m 666 "${rootfs}/tty0" c 4 0 || res=1 - mknod -m 666 "${rootfs}/tty1" c 4 0 || res=1 - mknod -m 666 "${rootfs}/tty5" c 4 0 || res=1 - mknod -m 600 "${rootfs}/ram0" b 1 0 || res=1 - mknod -m 666 "${rootfs}/null" c 1 3 || res=1 - mknod -m 666 "${rootfs}/zero" c 1 5 || res=1 - mknod -m 666 "${rootfs}/urandom" c 1 9 || res=1 + mknod -m 666 "${rootfs}/dev/tty" c 5 0 || res=1 + mknod -m 666 "${rootfs}/dev/console" c 5 1 || res=1 + mknod -m 666 "${rootfs}/dev/tty0" c 4 0 || res=1 + mknod -m 666 "${rootfs}/dev/tty1" c 4 0 || res=1 + mknod -m 666 "${rootfs}/dev/tty5" c 4 0 || res=1 + mknod -m 600 "${rootfs}/dev/ram0" b 1 0 || res=1 + mknod -m 666 "${rootfs}/dev/null" c 1 3 || res=1 + mknod -m 666 "${rootfs}/dev/zero" c 1 5 || res=1 + mknod -m 666 "${rootfs}/dev/urandom" c 1 9 || res=1 fi # root user defined From noreply at github.com Tue May 7 14:14:53 2019 From: noreply at github.com (Christian Brauner) Date: Tue, 07 May 2019 07:14:53 -0700 Subject: [lxc-devel] [lxc/lxc] 28eb86: Devices created in rootfs instead of rootfs/dev Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 28eb86bd4391b5cebe0b92ceea70eb5e57c92285 https://github.com/lxc/lxc/commit/28eb86bd4391b5cebe0b92ceea70eb5e57c92285 Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-07 (Tue, 07 May 2019) Changed paths: M templates/lxc-busybox.in Log Message: ----------- Devices created in rootfs instead of rootfs/dev Added /dev in the mknod commands. Signed-off-by: Rachid Koucha Commit: b1045fd37bf5c30160394e81ff97f206cbc79465 https://github.com/lxc/lxc/commit/b1045fd37bf5c30160394e81ff97f206cbc79465 Author: Christian Brauner Date: 2019-05-07 (Tue, 07 May 2019) Changed paths: M templates/lxc-busybox.in Log Message: ----------- Merge pull request #2982 from Rachid-Koucha/patch-5 Devices created in rootfs instead of rootfs/dev Compare: https://github.com/lxc/lxc/compare/668084bb25fd...b1045fd37bf5 From lxc-bot at linuxcontainers.org Tue May 7 18:37:59 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Tue, 07 May 2019 11:37:59 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/storage/ceph: Only rewrite UUID once Message-ID: <5cd1d087.1c69fb81.808e1.e4f9SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 354 bytes Desc: not available URL: -------------- next part -------------- From d6c693be5a5ad0baae947c0590f40e4f1312c32b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Tue, 7 May 2019 14:37:26 -0400 Subject: [PATCH] lxd/storage/ceph: Only rewrite UUID once MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/storage_ceph.go | 6 ------ lxd/storage_ceph_utils.go | 7 +++++++ 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/lxd/storage_ceph.go b/lxd/storage_ceph.go index 61563d3aa8..f90c9ba304 100644 --- a/lxd/storage_ceph.go +++ b/lxd/storage_ceph.go @@ -1159,12 +1159,6 @@ func (s *storageCeph) ContainerCopy(target container, source container, return err } - // Re-generate the UUID - err := s.cephRBDGenerateUUID(projectPrefix(target.Project(), target.Name()), storagePoolVolumeTypeNameContainer) - if err != nil { - return err - } - logger.Debugf(`Copied RBD container storage %s to %s`, sourceContainerName, target.Name()) return nil diff --git a/lxd/storage_ceph_utils.go b/lxd/storage_ceph_utils.go index abe9bdc4af..4f6889af9b 100644 --- a/lxd/storage_ceph_utils.go +++ b/lxd/storage_ceph_utils.go @@ -740,6 +740,13 @@ func (s *storageCeph) copyWithoutSnapshotsFull(target container, return err } + // Re-generate the UUID + err = s.cephRBDGenerateUUID(projectPrefix(target.Project(), target.Name()), storagePoolVolumeTypeNameContainer) + if err != nil { + return err + } + + // Create mountpoint targetContainerMountPoint := getContainerMountPoint(target.Project(), s.pool.Name, target.Name()) err = createContainerMountpoint(targetContainerMountPoint, target.Path(), target.IsPrivileged()) if err != nil { From lxc-bot at linuxcontainers.org Tue May 7 19:10:15 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Tue, 07 May 2019 12:10:15 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/seccomp: Minimal seccomp server Message-ID: <5cd1d817.1c69fb81.98abc.fd07SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 354 bytes Desc: not available URL: -------------- next part -------------- From f7db4bf41dfa03ee00a20407267d11cf63d508d7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Tue, 7 May 2019 15:09:49 -0400 Subject: [PATCH] lxd/containers: Don't fail on old libseccomp MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/container_lxc.go | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 249a33d382..508a8db697 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -1812,10 +1812,10 @@ func (c *containerLXC) initLXC(config bool) error { } if !c.IsPrivileged() && !c.state.OS.RunningInUserNS && lxc.HasApiExtension("seccomp_notify") && c.DaemonState().OS.SeccompListener { - err = lxcSetConfigItem(cc, "lxc.seccomp.notify.proxy", fmt.Sprintf("unix:%s", shared.VarPath("seccomp.socket"))) - if err != nil { - return err - } + // NOTE: Don't fail in cases where liblxc is recent enough but libseccomp isn't + // when we add mount() support with user-configurable + // options, we will want a hard fail if the user configured it + lxcSetConfigItem(cc, "lxc.seccomp.notify.proxy", fmt.Sprintf("unix:%s", shared.VarPath("seccomp.socket"))) } // Apply raw.lxc From lxc-bot at linuxcontainers.org Tue May 7 20:24:28 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Tue, 07 May 2019 13:24:28 -0700 (PDT) Subject: [lxc-devel] [lxd/master] DEBUG: clustering Message-ID: <5cd1e97c.1c69fb81.1d68a.3675SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 354 bytes Desc: not available URL: -------------- next part -------------- From b524a268b35ea08899a3e073ad704504b225a674 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Tue, 7 May 2019 16:12:00 -0400 Subject: [PATCH] DEBUG: clustering MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/cluster/heartbeat.go | 16 +++++ test/main.sh | 136 +++++++++++++++++++-------------------- 2 files changed, 84 insertions(+), 68 deletions(-) diff --git a/lxd/cluster/heartbeat.go b/lxd/cluster/heartbeat.go index 980f8c7018..c84debd651 100644 --- a/lxd/cluster/heartbeat.go +++ b/lxd/cluster/heartbeat.go @@ -24,14 +24,24 @@ import ( // It will update the heartbeat timestamp column of the nodes table // accordingly, and also notify them of the current list of database nodes. 
func Heartbeat(gateway *Gateway, cluster *db.Cluster) (task.Func, task.Schedule) { + serverName := "unknown" + cluster.Transaction(func(tx *db.ClusterTx) error { + var err error + serverName, err = tx.NodeName() + return err + }) + heartbeat := func(ctx context.Context) { + logger.Errorf("[%s] stgraber: in hearbeat", serverName) if gateway.server == nil || gateway.memoryDial != nil { // We're not a raft node or we're not clustered + logger.Errorf("[%s] stgraber: not clustered => out", serverName) return } raftNodes, err := gateway.currentRaftNodes() if err == raft.ErrNotLeader { + logger.Errorf("[%s] stgraber: not the leader => out", serverName) return } logger.Debugf("Starting heartbeat round") @@ -52,6 +62,7 @@ func Heartbeat(gateway *Gateway, cluster *db.Cluster) (task.Func, task.Schedule) logger.Warnf("Failed to replace local raft nodes: %v", err) return } + logger.Errorf("[%s] stgraber: set raft nodes to: %+v", serverName, raftNodes) var nodes []db.NodeInfo var nodeAddress string // Address of this node @@ -71,6 +82,7 @@ func Heartbeat(gateway *Gateway, cluster *db.Cluster) (task.Func, task.Schedule) logger.Warnf("Failed to get current cluster nodes: %v", err) return } + logger.Errorf("[%s] stgraber: local=%v nodes=%v", serverName, nodeAddress, nodes) heartbeats := make([]time.Time, len(nodes)) heartbeatsLock := sync.Mutex{} @@ -93,6 +105,7 @@ func Heartbeat(gateway *Gateway, cluster *db.Cluster) (task.Func, task.Schedule) // Spread in time by waiting up to 3s less than the interval time.Sleep(time.Duration(rand.Intn((heartbeatInterval*1000)-3000)) * time.Millisecond) logger.Debugf("Sending heartbeat to %s", address) + logger.Errorf("[%s] stgraber: Contacting: %s", serverName, address) err := heartbeatNode(ctx, address, gateway.cert, raftNodes) if err == nil { @@ -103,9 +116,11 @@ func Heartbeat(gateway *Gateway, cluster *db.Cluster) (task.Func, task.Schedule) } else { logger.Debugf("Failed heartbeat for %s: %v", address, err) } + logger.Errorf("[%s] stgraber: Got reply from: %s", serverName, address) }(i, node.Address) } heartbeatsWg.Wait() + logger.Errorf("[%s] stgraber: Got all replies", serverName) // If the context has been cancelled, return immediately. 
if ctx.Err() != nil { @@ -130,6 +145,7 @@ func Heartbeat(gateway *Gateway, cluster *db.Cluster) (task.Func, task.Schedule) logger.Warnf("Failed to update heartbeat: %v", err) } logger.Debugf("Completed heartbeat round") + logger.Errorf("[%s] stgraber: done => out", serverName) } // Since the database APIs are blocking we need to wrap the core logic diff --git a/test/main.sh b/test/main.sh index 6d245c431c..f51357c75e 100755 --- a/test/main.sh +++ b/test/main.sh @@ -154,74 +154,74 @@ if [ "$#" -gt 0 ]; then exit fi -run_test test_check_deps "checking dependencies" -run_test test_static_analysis "static analysis" -run_test test_database_update "database schema updates" -run_test test_database_restore "database restore" -run_test test_sql "lxd sql" -run_test test_projects_default "default project" -run_test test_projects_crud "projects CRUD operations" -run_test test_projects_containers "containers inside projects" -run_test test_projects_snapshots "snapshots inside projects" -run_test test_projects_backups "backups inside projects" -run_test test_projects_profiles "profiles inside projects" -run_test test_projects_profiles_default "profiles from the global default project" -run_test test_projects_images "images inside projects" -run_test test_projects_images_default "images from the global default project" -run_test test_projects_storage "projects and storage pools" -run_test test_projects_network "projects and networks" -run_test test_remote_url "remote url handling" -run_test test_remote_admin "remote administration" -run_test test_remote_usage "remote usage" -run_test test_basic_usage "basic usage" -run_test test_container_devices_nic "container devices - nic" -run_test test_security "security features" -run_test test_security_protection "container protection" -run_test test_image_expiry "image expiry" -run_test test_image_list_all_aliases "image list all aliases" -run_test test_image_auto_update "image auto-update" -run_test test_image_prefer_cached "image prefer cached" -run_test test_image_import_dir "import image from directory" -run_test test_concurrent_exec "concurrent exec" -run_test test_concurrent "concurrent startup" -run_test test_snapshots "container snapshots" -run_test test_snap_restore "snapshot restores" -run_test test_snap_expiry "snapshot expiry" -run_test test_config_profiles "profiles and configuration" -run_test test_config_edit "container configuration edit" -run_test test_config_edit_container_snapshot_pool_config "container and snapshot volume configuration edit" -run_test test_container_metadata "manage container metadata and templates" -run_test test_container_snapshot_config "container snapshot configuration" -run_test test_server_config "server configuration" -run_test test_filemanip "file manipulations" -run_test test_network "network management" -run_test test_idmap "id mapping" -run_test test_template "file templating" -run_test test_pki "PKI mode" -run_test test_devlxd "/dev/lxd" -run_test test_fuidshift "fuidshift" -run_test test_migration "migration" -run_test test_fdleak "fd leak" -run_test test_storage "storage" -run_test test_storage_volume_snapshots "storage volume snapshots" -run_test test_init_auto "lxd init auto" -run_test test_init_interactive "lxd init interactive" -run_test test_init_preseed "lxd init preseed" -run_test test_storage_profiles "storage profiles" -run_test test_container_import "container import" -run_test test_storage_volume_attach "attaching storage volumes" -run_test test_storage_driver_ceph "ceph storage driver" -run_test 
test_resources "resources" -run_test test_kernel_limits "kernel limits" -run_test test_macaroon_auth "macaroon authentication" -run_test test_console "console" -run_test test_query "query" -run_test test_proxy_device "proxy device" -run_test test_storage_local_volume_handling "storage local volume handling" -run_test test_backup_import "backup import" -run_test test_backup_export "backup export" -run_test test_container_local_cross_pool_handling "container local cross pool handling" -run_test test_incremental_copy "incremental container copy" +#run_test test_check_deps "checking dependencies" +#run_test test_static_analysis "static analysis" +#run_test test_database_update "database schema updates" +#run_test test_database_restore "database restore" +#run_test test_sql "lxd sql" +#run_test test_projects_default "default project" +#run_test test_projects_crud "projects CRUD operations" +#run_test test_projects_containers "containers inside projects" +#run_test test_projects_snapshots "snapshots inside projects" +#run_test test_projects_backups "backups inside projects" +#run_test test_projects_profiles "profiles inside projects" +#run_test test_projects_profiles_default "profiles from the global default project" +#run_test test_projects_images "images inside projects" +#run_test test_projects_images_default "images from the global default project" +#run_test test_projects_storage "projects and storage pools" +#run_test test_projects_network "projects and networks" +#run_test test_remote_url "remote url handling" +#run_test test_remote_admin "remote administration" +#run_test test_remote_usage "remote usage" +#run_test test_basic_usage "basic usage" +#run_test test_container_devices_nic "container devices - nic" +#run_test test_security "security features" +#run_test test_security_protection "container protection" +#run_test test_image_expiry "image expiry" +#run_test test_image_list_all_aliases "image list all aliases" +#run_test test_image_auto_update "image auto-update" +#run_test test_image_prefer_cached "image prefer cached" +#run_test test_image_import_dir "import image from directory" +#run_test test_concurrent_exec "concurrent exec" +#run_test test_concurrent "concurrent startup" +#run_test test_snapshots "container snapshots" +#run_test test_snap_restore "snapshot restores" +#run_test test_snap_expiry "snapshot expiry" +#run_test test_config_profiles "profiles and configuration" +#run_test test_config_edit "container configuration edit" +#run_test test_config_edit_container_snapshot_pool_config "container and snapshot volume configuration edit" +#run_test test_container_metadata "manage container metadata and templates" +#run_test test_container_snapshot_config "container snapshot configuration" +#run_test test_server_config "server configuration" +#run_test test_filemanip "file manipulations" +#run_test test_network "network management" +#run_test test_idmap "id mapping" +#run_test test_template "file templating" +#run_test test_pki "PKI mode" +#run_test test_devlxd "/dev/lxd" +#run_test test_fuidshift "fuidshift" +#run_test test_migration "migration" +#run_test test_fdleak "fd leak" +#run_test test_storage "storage" +#run_test test_storage_volume_snapshots "storage volume snapshots" +#run_test test_init_auto "lxd init auto" +#run_test test_init_interactive "lxd init interactive" +#run_test test_init_preseed "lxd init preseed" +#run_test test_storage_profiles "storage profiles" +#run_test test_container_import "container import" +#run_test test_storage_volume_attach "attaching 
storage volumes" +#run_test test_storage_driver_ceph "ceph storage driver" +#run_test test_resources "resources" +#run_test test_kernel_limits "kernel limits" +#run_test test_macaroon_auth "macaroon authentication" +#run_test test_console "console" +#run_test test_query "query" +#run_test test_proxy_device "proxy device" +#run_test test_storage_local_volume_handling "storage local volume handling" +#run_test test_backup_import "backup import" +#run_test test_backup_export "backup export" +#run_test test_container_local_cross_pool_handling "container local cross pool handling" +#run_test test_incremental_copy "incremental container copy" run_test test_clustering_enable "clustering enable" run_test test_clustering_membership "clustering membership" run_test test_clustering_containers "clustering containers" From lxc-bot at linuxcontainers.org Wed May 8 01:38:28 2019 From: lxc-bot at linuxcontainers.org (cyphar on Github) Date: Tue, 07 May 2019 18:38:28 -0700 (PDT) Subject: [lxc-devel] [lxd/master] shared: fix $SNAP handling under new snappy Message-ID: <5cd23314.1c69fb81.c7ca6.6307SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 549 bytes Desc: not available URL: -------------- next part -------------- From fb38532b0327d31955a367d8c3dc375feebae2c5 Mon Sep 17 00:00:00 2001 From: Aleksa Sarai Date: Wed, 8 May 2019 11:33:57 +1000 Subject: [PATCH] shared: fix $SNAP handling under new snappy The $SNAP environment variable used to signal that LXD is running inside a snap can be empty, which results in HostPath being a no-op inside a snap. Instead we just use os.LookupEnv. Signed-off-by: Aleksa Sarai --- shared/util.go | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/shared/util.go b/shared/util.go index 4f98cbbb26..fb03e2ee4a 100644 --- a/shared/util.go +++ b/shared/util.go @@ -120,9 +120,9 @@ func HostPath(path string) string { } // Check if we're running in a snap package - snap := os.Getenv("SNAP") + _, inSnap := os.LookupEnv("SNAP") snapName := os.Getenv("SNAP_NAME") - if snap == "" || snapName != "lxd" { + if !inSnap || snapName != "lxd" { return path } From lxc-bot at linuxcontainers.org Wed May 8 04:00:18 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Tue, 07 May 2019 21:00:18 -0700 (PDT) Subject: [lxc-devel] [lxd/master] Fix some cluster reliability issues Message-ID: <5cd25452.1c69fb81.ff0b4.fcc5SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 301 bytes Desc: not available URL: -------------- next part -------------- From 549526b8ba1e60dfd204b7892c3de200b2ca3c3b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Tue, 7 May 2019 17:14:12 -0400 Subject: [PATCH 1/2] lxd/cluster: Avoid panic in Gateway MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/cluster/gateway.go | 43 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 42 insertions(+), 1 deletion(-) diff --git a/lxd/cluster/gateway.go b/lxd/cluster/gateway.go index 5703da77d5..c34fdd1c0a 100644 --- a/lxd/cluster/gateway.go +++ b/lxd/cluster/gateway.go @@ -11,6 +11,7 @@ import ( "os" "path/filepath" "strconv" + "sync" "time" "github.com/CanonicalLtd/go-dqlite" @@ -99,6 +100,8 @@ type Gateway struct { // ServerStore wrapper. 
store *dqliteServerStore + + lock sync.RWMutex } // HandlerFuncs returns the HTTP handlers that should be added to the REST API @@ -112,6 +115,9 @@ type Gateway struct { // database node part of the dqlite cluster. func (g *Gateway) HandlerFuncs() map[string]http.HandlerFunc { database := func(w http.ResponseWriter, r *http.Request) { + g.lock.RLock() + defer g.lock.RUnlock() + if !tlsCheckCert(r, g.cert) { http.Error(w, "403 invalid client certificate", http.StatusForbidden) return @@ -202,6 +208,9 @@ func (g *Gateway) HandlerFuncs() map[string]http.HandlerFunc { g.acceptCh <- conn } raft := func(w http.ResponseWriter, r *http.Request) { + g.lock.RLock() + defer g.lock.RUnlock() + // If we are not part of the raft cluster, reply with a // redirect to one of the raft nodes that we know about. if g.raft == nil { @@ -245,6 +254,9 @@ func (g *Gateway) HandlerFuncs() map[string]http.HandlerFunc { // Snapshot can be used to manually trigger a RAFT snapshot func (g *Gateway) Snapshot() error { + g.lock.RLock() + defer g.lock.RUnlock() + return g.raft.Snapshot() } @@ -257,6 +269,9 @@ func (g *Gateway) WaitUpgradeNotification() { // IsDatabaseNode returns true if this gateway also run acts a raft database node. func (g *Gateway) IsDatabaseNode() bool { + g.lock.RLock() + defer g.lock.RUnlock() + return g.raft != nil } @@ -264,6 +279,9 @@ func (g *Gateway) IsDatabaseNode() bool { // dqlite nodes. func (g *Gateway) DialFunc() dqlite.DialFunc { return func(ctx context.Context, address string) (net.Conn, error) { + g.lock.RLock() + defer g.lock.RUnlock() + // Memory connection. if g.memoryDial != nil { return g.memoryDial(ctx, address) @@ -301,12 +319,15 @@ func (g *Gateway) Kill() { func (g *Gateway) Shutdown() error { logger.Debugf("Stop database gateway") + g.lock.RLock() if g.raft != nil { err := g.raft.Shutdown() if err != nil { + g.lock.RUnlock() return errors.Wrap(err, "Failed to shutdown raft") } } + g.lock.RUnlock() if g.server != nil { g.Sync() @@ -314,7 +335,9 @@ func (g *Gateway) Shutdown() error { // Unset the memory dial, since Shutdown() is also called for // switching between in-memory and network mode. + g.lock.Lock() g.memoryDial = nil + g.lock.Unlock() } return nil @@ -325,6 +348,9 @@ func (g *Gateway) Shutdown() error { // it can inspect the database in order to decide whether to activate the // daemon or not. func (g *Gateway) Sync() { + g.lock.RLock() + defer g.lock.RUnlock() + if g.server == nil { return } @@ -362,6 +388,9 @@ func (g *Gateway) Reset(cert *shared.CertInfo) error { // LeaderAddress returns the address of the current raft leader. func (g *Gateway) LeaderAddress() (string, error) { + g.lock.RLock() + defer g.lock.RUnlock() + // If we aren't clustered, return an error. 
if g.memoryDial != nil { return "", fmt.Errorf("Node is not clustered") @@ -381,7 +410,6 @@ func (g *Gateway) LeaderAddress() (string, error) { time.Sleep(time.Second) } return "", ctx.Err() - } // If this isn't a raft node, contact a raft node and ask for the @@ -483,15 +511,21 @@ func (g *Gateway) init() error { return errors.Wrap(err, "Failed to create dqlite server") } + g.lock.Lock() g.server = server g.raft = raft + g.lock.Unlock() } else { + g.lock.Lock() g.server = nil g.raft = nil g.store.inMemory = nil + g.lock.Unlock() } + g.lock.Lock() g.store.onDisk = dqlite.NewServerStore(g.db.DB(), "main", "raft_nodes", "address") + g.lock.Unlock() return nil } @@ -502,9 +536,13 @@ func (g *Gateway) waitLeadership() error { n := 80 sleep := 250 * time.Millisecond for i := 0; i < n; i++ { + g.lock.RLock() if g.raft.raft.State() == raft.Leader { + g.lock.RUnlock() return nil } + g.lock.RUnlock() + time.Sleep(sleep) } return fmt.Errorf("RAFT node did not self-elect within %s", time.Duration(n)*sleep) @@ -514,6 +552,9 @@ func (g *Gateway) waitLeadership() error { // cluster, as configured in the raft log. It returns an error if this node is // not the leader. func (g *Gateway) currentRaftNodes() ([]db.RaftNode, error) { + g.lock.RLock() + defer g.lock.RUnlock() + if g.raft == nil { return nil, raft.ErrNotLeader } From ac32d0fd891ba21c0c8d6cf5597097be3ffc37c1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Tue, 7 May 2019 23:29:29 -0400 Subject: [PATCH 2/2] lxd/cluster: Use current time for hearbeat MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/cluster/heartbeat.go | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lxd/cluster/heartbeat.go b/lxd/cluster/heartbeat.go index 980f8c7018..e842787b64 100644 --- a/lxd/cluster/heartbeat.go +++ b/lxd/cluster/heartbeat.go @@ -119,7 +119,7 @@ func Heartbeat(gateway *Gateway, cluster *db.Cluster) (task.Func, task.Schedule) continue } - err := tx.NodeHeartbeat(node.Address, heartbeats[i]) + err := tx.NodeHeartbeat(node.Address, time.Now()) if err != nil { return err } From lxc-bot at linuxcontainers.org Wed May 8 10:16:56 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Wed, 08 May 2019 03:16:56 -0700 (PDT) Subject: [lxc-devel] [lxd/master] network: Fixes custom MTU not being applied Message-ID: <5cd2ac98.1c69fb81.71422.5a93SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 361 bytes Desc: not available URL: -------------- next part -------------- From 547b17c81b32bb96b1ca8428b4f63cca1f9c26b5 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Wed, 8 May 2019 11:13:53 +0100 Subject: [PATCH] network: Fixes custom MTU not being applied Signed-off-by: Thomas Parrott --- lxd/container_lxc.go | 9 +++++++ test/suites/container_devices.sh | 43 +++++++++++++++++++++++++++++--- 2 files changed, 48 insertions(+), 4 deletions(-) diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 508a8db697..f99b9e1475 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -7626,6 +7626,15 @@ func (c *containerLXC) createNetworkDevice(name string, m types.Device) (string, } } + // Set the MTU + if m["mtu"] != "" { + _, err := shared.RunCommand("ip", "link", "set", "dev", dev, "mtu", m["mtu"]) + if err != nil { + deviceRemoveInterface(dev) + return "", fmt.Errorf("Failed to set the MTU: %s", err) + } + } + // Bring the interface up _, err := shared.RunCommand("ip", "link", "set", "dev", dev, "up") if err != nil { diff --git a/test/suites/container_devices.sh b/test/suites/container_devices.sh index 9cb5f958c8..56265a8e7f 100644 --- a/test/suites/container_devices.sh +++ b/test/suites/container_devices.sh @@ -24,6 +24,7 @@ test_container_devices_nic() { lxc profile device set ${ct_name} eth0 limits.ingress 1Mbit lxc profile device set ${ct_name} eth0 limits.egress 2Mbit lxc profile device set ${ct_name} eth0 host_name "${veth_host_name}" + lxc profile device set ${ct_name} eth0 mtu "1400" lxc launch testimage "${ct_name}" -p ${ct_name} if ! ip -4 r list dev "${veth_host_name}" | grep "192.0.2.1${ipRand}" ; then echo "ipv4.routes invalid" false @@ -42,6 +43,12 @@ test_container_devices_nic() { false fi + # Check custom MTU is applied. + if ! lxc exec "${ct_name}" -- ip link show eth0 | grep "mtu 1400" ; then + echo "mtu invalid" + false + fi + # Test hot plugging a container nic with different settings to profile with the same name. lxc config device add "${ct_name}" eth0 nic \ nictype=bridged \ ipv4.routes="192.0.2.2${ipRand}/32" \ ipv6.routes="2001:db8::2${ipRand}/128" \ limits.ingress=3Mbit \ limits.egress=4Mbit \ - host_name="${veth_host_name}" + host_name="${veth_host_name}" \ + name=eth0 \ + mtu=1401 if ! ip -4 r list dev "${veth_host_name}" | grep "192.0.2.2${ipRand}" ; then echo "ipv4.routes invalid" false fi @@ -69,6 +78,12 @@ test_container_devices_nic() { false fi + # Check custom MTU is applied. + if ! lxc exec "${ct_name}" -- ip link show eth0 | grep "mtu 1401" ; then + echo "mtu invalid" + false + fi + # Test removing hot plugged device and check profile nic is restored. lxc config device remove "${ct_name}" eth0 if ! ip -4 r list dev "${veth_host_name}" | grep "192.0.2.1${ipRand}" ; then @@ -88,16 +103,23 @@ test_container_devices_nic() { false fi + # Check custom MTU is applied. + if ! lxc exec "${ct_name}" -- ip link show eth0 | grep "mtu 1400" ; then + echo "mtu invalid" + false + fi + # Test hot plugging a container nic then updating it. 
lxc config device add "${ct_name}" eth0 nic \ nictype=bridged \ parent=${brName} \ - host_name="${veth_host_name}" + host_name="${veth_host_name}" \ + name=eth0 lxc config device set "${ct_name}" eth0 ipv4.routes "192.0.2.2${ipRand}/32" lxc config device set "${ct_name}" eth0 ipv6.routes "2001:db8::2${ipRand}/128" lxc config device set "${ct_name}" eth0 limits.ingress 3Mbit lxc config device set "${ct_name}" eth0 limits.egress 4Mbit - + lxc config device set "${ct_name}" eth0 mtu 1403 if ! ip -4 r list dev "${veth_host_name}" | grep "192.0.2.2${ipRand}" ; then echo "ipv4.routes invalid" false @@ -115,6 +137,12 @@ test_container_devices_nic() { false fi + # Check custom MTU is applied. + if ! lxc exec "${ct_name}" -- ip link show eth0 | grep "mtu 1403" ; then + echo "mtu invalid" + false + fi + # Test adding p2p veth to running container. lxc config device add "${ct_name}" eth1 nic \ nictype=p2p \ @@ -122,7 +150,8 @@ test_container_devices_nic() { ipv6.routes="2001:db8::3${ipRand}/128" \ limits.ingress=3Mbit \ limits.egress=4Mbit \ - host_name="${veth_host_name}p2p" + host_name="${veth_host_name}p2p" \ + mtu=1400 if ! ip -4 r list dev "${veth_host_name}p2p" | grep "192.0.2.3${ipRand}" ; then echo "ipv4.routes invalid" @@ -141,6 +170,12 @@ test_container_devices_nic() { false fi + # Check custom MTU is applied. + if ! lxc exec "${ct_name}" -- ip link show eth1 | grep "mtu 1400" ; then + echo "mtu invalid" + false + fi + # Cleanup. lxc config device remove "${ct_name}" eth1 lxc delete "${ct_name}" -f From lxc-bot at linuxcontainers.org Wed May 8 16:48:12 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Wed, 08 May 2019 09:48:12 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/cluster: Fix race condition during join Message-ID: <5cd3084c.1c69fb81.4d30a.3db5SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 354 bytes Desc: not available URL: -------------- next part -------------- From f2275bad5cc74533a3ae22a0f93b2b0f53304392 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Wed, 8 May 2019 12:13:45 -0400 Subject: [PATCH] lxd/cluster: Fix race condition during join MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/cluster/heartbeat.go | 79 +++++++++++++++++++++++++++++---------- lxd/cluster/membership.go | 6 +++ 2 files changed, 65 insertions(+), 20 deletions(-) diff --git a/lxd/cluster/heartbeat.go b/lxd/cluster/heartbeat.go index e842787b64..709e2fa234 100644 --- a/lxd/cluster/heartbeat.go +++ b/lxd/cluster/heartbeat.go @@ -61,10 +61,12 @@ func Heartbeat(gateway *Gateway, cluster *db.Cluster) (task.Func, task.Schedule) if err != nil { return err } + nodeAddress, err = tx.NodeAddress() if err != nil { return err } + return nil }) if err != nil { @@ -72,38 +74,75 @@ func Heartbeat(gateway *Gateway, cluster *db.Cluster) (task.Func, task.Schedule) return } - heartbeats := make([]time.Time, len(nodes)) + heartbeats := map[int64]bool{} heartbeatsLock := sync.Mutex{} heartbeatsWg := sync.WaitGroup{} - for i, node := range nodes { + sendHeartbeat := func(id int64, address string, delay bool) { + defer heartbeatsWg.Done() + + if delay { + // Spread in time by waiting up to 3s less than the interval + time.Sleep(time.Duration(rand.Intn((heartbeatInterval*1000)-3000)) * time.Millisecond) + } + logger.Debugf("Sending heartbeat to %s", address) + + err := heartbeatNode(ctx, address, gateway.cert, raftNodes) + if err == nil { + heartbeatsLock.Lock() + heartbeats[id] = true + heartbeatsLock.Unlock() + logger.Debugf("Successful heartbeat for %s", address) + } else { + logger.Debugf("Failed heartbeat for %s: %v", address, err) + } + } + + for _, node := range nodes { // Special case the local node if node.Address == nodeAddress { heartbeatsLock.Lock() - heartbeats[i] = time.Now() + heartbeats[node.ID] = true heartbeatsLock.Unlock() continue } // Parallelize the rest heartbeatsWg.Add(1) - go func(i int, address string) { - defer heartbeatsWg.Done() + go sendHeartbeat(node.ID, node.Address, true) + } + heartbeatsWg.Wait() - // Spread in time by waiting up to 3s less than the interval - time.Sleep(time.Duration(rand.Intn((heartbeatInterval*1000)-3000)) * time.Millisecond) - logger.Debugf("Sending heartbeat to %s", address) - - err := heartbeatNode(ctx, address, gateway.cert, raftNodes) - if err == nil { - heartbeatsLock.Lock() - heartbeats[i] = time.Now() - heartbeatsLock.Unlock() - logger.Debugf("Successful heartbeat for %s", address) - } else { - logger.Debugf("Failed heartbeat for %s: %v", address, err) + // Look for any new node which appeared since + var currentNodes []db.NodeInfo + err = cluster.Transaction(func(tx *db.ClusterTx) error { + var err error + nodes, err = tx.Nodes() + if err != nil { + return err + } + + return nil + }) + if err != nil { + logger.Warnf("Failed to get current cluster nodes: %v", err) + return + } + + for _, currentNode := range currentNodes { + found := false + for _, node := range nodes { + if node.Address == currentNode.Address { + found = true + break } - }(i, node.Address) + } + + if !found { + // We found a new node + heartbeatsWg.Add(1) + go sendHeartbeat(currentNode.ID, currentNode.Address, false) + } } heartbeatsWg.Wait() @@ -114,8 +153,8 @@ func Heartbeat(gateway *Gateway, cluster *db.Cluster) (task.Func, task.Schedule) } 
err = cluster.Transaction(func(tx *db.ClusterTx) error { - for i, node := range nodes { - if heartbeats[i].Equal(time.Time{}) { + for _, node := range nodes { + if !heartbeats[node.ID] { continue } diff --git a/lxd/cluster/membership.go b/lxd/cluster/membership.go index 2eca5a6385..de49f7c45b 100644 --- a/lxd/cluster/membership.go +++ b/lxd/cluster/membership.go @@ -1,6 +1,7 @@ package cluster import ( + "context" "fmt" "os" "path/filepath" @@ -412,6 +413,11 @@ func Join(state *state.State, gateway *Gateway, cert *shared.CertInfo, name stri return errors.Wrapf(err, "failed to unmark the node as pending") } + // Attempt to send a heartbeat to all other nodes + for _, node := range nodes { + go heartbeatNode(context.Background(), node.Address, cert, nodes) + } + return nil }) if err != nil { From lxc-bot at linuxcontainers.org Wed May 8 19:10:44 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Wed, 08 May 2019 12:10:44 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/images: Properly handle invalid protocols Message-ID: <5cd329b4.1c69fb81.ea2ff.540eSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 370 bytes Desc: not available URL: -------------- next part -------------- From f6b06203a0eef7853b003cd50b619ef0fa7c61c8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Wed, 8 May 2019 15:10:09 -0400 Subject: [PATCH] lxd/images: Properly handle invalid protocols MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes #5738 Signed-off-by: Stéphane Graber --- lxd/daemon_images.go | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lxd/daemon_images.go b/lxd/daemon_images.go index e0270d323b..cc2db2b25c 100644 --- a/lxd/daemon_images.go +++ b/lxd/daemon_images.go @@ -528,6 +528,8 @@ func (d *Daemon) ImageDownload(op *operation, server string, protocol string, ce info.CreatedAt = time.Unix(imageMeta.CreationDate, 0) info.ExpiresAt = time.Unix(imageMeta.ExpiryDate, 0) info.Properties = imageMeta.Properties + } else { + return nil, fmt.Errorf("Unsupported protocol: %v", protocol) } // Override visiblity From lxc-bot at linuxcontainers.org Wed May 8 19:41:43 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Wed, 08 May 2019 12:41:43 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/seccomp: Really handle old libseccomp Message-ID: <5cd330f7.1c69fb81.e0714.791aSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 370 bytes Desc: not available URL: -------------- next part -------------- From 0bb4473932b9371ad08c6dec8914be2e48e05e30 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Wed, 8 May 2019 15:40:56 -0400 Subject: [PATCH] lxd/seccomp: Really handle old libseccomp MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes #5737 Signed-off-by: Stéphane Graber --- lxd/container_lxc.go | 36 +++++++++++++++++++++++++++++++----- lxd/seccomp.go | 4 +--- 2 files changed, 32 insertions(+), 8 deletions(-) diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 508a8db697..46d519a8d3 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -199,6 +199,29 @@ func lxcParseRawLXC(line string) (string, string, error) { return key, val, nil } +func lxcSupportSeccompNotify(state *state.State) bool { + if !state.OS.SeccompListener { + return false + } + + if !lxc.HasApiExtension("seccomp_notify") { + return false + } + + c, err := lxc.NewContainer("test-seccomp", state.OS.LxcPath) + if err != nil { + return false + } + + err = c.SetConfigItem("lxc.seccomp.notify.proxy", fmt.Sprintf("unix:%s", shared.VarPath("seccomp.socket"))) + if err != nil { + return false + } + + c.Release() + return true +} + func lxcValidConfig(rawLxc string) error { for _, line := range strings.Split(rawLxc, "\n") { key, _, err := lxcParseRawLXC(line) @@ -1811,11 +1834,14 @@ func (c *containerLXC) initLXC(config bool) error { return err } - if !c.IsPrivileged() && !c.state.OS.RunningInUserNS && lxc.HasApiExtension("seccomp_notify") && c.DaemonState().OS.SeccompListener { - // NOTE: Don't fail in cases where liblxc is recent enough but libseccomp isn't - // when we add mount() support with user-configurable - // options, we will want a hard fail if the user configured it - lxcSetConfigItem(cc, "lxc.seccomp.notify.proxy", fmt.Sprintf("unix:%s", shared.VarPath("seccomp.socket"))) + // NOTE: Don't fail in cases where liblxc is recent enough but libseccomp isn't + // when we add mount() support with user-configurable + // options, we will want a hard fail if the user configured it + if !c.IsPrivileged() && !c.state.OS.RunningInUserNS && lxcSupportSeccompNotify(c.state) { + err = lxcSetConfigItem(cc, "lxc.seccomp.notify.proxy", fmt.Sprintf("unix:%s", shared.VarPath("seccomp.socket"))) + if err != nil { + return err + } } // Apply raw.lxc diff --git a/lxd/seccomp.go b/lxd/seccomp.go index 90a934a750..5e0afc80c6 100644 --- a/lxd/seccomp.go +++ b/lxd/seccomp.go @@ -15,8 +15,6 @@ import ( "golang.org/x/sys/unix" - "gopkg.in/lxc/go-lxc.v2" - "github.com/lxc/lxd/lxd/util" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/logger" @@ -253,7 +251,7 @@ func getSeccompProfileContent(c container) (string, error) { policy += DEFAULT_SECCOMP_POLICY } - if !c.IsPrivileged() && !c.DaemonState().OS.RunningInUserNS && lxc.HasApiExtension("seccomp_notify") && c.DaemonState().OS.SeccompListener { + if !c.IsPrivileged() && !c.DaemonState().OS.RunningInUserNS && lxcSupportSeccompNotify(c.DaemonState()) { policy += SECCOMP_NOTIFY_POLICY } From lxc-bot at linuxcontainers.org Wed May 8 20:39:42 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Wed, 08 May 2019 13:39:42 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/container: Early check for running container refresh Message-ID: <5cd33e8e.1c69fb81.e56ba.2aabSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 354 bytes Desc: not available URL: -------------- next part -------------- From 1be2f80afeed0ccee761f795d3c70c98d7f828a0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Wed, 8 May 2019 16:38:22 -0400 Subject: [PATCH] lxd/container: Early check for running container refresh MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/containers_post.go | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/lxd/containers_post.go b/lxd/containers_post.go index 0151b59eee..fdfa2b972a 100644 --- a/lxd/containers_post.go +++ b/lxd/containers_post.go @@ -267,6 +267,7 @@ func createFromMigration(d *Daemon, project string, req *api.ContainersPost) Res args.Devices[localRootDiskDeviceKey]["pool"] = storagePool } + // Early check for refresh if req.Source.Refresh { // Check if the container exists c, err = containerLoadByProjectAndName(d.State(), project, req.Name) @@ -545,6 +546,17 @@ func createFromCopy(d *Daemon, project string, req *api.ContainersPost) Response } } + // Early check for refresh + if req.Source.Refresh { + // Check if the container exists + c, err := containerLoadByProjectAndName(d.State(), targetProject, req.Name) + if err != nil { + req.Source.Refresh = false + } else if c.IsRunning() { + return BadRequest(fmt.Errorf("Cannot refresh a running container")) + } + } + args := db.ContainerArgs{ Project: targetProject, Architecture: source.Architecture(), From lxc-bot at linuxcontainers.org Wed May 8 21:18:55 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Wed, 08 May 2019 14:18:55 -0700 (PDT) Subject: [lxc-devel] [lxd/master] Export LXC features in API Message-ID: <5cd347bf.1c69fb81.ed266.2164SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 301 bytes Desc: not available URL: -------------- next part -------------- From f33e1ca2f2a0683cbb05a0139e479c4d7f604165 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Wed, 8 May 2019 17:03:32 -0400 Subject: [PATCH 1/5] lxd/sys: Cleanup State struct MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/sys/os.go | 50 ++++++++++++++++++++++++++++---------------------- 1 file changed, 28 insertions(+), 22 deletions(-) diff --git a/lxd/sys/os.go b/lxd/sys/os.go index e65f4eb173..a880b71214 100644 --- a/lxd/sys/os.go +++ b/lxd/sys/os.go @@ -31,25 +31,32 @@ type InotifyInfo struct { // OS is a high-level facade for accessing all operating-system // level functionality that LXD uses. type OS struct { - VarDir string // Data directory (e.g. /var/lib/lxd/). + // Directories CacheDir string // Cache directory (e.g. /var/cache/lxd/). LogDir string // Log directory (e.g. /var/log/lxd). + VarDir string // Data directory (e.g. /var/lib/lxd/). - // Caches of system characteristics detected at Init() time. 
- Architectures []int // Cache of detected system architectures - LxcPath string // Path to the $LXD_DIR/containers directory - BackingFS string // Backing filesystem of $LXD_DIR/containers - IdmapSet *idmap.IdmapSet // Information about user/group ID mapping - ExecPath string // Absolute path to the LXD executable - RunningInUserNS bool - AppArmorAvailable bool - AppArmorStacking bool - AppArmorStacked bool - AppArmorAdmin bool - AppArmorConfined bool + // Daemon environment + Architectures []int // Cache of detected system architectures + BackingFS string // Backing filesystem of $LXD_DIR/containers + ExecPath string // Absolute path to the LXD executable + IdmapSet *idmap.IdmapSet // Information about user/group ID mapping + InotifyWatch InotifyInfo + LxcPath string // Path to the $LXD_DIR/containers directory + MockMode bool // If true some APIs will be mocked (for testing) + RunningInUserNS bool + + // Apparmor features + AppArmorAdmin bool + AppArmorAvailable bool + AppArmorConfined bool + AppArmorStacked bool + AppArmorStacking bool + + // Cgroup features CGroupBlkioController bool - CGroupCPUController bool CGroupCPUacctController bool + CGroupCPUController bool CGroupCPUsetController bool CGroupDevicesController bool CGroupFreezerController bool @@ -57,14 +64,13 @@ type OS struct { CGroupNetPrioController bool CGroupPidsController bool CGroupSwapAccounting bool - InotifyWatch InotifyInfo - NetnsGetifaddrs bool - UeventInjection bool - SeccompListener bool - VFS3Fscaps bool - Shiftfs bool - - MockMode bool // If true some APIs will be mocked (for testing) + + // Kernel features + NetnsGetifaddrs bool + SeccompListener bool + Shiftfs bool + UeventInjection bool + VFS3Fscaps bool } // DefaultOS returns a fresh uninitialized OS instance with default values. 
From 22650cc76949432a98c822e47ff1450881b965f9 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Wed, 8 May 2019 11:50:07 +0100 Subject: [PATCH 2/5] lxd/api: Add lxc_features to /1.0 Signed-off-by: Thomas Parrott --- lxd/api_1.0.go | 7 +++++++ lxd/daemon.go | 11 +++++++++++ lxd/sys/os.go | 3 +++ 3 files changed, 21 insertions(+) diff --git a/lxd/api_1.0.go b/lxd/api_1.0.go index f06cd18384..c794c2ff6f 100644 --- a/lxd/api_1.0.go +++ b/lxd/api_1.0.go @@ -211,6 +211,13 @@ func api10Get(d *Daemon, r *http.Request) Response { "shiftfs": fmt.Sprintf("%v", d.os.Shiftfs), } + if d.os.LXCFeatures != nil { + env.LXCFeatures = map[string]string{} + for k, v := range d.os.LXCFeatures { + env.LXCFeatures[k] = fmt.Sprintf("%v", v) + } + } + drivers := readStoragePoolDriversCache() for driver, version := range drivers { if env.Storage != "" { diff --git a/lxd/daemon.go b/lxd/daemon.go index 3f9d572a7d..966c562baf 100644 --- a/lxd/daemon.go +++ b/lxd/daemon.go @@ -21,6 +21,7 @@ import ( "github.com/gorilla/mux" "github.com/pkg/errors" "golang.org/x/net/context" + "gopkg.in/lxc/go-lxc.v2" "gopkg.in/macaroon-bakery.v2/bakery" "gopkg.in/macaroon-bakery.v2/bakery/checkers" "gopkg.in/macaroon-bakery.v2/bakery/identchecker" @@ -576,6 +577,16 @@ func (d *Daemon) init() error { logger.Infof(" - shiftfs support: no") } + // Detect LXC features + d.os.LXCFeatures = map[string]bool{} + lxcExtensions := []string{ + "mount_injection_file", + "seccomp_notify", + } + for _, extension := range lxcExtensions { + d.os.LXCFeatures[extension] = lxc.HasApiExtension(extension) + } + /* Initialize the database */ dump, err := initializeDbObject(d) if err != nil { diff --git a/lxd/sys/os.go b/lxd/sys/os.go index a880b71214..8a507f99d7 100644 --- a/lxd/sys/os.go +++ b/lxd/sys/os.go @@ -71,6 +71,9 @@ type OS struct { Shiftfs bool UeventInjection bool VFS3Fscaps bool + + // LXC features + LXCFeatures map[string]bool } // DefaultOS returns a fresh uninitialized OS instance with default values. From 55292f1d3418f05ff4d9c0cd9c7c3f9354cf1c86 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Wed, 8 May 2019 17:11:40 -0400 Subject: [PATCH 3/5] api: Add lxc_features extension MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- doc/api-extensions.md | 5 +++++ shared/version/api.go | 1 + 2 files changed, 6 insertions(+) diff --git a/doc/api-extensions.md b/doc/api-extensions.md index d1d803ca77..4320b8509f 100644 --- a/doc/api-extensions.md +++ b/doc/api-extensions.md @@ -757,3 +757,8 @@ migration is required. If the kernel supports seccomp-based syscall interception LXD can be notified by a container that a registered syscall has been performed. LXD can then decide to trigger various actions. + +## lxc\_features +This introduces the `lxc_features` section output from the `lxc info` command +via the `GET /1.0/` route. It outputs the result of checks for key features being present in the +underlying LXC library. diff --git a/shared/version/api.go b/shared/version/api.go index 8a0ee9b1fb..2064b6f86e 100644 --- a/shared/version/api.go +++ b/shared/version/api.go @@ -152,6 +152,7 @@ var APIExtensions = []string{ "rbac", "cluster_internal_copy", "seccomp_notify", + "lxc_features", } // APIExtensionsCount returns the number of available API extensions. 
From 772ef987fe487d23b84fd6f2453193897f722b79 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Wed, 8 May 2019 17:12:46 -0400 Subject: [PATCH 4/5] shared/api: Add lxc_features MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- shared/api/server.go | 3 +++ 1 file changed, 3 insertions(+) diff --git a/shared/api/server.go b/shared/api/server.go index b2d961179f..c40b44e410 100644 --- a/shared/api/server.go +++ b/shared/api/server.go @@ -16,6 +16,9 @@ type ServerEnvironment struct { KernelVersion string `json:"kernel_version" yaml:"kernel_version"` + // API extension: lxc_features + LXCFeatures map[string]string `json:"lxc_features" yaml:"lxc_features"` + // API extension: projects Project string `json:"project" yaml:"project"` From 080b481baf9482f251eaffa6f122c32024e0c48e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Wed, 8 May 2019 17:18:02 -0400 Subject: [PATCH 5/5] lxd: Port from HasApiExtension to LXCFeatures MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/container_lxc.go | 8 ++++---- lxd/seccomp.go | 5 ++--- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 508a8db697..995b48961d 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -1802,7 +1802,7 @@ func (c *containerLXC) initLXC(config bool) error { } // Setup shmounts - if lxc.HasApiExtension("mount_injection_file") { + if c.state.OS.LXCFeatures["mount_injection_file"] { err = lxcSetConfigItem(cc, "lxc.mount.auto", fmt.Sprintf("shmounts:%s:/dev/.lxd-mounts", c.ShmountsPath())) } else { err = lxcSetConfigItem(cc, "lxc.mount.entry", fmt.Sprintf("%s dev/.lxd-mounts none bind,create=dir 0 0", c.ShmountsPath())) @@ -1811,7 +1811,7 @@ func (c *containerLXC) initLXC(config bool) error { return err } - if !c.IsPrivileged() && !c.state.OS.RunningInUserNS && lxc.HasApiExtension("seccomp_notify") && c.DaemonState().OS.SeccompListener { + if !c.IsPrivileged() && !c.state.OS.RunningInUserNS && c.state.OS.LXCFeatures["seccomp_notify"] && c.DaemonState().OS.SeccompListener { // NOTE: Don't fail in cases where liblxc is recent enough but libseccomp isn't // when we add mount() support with user-configurable // options, we will want a hard fail if the user configured it @@ -6604,7 +6604,7 @@ func (c *containerLXC) insertMount(source, target, fstype string, flags int) err return fmt.Errorf("Can't insert mount into stopped container") } - if lxc.HasApiExtension("mount_injection_file") { + if c.state.OS.LXCFeatures["mount_injection_file"] { cname := projectPrefix(c.Project(), c.Name()) configPath := filepath.Join(c.LogPath(), "lxc.conf") if fstype == "" { @@ -6672,7 +6672,7 @@ func (c *containerLXC) removeMount(mount string) error { return fmt.Errorf("Can't remove mount from stopped container") } - if lxc.HasApiExtension("mount_injection_file") { + if c.state.OS.LXCFeatures["mount_injection_file"] { configPath := filepath.Join(c.LogPath(), "lxc.conf") cname := projectPrefix(c.Project(), c.Name()) diff --git a/lxd/seccomp.go b/lxd/seccomp.go index 90a934a750..a20afe4dfb 100644 --- a/lxd/seccomp.go +++ b/lxd/seccomp.go @@ -15,8 +15,6 @@ import ( "golang.org/x/sys/unix" - "gopkg.in/lxc/go-lxc.v2" - "github.com/lxc/lxd/lxd/util" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/logger" @@ -253,7 +251,8 @@ func getSeccompProfileContent(c container) (string, error) { policy += 
DEFAULT_SECCOMP_POLICY } - if !c.IsPrivileged() && !c.DaemonState().OS.RunningInUserNS && lxc.HasApiExtension("seccomp_notify") && c.DaemonState().OS.SeccompListener { + os := c.DaemonState().OS + if !c.IsPrivileged() && !os.RunningInUserNS && os.LXCFeatures["seccomp_notify"] && os.SeccompListener { policy += SECCOMP_NOTIFY_POLICY } From lxc-bot at linuxcontainers.org Wed May 8 22:35:45 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Wed, 08 May 2019 15:35:45 -0700 (PDT) Subject: [lxc-devel] [lxd/master] Move some cgo to separate packages Message-ID: <5cd359c1.1c69fb81.32fd0.82fbSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 301 bytes Desc: not available URL: -------------- next part -------------- From 2768b6646834078e4a702e7012d028965cb318dc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Wed, 8 May 2019 18:10:05 -0400 Subject: [PATCH 1/2] shared: Move network cgo to shared/netutils MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/container_exec.go | 3 ++- lxd/container_lxc.go | 3 ++- lxd/main_checkfeature.go | 2 +- lxd/main_forknet.go | 4 ++-- lxd/main_forkuevent.go | 2 +- shared/{ => netutils}/netns_getifaddrs.c | 0 shared/{ => netutils}/network.c | 2 +- shared/{ => netutils}/network_linux.go | 9 +++++---- shared/network.go | 6 +++--- 9 files changed, 17 insertions(+), 14 deletions(-) rename shared/{ => netutils}/netns_getifaddrs.c (100%) rename shared/{ => netutils}/network.c (99%) rename shared/{ => netutils}/network_linux.go (96%) diff --git a/lxd/container_exec.go b/lxd/container_exec.go index 238b1df3a0..76159915d5 100644 --- a/lxd/container_exec.go +++ b/lxd/container_exec.go @@ -21,6 +21,7 @@ import ( "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" "github.com/lxc/lxd/shared/logger" + "github.com/lxc/lxd/shared/netutils" "github.com/lxc/lxd/shared/version" log "github.com/lxc/lxd/shared/log15" @@ -238,7 +239,7 @@ func (s *execWs) Do(op *operation) error { s.connsLock.Unlock() logger.Debugf("Starting to mirror websocket") - readDone, writeDone := shared.WebsocketExecMirror(conn, ptys[0], ptys[0], attachedChildIsDead, int(ptys[0].Fd())) + readDone, writeDone := netutils.WebsocketExecMirror(conn, ptys[0], ptys[0], attachedChildIsDead, int(ptys[0].Fd())) <-readDone <-writeDone diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 508a8db697..6c0af09783 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -38,6 +38,7 @@ import ( "github.com/lxc/lxd/shared/api" "github.com/lxc/lxd/shared/idmap" "github.com/lxc/lxd/shared/logger" + "github.com/lxc/lxd/shared/netutils" "github.com/lxc/lxd/shared/osarch" log "github.com/lxc/lxd/shared/log15" @@ -6354,7 +6355,7 @@ func (c *containerLXC) networkState() map[string]api.ContainerStateNetwork { couldUseNetnsGetifaddrs := c.state.OS.NetnsGetifaddrs if couldUseNetnsGetifaddrs { - nw, err := shared.NetnsGetifaddrs(int32(pid)) + nw, err := netutils.NetnsGetifaddrs(int32(pid)) if err != nil { couldUseNetnsGetifaddrs = false logger.Error("Failed to retrieve network information via netlink", log.Ctx{"container": c.name, "pid": pid}) diff --git a/lxd/main_checkfeature.go b/lxd/main_checkfeature.go index 3b12811b0c..87fced5fd7 100644 --- a/lxd/main_checkfeature.go +++ b/lxd/main_checkfeature.go @@ -24,7 +24,7 @@ import ( #include #include -#include "../shared/netns_getifaddrs.c" +#include 
"../shared/netutils/netns_getifaddrs.c" #include "include/memory_utils.h" bool netnsid_aware = false; diff --git a/lxd/main_forknet.go b/lxd/main_forknet.go index e7ef06e664..1c04dea924 100644 --- a/lxd/main_forknet.go +++ b/lxd/main_forknet.go @@ -6,7 +6,7 @@ import ( "github.com/spf13/cobra" - "github.com/lxc/lxd/shared" + "github.com/lxc/lxd/shared/netutils" ) /* @@ -93,7 +93,7 @@ func (c *cmdForknet) Command() *cobra.Command { } func (c *cmdForknet) RunInfo(cmd *cobra.Command, args []string) error { - networks, err := shared.NetnsGetifaddrs(-1) + networks, err := netutils.NetnsGetifaddrs(-1) if err != nil { return err } diff --git a/lxd/main_forkuevent.go b/lxd/main_forkuevent.go index ed2ab419d3..8c769bb259 100644 --- a/lxd/main_forkuevent.go +++ b/lxd/main_forkuevent.go @@ -25,7 +25,7 @@ import ( #include #include -#include "../shared/network.c" +#include "../shared/netutils/network.c" #include "include/memory_utils.h" #ifndef UEVENT_SEND diff --git a/shared/netns_getifaddrs.c b/shared/netutils/netns_getifaddrs.c similarity index 100% rename from shared/netns_getifaddrs.c rename to shared/netutils/netns_getifaddrs.c diff --git a/shared/network.c b/shared/netutils/network.c similarity index 99% rename from shared/network.c rename to shared/netutils/network.c index d0f40b0393..882a1972c4 100644 --- a/shared/network.c +++ b/shared/netutils/network.c @@ -19,7 +19,7 @@ #include #include -#include "../lxd/include/macro.h" +#include "../../lxd/include/macro.h" #ifndef NETNS_RTA #define NETNS_RTA(r) \ diff --git a/shared/network_linux.go b/shared/netutils/network_linux.go similarity index 96% rename from shared/network_linux.go rename to shared/netutils/network_linux.go index 3e920f3ccb..afe3aa31c6 100644 --- a/shared/network_linux.go +++ b/shared/netutils/network_linux.go @@ -1,7 +1,7 @@ // +build linux // +build cgo -package shared +package netutils import ( "fmt" @@ -12,12 +12,13 @@ import ( "github.com/gorilla/websocket" + "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/api" "github.com/lxc/lxd/shared/logger" ) /* -#include "../shared/netns_getifaddrs.c" +#include "../../shared/netutils/netns_getifaddrs.c" */ // #cgo CFLAGS: -std=gnu11 -Wvla import "C" @@ -173,10 +174,10 @@ func WebsocketExecMirror(conn *websocket.Conn, w io.WriteCloser, r io.ReadCloser readDone := make(chan bool, 1) writeDone := make(chan bool, 1) - go defaultWriter(conn, w, writeDone) + go shared.DefaultWriter(conn, w, writeDone) go func(conn *websocket.Conn, r io.ReadCloser) { - in := ExecReaderToChannel(r, -1, exited, fd) + in := shared.ExecReaderToChannel(r, -1, exited, fd) for { buf, ok := <-in if !ok { diff --git a/shared/network.go b/shared/network.go index 9a37468c0c..00e200001e 100644 --- a/shared/network.go +++ b/shared/network.go @@ -326,7 +326,7 @@ func defaultReader(conn *websocket.Conn, r io.ReadCloser, readDone chan<- bool) r.Close() } -func defaultWriter(conn *websocket.Conn, w io.WriteCloser, writeDone chan<- bool) { +func DefaultWriter(conn *websocket.Conn, w io.WriteCloser, writeDone chan<- bool) { for { mt, r, err := conn.NextReader() if err != nil { @@ -382,7 +382,7 @@ func WebsocketMirror(conn *websocket.Conn, w io.WriteCloser, r io.ReadCloser, Re WriteFunc := Writer if WriteFunc == nil { - WriteFunc = defaultWriter + WriteFunc = DefaultWriter } go ReadFunc(conn, r, readDone) @@ -395,7 +395,7 @@ func WebsocketConsoleMirror(conn *websocket.Conn, w io.WriteCloser, r io.ReadClo readDone := make(chan bool, 1) writeDone := make(chan bool, 1) - go defaultWriter(conn, w, writeDone) + go 
DefaultWriter(conn, w, writeDone) go func(conn *websocket.Conn, r io.ReadCloser) { in := ReaderToChannel(r, -1) From 36c779639ae4536aaa1d0e964112188098f65d9f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Wed, 8 May 2019 18:33:39 -0400 Subject: [PATCH 2/2] shared/netutils: Move send/recv fd functions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes #5725 Signed-off-by: Stéphane Graber --- lxd/main_forkproxy.go | 5 +- lxd/seccomp.go | 3 +- shared/netutils/network_linux.go | 40 +++++++++ shared/netutils/unixfd.c | 114 +++++++++++++++++++++++++ shared/util_linux_cgo.go | 142 ------------------------------- 5 files changed, 159 insertions(+), 145 deletions(-) create mode 100644 shared/netutils/unixfd.c diff --git a/lxd/main_forkproxy.go b/lxd/main_forkproxy.go index dc1563d6a7..696953068f 100644 --- a/lxd/main_forkproxy.go +++ b/lxd/main_forkproxy.go @@ -16,6 +16,7 @@ import ( "github.com/spf13/cobra" "github.com/lxc/lxd/shared" + "github.com/lxc/lxd/shared/netutils" ) /* @@ -508,7 +509,7 @@ func (c *cmdForkproxy) Run(cmd *cobra.Command, args []string) error { } sAgain: - err = shared.AbstractUnixSendFd(forkproxyUDSSockFDNum, int(file.Fd())) + err = netutils.AbstractUnixSendFd(forkproxyUDSSockFDNum, int(file.Fd())) if err != nil { errno, ok := shared.GetErrno(err) if ok && (errno == syscall.EAGAIN) { @@ -566,7 +567,7 @@ func (c *cmdForkproxy) Run(cmd *cobra.Command, args []string) error { files := []*os.File{} for range lAddr.addr { rAgain: - f, err := shared.AbstractUnixReceiveFd(forkproxyUDSSockFDNum) + f, err := netutils.AbstractUnixReceiveFd(forkproxyUDSSockFDNum) if err != nil { errno, ok := shared.GetErrno(err) if ok && (errno == syscall.EAGAIN) { diff --git a/lxd/seccomp.go b/lxd/seccomp.go index 90a934a750..13ebcb3458 100644 --- a/lxd/seccomp.go +++ b/lxd/seccomp.go @@ -20,6 +20,7 @@ import ( "github.com/lxc/lxd/lxd/util" "github.com/lxc/lxd/shared" "github.com/lxc/lxd/shared/logger" + "github.com/lxc/lxd/shared/netutils" "github.com/lxc/lxd/shared/osarch" ) @@ -361,7 +362,7 @@ func NewSeccompServer(d *Daemon, path string) (*SeccompServer, error) { for { buf := make([]byte, C.SECCOMP_PROXY_MSG_SIZE) - fdMem, err := shared.AbstractUnixReceiveFdData(int(unixFile.Fd()), buf) + fdMem, err := netutils.AbstractUnixReceiveFdData(int(unixFile.Fd()), buf) if err != nil || err == io.EOF { logger.Debugf("Disconnected from seccomp socket after receive: pid=%v", ucred.pid) c.Close() diff --git a/shared/netutils/network_linux.go b/shared/netutils/network_linux.go index afe3aa31c6..957080a239 100644 --- a/shared/netutils/network_linux.go +++ b/shared/netutils/network_linux.go @@ -9,6 +9,7 @@ import ( "net" "os" "strings" + "unsafe" "github.com/gorilla/websocket" @@ -19,6 +20,7 @@ import ( /* #include "../../shared/netutils/netns_getifaddrs.c" +#include "../../shared/netutils/unixfd.c" */ // #cgo CFLAGS: -std=gnu11 -Wvla import "C" @@ -208,3 +210,41 @@ func WebsocketExecMirror(conn *websocket.Conn, w io.WriteCloser, r io.ReadCloser return readDone, writeDone } + +func AbstractUnixSendFd(sockFD int, sendFD int) error { + fd := C.int(sendFD) + sk_fd := C.int(sockFD) + ret := C.lxc_abstract_unix_send_fds(sk_fd, &fd, C.int(1), nil, C.size_t(0)) + if ret < 0 { + return fmt.Errorf("Failed to send file descriptor via abstract unix socket") + } + + return nil +} + +func AbstractUnixReceiveFd(sockFD int) (*os.File, error) { + fd := C.int(-1) + sk_fd := C.int(sockFD) + ret := C.lxc_abstract_unix_recv_fds(sk_fd, &fd, C.int(1), nil, 
C.size_t(0)) + if ret < 0 { + return nil, fmt.Errorf("Failed to receive file descriptor via abstract unix socket") + } + + file := os.NewFile(uintptr(fd), "") + return file, nil +} + +func AbstractUnixReceiveFdData(sockFD int, buf []byte) (int, error) { + fd := C.int(-1) + sk_fd := C.int(sockFD) + ret := C.lxc_abstract_unix_recv_fds(sk_fd, &fd, C.int(1), unsafe.Pointer(&buf[0]), C.size_t(len(buf))) + if ret < 0 { + return int(-C.EBADF), fmt.Errorf("Failed to receive file descriptor via abstract unix socket") + } + + if ret == 0 { + return int(-C.EBADF), io.EOF + } + + return int(fd), nil +} diff --git a/shared/netutils/unixfd.c b/shared/netutils/unixfd.c new file mode 100644 index 0000000000..6aac9da772 --- /dev/null +++ b/shared/netutils/unixfd.c @@ -0,0 +1,114 @@ +// +build none + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "../../lxd/include/memory_utils.h" + +int lxc_abstract_unix_send_fds(int fd, int *sendfds, int num_sendfds, + void *data, size_t size) +{ + __do_free char *cmsgbuf = NULL; + struct msghdr msg; + struct iovec iov; + struct cmsghdr *cmsg = NULL; + char buf[1] = {0}; + size_t cmsgbufsize = CMSG_SPACE(num_sendfds * sizeof(int)); + + memset(&msg, 0, sizeof(msg)); + memset(&iov, 0, sizeof(iov)); + + cmsgbuf = malloc(cmsgbufsize); + if (!cmsgbuf) + return -1; + + msg.msg_control = cmsgbuf; + msg.msg_controllen = cmsgbufsize; + + cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_RIGHTS; + cmsg->cmsg_len = CMSG_LEN(num_sendfds * sizeof(int)); + + msg.msg_controllen = cmsg->cmsg_len; + + memcpy(CMSG_DATA(cmsg), sendfds, num_sendfds * sizeof(int)); + + iov.iov_base = data ? data : buf; + iov.iov_len = data ? size : sizeof(buf); + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + + return sendmsg(fd, &msg, MSG_NOSIGNAL); +} + +int lxc_abstract_unix_recv_fds(int fd, int *recvfds, int num_recvfds, + void *data, size_t size) +{ + __do_free char *cmsgbuf = NULL; + int ret; + struct msghdr msg; + struct iovec iov; + struct cmsghdr *cmsg = NULL; + char buf[1] = {0}; + size_t cmsgbufsize = CMSG_SPACE(sizeof(struct ucred)) + + CMSG_SPACE(num_recvfds * sizeof(int)); + + memset(&msg, 0, sizeof(msg)); + memset(&iov, 0, sizeof(iov)); + + cmsgbuf = malloc(cmsgbufsize); + if (!cmsgbuf) { + errno = ENOMEM; + return -1; + } + + msg.msg_control = cmsgbuf; + msg.msg_controllen = cmsgbufsize; + + iov.iov_base = data ? data : buf; + iov.iov_len = data ? size : sizeof(buf); + msg.msg_iov = &iov; + msg.msg_iovlen = 1; + +again: + ret = recvmsg(fd, &msg, 0); + if (ret < 0) { + if (errno == EINTR) + goto again; + + goto out; + } + if (ret == 0) + goto out; + + // If SO_PASSCRED is set we will always get a ucred message. 
+ for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) { + if (cmsg->cmsg_type != SCM_RIGHTS) + continue; + + memset(recvfds, -1, num_recvfds * sizeof(int)); + if (cmsg && + cmsg->cmsg_len == CMSG_LEN(num_recvfds * sizeof(int)) && + cmsg->cmsg_level == SOL_SOCKET) + memcpy(recvfds, CMSG_DATA(cmsg), num_recvfds * sizeof(int)); + break; + } + +out: + return ret; +} diff --git a/shared/util_linux_cgo.go b/shared/util_linux_cgo.go index acd8f9b218..d648a58d66 100644 --- a/shared/util_linux_cgo.go +++ b/shared/util_linux_cgo.go @@ -35,16 +35,6 @@ import ( #include #include -#include "../lxd/include/memory_utils.h" - -#ifndef AT_SYMLINK_FOLLOW -#define AT_SYMLINK_FOLLOW 0x400 -#endif - -#ifndef AT_EMPTY_PATH -#define AT_EMPTY_PATH 0x1000 -#endif - #define ABSTRACT_UNIX_SOCK_LEN sizeof(((struct sockaddr_un *)0)->sun_path) // This is an adaption from https://codereview.appspot.com/4589049, to be @@ -144,100 +134,6 @@ again: return ret; } - -int lxc_abstract_unix_send_fds(int fd, int *sendfds, int num_sendfds, - void *data, size_t size) -{ - __do_free char *cmsgbuf = NULL; - struct msghdr msg; - struct iovec iov; - struct cmsghdr *cmsg = NULL; - char buf[1] = {0}; - size_t cmsgbufsize = CMSG_SPACE(num_sendfds * sizeof(int)); - - memset(&msg, 0, sizeof(msg)); - memset(&iov, 0, sizeof(iov)); - - cmsgbuf = malloc(cmsgbufsize); - if (!cmsgbuf) - return -1; - - msg.msg_control = cmsgbuf; - msg.msg_controllen = cmsgbufsize; - - cmsg = CMSG_FIRSTHDR(&msg); - cmsg->cmsg_level = SOL_SOCKET; - cmsg->cmsg_type = SCM_RIGHTS; - cmsg->cmsg_len = CMSG_LEN(num_sendfds * sizeof(int)); - - msg.msg_controllen = cmsg->cmsg_len; - - memcpy(CMSG_DATA(cmsg), sendfds, num_sendfds * sizeof(int)); - - iov.iov_base = data ? data : buf; - iov.iov_len = data ? size : sizeof(buf); - msg.msg_iov = &iov; - msg.msg_iovlen = 1; - - return sendmsg(fd, &msg, MSG_NOSIGNAL); -} - -int lxc_abstract_unix_recv_fds(int fd, int *recvfds, int num_recvfds, - void *data, size_t size) -{ - __do_free char *cmsgbuf = NULL; - int ret; - struct msghdr msg; - struct iovec iov; - struct cmsghdr *cmsg = NULL; - char buf[1] = {0}; - size_t cmsgbufsize = CMSG_SPACE(sizeof(struct ucred)) + - CMSG_SPACE(num_recvfds * sizeof(int)); - - memset(&msg, 0, sizeof(msg)); - memset(&iov, 0, sizeof(iov)); - - cmsgbuf = malloc(cmsgbufsize); - if (!cmsgbuf) { - errno = ENOMEM; - return -1; - } - - msg.msg_control = cmsgbuf; - msg.msg_controllen = cmsgbufsize; - - iov.iov_base = data ? data : buf; - iov.iov_len = data ? size : sizeof(buf); - msg.msg_iov = &iov; - msg.msg_iovlen = 1; - -again: - ret = recvmsg(fd, &msg, 0); - if (ret < 0) { - if (errno == EINTR) - goto again; - - goto out; - } - if (ret == 0) - goto out; - - // If SO_PASSCRED is set we will always get a ucred message. 
- for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) { - if (cmsg->cmsg_type != SCM_RIGHTS) - continue; - - memset(recvfds, -1, num_recvfds * sizeof(int)); - if (cmsg && - cmsg->cmsg_len == CMSG_LEN(num_recvfds * sizeof(int)) && - cmsg->cmsg_level == SOL_SOCKET) - memcpy(recvfds, CMSG_DATA(cmsg), num_recvfds * sizeof(int)); - break; - } - -out: - return ret; -} */ // #cgo CFLAGS: -std=gnu11 -Wvla import "C" @@ -264,44 +160,6 @@ func GetPollRevents(fd int, timeout int, flags int) (int, int, error) { return int(ret), int(revents), err } -func AbstractUnixSendFd(sockFD int, sendFD int) error { - fd := C.int(sendFD) - sk_fd := C.int(sockFD) - ret := C.lxc_abstract_unix_send_fds(sk_fd, &fd, C.int(1), nil, C.size_t(0)) - if ret < 0 { - return fmt.Errorf("Failed to send file descriptor via abstract unix socket") - } - - return nil -} - -func AbstractUnixReceiveFd(sockFD int) (*os.File, error) { - fd := C.int(-1) - sk_fd := C.int(sockFD) - ret := C.lxc_abstract_unix_recv_fds(sk_fd, &fd, C.int(1), nil, C.size_t(0)) - if ret < 0 { - return nil, fmt.Errorf("Failed to receive file descriptor via abstract unix socket") - } - - file := os.NewFile(uintptr(fd), "") - return file, nil -} - -func AbstractUnixReceiveFdData(sockFD int, buf []byte) (int, error) { - fd := C.int(-1) - sk_fd := C.int(sockFD) - ret := C.lxc_abstract_unix_recv_fds(sk_fd, &fd, C.int(1), unsafe.Pointer(&buf[0]), C.size_t(len(buf))) - if ret < 0 { - return int(-C.EBADF), fmt.Errorf("Failed to receive file descriptor via abstract unix socket") - } - - if ret == 0 { - return int(-C.EBADF), io.EOF - } - - return int(fd), nil -} - func OpenPty(uid, gid int64) (master *os.File, slave *os.File, err error) { fd_master := C.int(-1) fd_slave := C.int(-1) From lxc-bot at linuxcontainers.org Thu May 9 06:48:50 2019 From: lxc-bot at linuxcontainers.org (tenforward on Github) Date: Wed, 08 May 2019 23:48:50 -0700 (PDT) Subject: [lxc-devel] [lxc/master] Update Japanese lxc.container.conf(5) Message-ID: <5cd3cd52.1c69fb81.b9a47.1159SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 301 bytes Desc: not available URL: -------------- next part -------------- From 7dd6ead90417723a3c9717b45728d701a36e9bff Mon Sep 17 00:00:00 2001 From: KATOH Yasufumi Date: Wed, 8 May 2019 21:42:16 +0900 Subject: [PATCH 1/2] doc: Update Japanese lxc.container.conf(5) This is the translation for the following description: - lxc.seccomp.notify.proxy (commit 8a64375) - host side veth device static routes (commit d4a7da4) - IPVLAN (commit c9f5238) - Layer 2 proxy mode (commit 6509154) - gateway device route mode (commit a2f9a67) and fix typo in English man page. Signed-off-by: KATOH Yasufumi --- doc/ja/lxc.container.conf.sgml.in | 175 +++++++++++++++++++++++++++--- doc/lxc.container.conf.sgml.in | 2 +- 2 files changed, 159 insertions(+), 18 deletions(-) diff --git a/doc/ja/lxc.container.conf.sgml.in b/doc/ja/lxc.container.conf.sgml.in index 7db396f450..553a88ea36 100644 --- a/doc/ja/lxc.container.conf.sgml.in +++ b/doc/ja/lxc.container.conf.sgml.in @@ -604,6 +604,12 @@ by KATOH Yasufumi the option (except for unprivileged containers where this option is ignored for security reasons). + + Static routes can be added on the host pointing to the container using the + and + options. + Several lines specify several routes. + The route is in format x.y.z.t/m, eg. 192.168.1.0/24. 
--> 一方がコンテナに、もう一方が オプションで指定されたブリッジに接続されるペアの仮想イーサネットデバイスを作成します。 もし、ブリッジが指定されていない場合、veth ペアデバイスは作成されますが、ブリッジには接続されません。 @@ -611,6 +617,10 @@ by KATOH Yasufumi lxc はコンテナ外の設定を扱うことはありません。 デフォルトでは、lxc がコンテナの外部に属するネットワークデバイスに対する名前を決定します。 しかし、もしこの名前を自分で指定したい場合、 オプションを使って名前を設定し、lxc に対して指定をすることができます (非特権コンテナの場合をのぞきます。セキュリティ上の理由からこのオプションは無視されます)。 + + オプションを使って、静的ルーティングをコンテナを指し示すホスト上に追加できます。 + 複数のルートがある場合は複数の設定を指定します。 + ルートは x.y.z.t/m の形式です。例: 192.168.1.0/24 @@ -661,7 +671,7 @@ by KATOH Yasufumi mode is possible for one physical interface. --> macvlan インターフェースは により指定されるインターフェースとリンクし、コンテナに割り当てられます。 - でモードを指定すると、その macvlan の指定を、同じ上位デバイスで異なる macvlan の間の通信をする時に使います。 + でモードを指定すると、その macvlan の指定を、同じ上位デバイスで異なる macvlan 間の通信をする時に使います。 指定できるモードは のいずれかです。 モードの場合、デバイスは同じ上位デバイスの他のデバイスとの通信を行いません (デフォルト)。 新しい仮想イーサネットポート集約モード (Virtual Ethernet Port Aggregator (VEPA)) である モードの場合、隣接したポートが、ソースとデスティネーションの両方が macvlan ポートに対してローカルであるフレームを全て返すと仮定します。 @@ -676,6 +686,54 @@ by KATOH Yasufumi モードの場合、物理インターフェースで受け取った全てのフレームは macvlan インターフェースに転送されます。 モードの場合、ひとつの macvlan インターフェースだけが、ひとつの物理インターフェースに対して設定できます。 + + + ipvlan インターフェースは により指定されるインターフェースとリンクし、コンテナに割り当てられます。 + でモードを指定すると、その ipvlan の指定を、同じ上位デバイスで異なる ipvlan 間の通信をする時に使います。 + 指定できるモードは で、デフォルトは モードです。 + モードでは、L3 までの TX (送信) 処理はスレーブデバイスにアタッチされたスタックインスタンス上で行われます。そしてパケットは、L2 処理のためにマスターデバイスのスタックインスタンスにスイッチされます。このインスタンスからのルーティングは、発信デバイス上でキューに入る前に使われます。このモードでは、スレーブはマルチキャスト・ブロードキャストのトラフィックを受信しませんし、受け取ることもできません。 + モードは、TX (送信) 処理は L3 モードと非常に似ていますが、iptables (conn-tracking) がこのモードで動作します。それゆえに L3対称 (symmetric) (L3s) です。このモードは若干パフォーマンスが低下しますが、conn-tracking (接続追跡) が動作するように、普通の L3 モードの代わりにこのモードを選んでいるので問題にはならないはずです。 + モードでは、TX (送信) 処理はスレーブデバイスにアタッチされたスタックインスンタンス上で行われます。パケットを送信するのに、マスターデバイスにスイッチされ、マスターデバイス上でキューに入ります。このモードでは、スレーブはマルチキャストも(該当する場合)ブロードキャストも RX/TX (送受信) 処理します。 + + は隔離モードを指定します。隔離モードには が指定できます。デフォルトは モードです。 + 隔離モードでは、スレーブはマスターデバイス経由の通信とは別に、スレーブ同士で通信できます。 + 隔離モードでは、ポートはプライベートに設定されます。つまり、スレーブ間の通信はできません。 + 隔離モードでは、ポートは VEPA モードに設定されます。つまり、802.1Qbg で説明されているように、ポートはスイッチング機能を外部エンティティにオフロードします。 + + + レイヤ 2 IP 近隣プロキシエントリを、コンテナの IP アドレスに対応する lxc.net.[i].link インターフェースに追加するかどうかを制御します。0 か 1 が設定でき、デフォルトは 0 です。 + IPv4 アドレスで使う場合は、次の sysctl 設定が必要です: + net.ipv4.conf.[link].forwarding=1 + IPv6 アドレスで使う場合は、次の sysctl 設定が必要です: + net.ipv6.conf.[link].proxy_ndp=1 + net.ipv6.conf.[link].forwarding=1 + + + + @@ -802,11 +886,15 @@ by KATOH Yasufumi interface (as specified by the option) and use that as the gateway. is only available when - using the and - network types. + using the , + and network types. + Can also have the special value of , + which means to set the default gateway as a device route. + This is primarily for use with layer 3 network modes, such as IPVLAN. --> コンテナでゲートウェイとして使う IPv4 アドレスを指定します。アドレスは x.y.z.t というフォーマットです。例) 192.168.1.123 - という特別な値を指定できます。これは ( で指定した) ブリッジインターフェースの最初のアドレスを使用し、それをゲートウェイに使うという意味になります。 はネットワークタイプとして を指定している時だけ有効となります。 + という特別な値を指定できます。これは ( で指定した) ブリッジインターフェースの最初のアドレスを使用し、それをゲートウェイに使うという意味になります。 はネットワークタイプとして を指定している時だけ有効となります。 + 特別な値である も設定できます。これはデバイスのルートとしてデフォルトゲートウェイを設定するという意味です。これは主に、IPVLAN のようなレイヤ 3 のネットワークモードで使用します。 @@ -844,11 +932,15 @@ by KATOH Yasufumi interface (as specified by the option) and use that as the gateway. is only available when - using the and - network types. + using the , + and network types. + Can also have the special value of , + which means to set the default gateway as a device route. + This is primarily for use with layer 3 network modes, such as IPVLAN. 
--> コンテナでゲートウェイとして使う IPv6 アドレスを指定します。アドレスは x::y というフォーマットです。例) 2003:db8:1:0::1 - という特別な値を記述する事も可能です。これは ( で指定した) ブリッジインターフェースの最初のアドレスを使用し、それをゲートウェイに使うという意味になります。 はネットワークタイプとして を指定している時だけ有効となります。 + という特別な値を記述する事も可能です。これは ( で指定した) ブリッジインターフェースの最初のアドレスを使用し、それをゲートウェイに使うという意味になります。 はネットワークタイプとして を指定している時だけ有効となります。 + 特別な値である も設定できます。これはデバイスのルートとしてデフォルトゲートウェイを設定するという意味です。これは主に、IPVLAN のようなレイヤ 3 のネットワークモードで使用します。 @@ -888,8 +980,8 @@ by KATOH Yasufumi - LXC_NET_TYPE: ネットワークタイプ。有効なネットワークタイプのうちのひとつです (例: 'macvlan', 'veth') + LXC_NET_TYPE: ネットワークタイプ。有効なネットワークタイプのうちのひとつです (例: 'vlan', 'macvlan', 'ipvlan', 'veth') @@ -966,9 +1058,9 @@ by KATOH Yasufumi - - LXC_NET_TYPE: ネットワークタイプ。有効なネットワークタイプのうちのひとつです (例: 'macvlan', 'veth') + + LXC_NET_TYPE: ネットワークタイプ。有効なネットワークタイプのうちのひとつです (例: 'vlan', 'macvlan', 'ipvlan', 'veth') @@ -2500,8 +2592,38 @@ by KATOH Yasufumi 2 blacklist mknod errno 0 + ioctl notify + + + アクションとして "errno" を指定すると、LXC は seccomp フィルタを登録します。これにより、指定した errno を呼び出し元に返します。 + errno の値は "errno" という単語の後に指定します。 + + + + + アクションとして "notify" を指定すると、LXC は seccomp リスナーを登録し、カーネルからリスナーのファイルディスクリプタを取得します。 + "notify" として指定しているシステムコールが作成されると、カーネルは poll イベントを生成し、ファイルディスクリプタを通してメッセージを送信します。 + 呼び出し元はこのメッセージを読み、引数を含めてシステムコールを調査できます。 + 呼び出し元はこの情報に基づき、どのアクションを取るべきかをカーネルに知らせるメッセージを送り返すことが期待されます。 + このメッセージが送られるまで、カーネルは呼び出し元のプロセスをブロックします。読み書きするメッセージのフォーマットは seccomp 自身に記述されています。 + + @@ -2523,15 +2645,34 @@ by KATOH Yasufumi - - このオプションを 1 に設定すると、すでに seccomp プロファイルがロードされている、いないに関わらず、seccomp フィルタが重ね合わせられます。 - これにより、ネストされたコンテナが自身の seccomp プロファイルをロードできます。 - デフォルト値は 0 です。 + --> + このオプションを 1 に設定すると、すでに seccomp プロファイルがロードされている、いないに関わらず、seccomp フィルタが重ね合わせられます。 + これにより、ネストされたコンテナが自身の seccomp プロファイルをロードできます。 + デフォルト値は 0 です。 + + + + + + + + + + + LXC が接続し、seccomp イベントを転送する UNIX ソケットを指定します。 + パスは unix:/path/to/socket もしくは unix:@socket の形式でなければなりません。 + 前者はパス指定の UNIX ドメインソケットを指定し、後者は抽象 (abstract) UNIX ドメインソケットの指定です。 diff --git a/doc/lxc.container.conf.sgml.in b/doc/lxc.container.conf.sgml.in index 0af0456a5a..b03cf851f2 100644 --- a/doc/lxc.container.conf.sgml.in +++ b/doc/lxc.container.conf.sgml.in @@ -1943,7 +1943,7 @@ dev/null proc/kcore none bind,relative 0 0 Specifying "errno" as action will cause LXC to register a seccomp filter - that will cause a specific errno to be returned ot the caller. The errno + that will cause a specific errno to be returned to the caller. The errno value can be specified after the "errno" action word. 
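
Taken together, the options documented in the patch above (the ipvlan modes, lxc.net.[i].l2proxy, and the special "dev" gateway value) combine into a container config along the lines of the following hypothetical fragment; the interface name eth0 and the addresses are illustrative placeholders, not values taken from the patch:

    lxc.net.0.type = ipvlan
    lxc.net.0.link = eth0
    lxc.net.0.ipvlan.mode = l3s
    # l2proxy additionally needs the sysctls listed above,
    # e.g. net.ipv4.conf.eth0.forwarding=1 for IPv4 addresses.
    lxc.net.0.l2proxy = 1
    lxc.net.0.ipv4.address = 192.0.2.10/32
    # "dev" installs the default gateway as a device route, which the
    # patch describes as intended for layer 3 modes such as IPVLAN.
    lxc.net.0.ipv4.gateway = dev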
From c425edc6616aedb241465ae4593e7b8dc65c90c3 Mon Sep 17 00:00:00 2001 From: KATOH Yasufumi Date: Thu, 9 May 2019 15:24:18 +0900 Subject: [PATCH 2/2] doc: Fix and improve Japanese translation Signed-off-by: KATOH Yasufumi Reviewed-by: Hiroaki Nakamura --- doc/ja/lxc.container.conf.sgml.in | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/doc/ja/lxc.container.conf.sgml.in b/doc/ja/lxc.container.conf.sgml.in index 553a88ea36..3ea3402ff8 100644 --- a/doc/ja/lxc.container.conf.sgml.in +++ b/doc/ja/lxc.container.conf.sgml.in @@ -724,9 +724,13 @@ by KATOH Yasufumi ipvlan インターフェースは により指定されるインターフェースとリンクし、コンテナに割り当てられます。 でモードを指定すると、その ipvlan の指定を、同じ上位デバイスで異なる ipvlan 間の通信をする時に使います。 指定できるモードは で、デフォルトは モードです。 - モードでは、L3 までの TX (送信) 処理はスレーブデバイスにアタッチされたスタックインスタンス上で行われます。そしてパケットは、L2 処理のためにマスターデバイスのスタックインスタンスにスイッチされます。このインスタンスからのルーティングは、発信デバイス上でキューに入る前に使われます。このモードでは、スレーブはマルチキャスト・ブロードキャストのトラフィックを受信しませんし、受け取ることもできません。 - モードは、TX (送信) 処理は L3 モードと非常に似ていますが、iptables (conn-tracking) がこのモードで動作します。それゆえに L3対称 (symmetric) (L3s) です。このモードは若干パフォーマンスが低下しますが、conn-tracking (接続追跡) が動作するように、普通の L3 モードの代わりにこのモードを選んでいるので問題にはならないはずです。 - モードでは、TX (送信) 処理はスレーブデバイスにアタッチされたスタックインスンタンス上で行われます。パケットを送信するのに、マスターデバイスにスイッチされ、マスターデバイス上でキューに入ります。このモードでは、スレーブはマルチキャストも(該当する場合)ブロードキャストも RX/TX (送受信) 処理します。 + モードでは、L3 までの TX (送信) 処理はスレーブデバイスにアタッチされたスタックインスタンス上で行われます。 + そしてパケットは、L2 処理のためにマスターデバイスのスタックインスタンスにスイッチされます。このインスタンスからのルーティングは、発信デバイス上でキューに入る前に使われます。 + このモードでは、スレーブはマルチキャスト・ブロードキャストのトラフィックを受信しませんし、送信もできません。 + モードは、TX (送信) 処理は L3 モードと非常に似ていますが、iptables (conn-tracking) がこのモードで動作します。 + それゆえに L3対称 (symmetric) (L3s) です。このモードは若干パフォーマンスが低下しますが、conn-tracking (接続追跡) が動作するように、普通の L3 モードの代わりにこのモードを選んでいるので問題にはならないはずです。 + モードでは、TX (送信) 処理はスレーブデバイスにアタッチされたスタックインスンタンス上で行われます。 + パケットは送信のため、マスターデバイスにスイッチされ、マスターデバイス上でキューに入ります。このモードでは、スレーブはマルチキャストも(該当する場合)ブロードキャストも RX/TX (送受信) 処理します。 は隔離モードを指定します。隔離モードには が指定できます。デフォルトは モードです。 隔離モードでは、スレーブはマスターデバイス経由の通信とは別に、スレーブ同士で通信できます。 @@ -786,7 +790,7 @@ by KATOH Yasufumi - - レイヤ 2 IP 近隣プロキシエントリを、コンテナの IP アドレスに対応する lxc.net.[i].link インターフェースに追加するかどうかを制御します。0 か 1 が設定でき、デフォルトは 0 です。 - IPv4 アドレスで使う場合は、次の sysctl 設定が必要です: - net.ipv4.conf.[link].forwarding=1 - IPv6 アドレスで使う場合は、次の sysctl 設定が必要です: + --> + レイヤ 2 IP 近隣プロキシエントリを、コンテナの IP アドレスに対応する lxc.net.[i].link インターフェースに追加するかどうかを制御します。0 か 1 が設定でき、デフォルトは 0 です。 + IPv4 アドレスで使う場合は、次の sysctl 設定が必要です: + net.ipv4.conf.[link].forwarding=1 + IPv6 アドレスで使う場合は、次の sysctl 設定が必要です: net.ipv6.conf.[link].proxy_ndp=1 net.ipv6.conf.[link].forwarding=1 From noreply at github.com Thu May 9 09:09:47 2019 From: noreply at github.com (Christian Brauner) Date: Thu, 09 May 2019 02:09:47 -0700 Subject: [lxc-devel] [lxc/lxc] 7dd6ea: doc: Update Japanese lxc.container.conf(5) Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 7dd6ead90417723a3c9717b45728d701a36e9bff https://github.com/lxc/lxc/commit/7dd6ead90417723a3c9717b45728d701a36e9bff Author: KATOH Yasufumi Date: 2019-05-08 (Wed, 08 May 2019) Changed paths: M doc/ja/lxc.container.conf.sgml.in M doc/lxc.container.conf.sgml.in Log Message: ----------- doc: Update Japanese lxc.container.conf(5) This is the translation for the following description: - lxc.seccomp.notify.proxy (commit 8a64375) - host side veth device static routes (commit d4a7da4) - IPVLAN (commit c9f5238) - Layer 2 proxy mode (commit 6509154) - gateway device route mode (commit a2f9a67) and fix typo in English man page. 
Signed-off-by: KATOH Yasufumi Commit: c425edc6616aedb241465ae4593e7b8dc65c90c3 https://github.com/lxc/lxc/commit/c425edc6616aedb241465ae4593e7b8dc65c90c3 Author: KATOH Yasufumi Date: 2019-05-09 (Thu, 09 May 2019) Changed paths: M doc/ja/lxc.container.conf.sgml.in Log Message: ----------- doc: Fix and improve Japanese translation Signed-off-by: KATOH Yasufumi Reviewed-by: Hiroaki Nakamura Commit: c6494c4b885a86cdc115add6b6a48d1f5149b5e2 https://github.com/lxc/lxc/commit/c6494c4b885a86cdc115add6b6a48d1f5149b5e2 Author: Christian Brauner Date: 2019-05-09 (Thu, 09 May 2019) Changed paths: M doc/ja/lxc.container.conf.sgml.in M doc/lxc.container.conf.sgml.in Log Message: ----------- Merge pull request #2983 from tenforward/japanese Update Japanese lxc.container.conf(5) Compare: https://github.com/lxc/lxc/compare/b1045fd37bf5...c6494c4b885a From lxc-bot at linuxcontainers.org Thu May 9 09:47:41 2019 From: lxc-bot at linuxcontainers.org (lxc-jp on Github) Date: Thu, 09 May 2019 02:47:41 -0700 (PDT) Subject: [lxc-devel] [linuxcontainers.org/master] Update Japanese statements about LXC releases Message-ID: <5cd3f73d.1c69fb81.506f1.aaadSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 370 bytes Desc: not available URL: -------------- next part -------------- From 27229b4539323561eb12e183b8cc5ff8db498249 Mon Sep 17 00:00:00 2001 From: KATOH Yasufumi Date: Thu, 9 May 2019 18:45:38 +0900 Subject: [PATCH] Update Japanese statements about LXC releases Signed-off-by: KATOH Yasufumi --- content/index.ja.html | 7 +++---- content/lxc/downloads.ja.md | 7 ++++--- content/lxc/introduction.ja.md | 12 ++++++++---- 3 files changed, 15 insertions(+), 11 deletions(-) diff --git a/content/index.ja.html b/content/index.ja.html index 0a416c3..7d030df 100644 --- a/content/index.ja.html +++ b/content/index.ja.html @@ -54,11 +54,10 @@

LXC

- LXC は production ready であり、LXC 1.0 が 5 年間のセキュリティアップデートとバグ修正を提供します (2019 年 4 月まで)。 + LXC は production ready であり、LTS リリースが 5 年間のセキュリティアップデートとバグ修正を提供します。

詳しく見る

diff --git a/content/lxc/downloads.ja.md b/content/lxc/downloads.ja.md index b2ce5a3..16a6afe 100644 --- a/content/lxc/downloads.ja.md +++ b/content/lxc/downloads.ja.md @@ -16,10 +16,11 @@ You may want to look for that, especially if your distribution doesn't include L LXC 1.0 や 2.0 がディストリビューションの stable リリースに含まれない場合は特に、それを使うことも選択肢の一つでしょう。 -Production 環境では、長期サポート版の stable リリースである LXC 1.0.x もしくは 2.0.x を使い続けることをお勧めします。1.0.x は 2019 年 4 月まで、2.0.x は 2021 年 4 月までサポートします。 +Production 環境では、長期サポート版の stable リリースである LXC 1.0.x もしくは 2.0.x もしくは 3.0.x を使い続けることをお勧めします。それぞれ 2019 年 6 月(1.0.x)、2021 年 6 月(2.0.x)、2023 年 6 月(3.0.x)までサポートします。 -LXC 1.0 と 2.0 は長期サポート版のリリースです。 -LXC 1.0 は 2019 年 6 月 1 日までサポートされます。そして LXC 2.0 は 2021 年 6 月 1 日までサポートされます。 +LXC 1.0、2.0、3.0 は長期サポート版のリリースです。 + - LXC 1.0 は 2019 年 6 月 1 日までサポートされます + - LXC 2.0 は 2021 年 6 月 1 日までサポートされます + - LXC 3.0 は 2023 年 6 月 1 日までサポートされます

network: physical and macvlan nictype MTU support #5745

Draft
wants to merge 3 commits into
base: master
from

Conversation

1 participant
@tomponline
Member

commented May 9, 2019

This exposes the network_phys_macvlan_mtu support in LXC for setting a boot time custom MTU on physical and macvlan nic device types and updates tests to expect the MTU to be changed if that feature is present.

TODO: Currently if a physical device's MTU is changed at boot time and then hot-unplugged, the physical device does not get its MTU restored to what it was on the host before being moved into the container.
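
(For context: the boot-time MTU discussed here corresponds to liblxc's lxc.net.[i].mtu key, which liblxc only honours for physical and macvlan nic types when built with the network_phys_macvlan_mtu API extension. A hypothetical raw liblxc fragment, with eth0 and the MTU value as placeholders:

    lxc.net.0.type = macvlan
    lxc.net.0.link = eth0
    lxc.net.0.macvlan.mode = bridge
    # Only honoured for this nic type by liblxc builds that
    # advertise the network_phys_macvlan_mtu API extension.
    lxc.net.0.mtu = 9000

The lxc_features work earlier in this digest surfaces exactly this kind of liblxc probe in the environment section of GET /1.0, which is presumably how the updated tests decide whether to expect the MTU change.)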

tomponline added some commits May 9, 2019

api: Exposes LXC network_phys_macvlan_mtu feature
Signed-off-by: Thomas Parrott <thomas.parrott at canonical.com>
test: Updates macvlan tests to detect MTU support in LXC
Signed-off-by: Thomas Parrott <thomas.parrott at canonical.com>
test: Updates physical tests to detect MTU support in LXC
Signed-off-by: Thomas Parrott <thomas.parrott at canonical.com>
From lxc-bot at linuxcontainers.org Thu May 9 17:43:02 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Thu, 09 May 2019 10:43:02 -0700 (PDT) Subject: [lxc-devel] [lxc/master] start: use CLONE_PIDFD Message-ID: <5cd466a6.1c69fb81.e9f36.20b7SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 610 bytes Desc: not available URL: -------------- next part -------------- From 33258b95fc1573b68b3dfae7a1d41696293b828d Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Thu, 9 May 2019 17:09:51 +0200 Subject: [PATCH 1/2] namespace: support CLONE_PIDFD with lxc_clone() Signed-off-by: Christian Brauner --- src/lxc/conf.c | 2 +- src/lxc/namespace.c | 6 +++--- src/lxc/namespace.h | 2 +- src/lxc/start.c | 2 +- src/lxc/storage/nbd.c | 2 +- src/lxc/tools/lxc_unshare.c | 2 +- 6 files changed, 8 insertions(+), 8 deletions(-) diff --git a/src/lxc/conf.c b/src/lxc/conf.c index 2515c881ef..0fbbbfa797 100644 --- a/src/lxc/conf.c +++ b/src/lxc/conf.c @@ -4419,7 +4419,7 @@ int userns_exec_full(struct lxc_conf *conf, int (*fn)(void *), void *data, d.p[1] = p[1]; /* Clone child in new user namespace. */ - pid = lxc_clone(run_userns_fn, &d, CLONE_NEWUSER); + pid = lxc_clone(run_userns_fn, &d, CLONE_NEWUSER, NULL); if (pid < 0) { ERROR("Failed to clone process in new user namespace"); goto on_error; diff --git a/src/lxc/namespace.c b/src/lxc/namespace.c index e22d9a4bf0..59fba412dd 100644 --- a/src/lxc/namespace.c +++ b/src/lxc/namespace.c @@ -54,7 +54,7 @@ static int do_clone(void *arg) } #define __LXC_STACK_SIZE 4096 -pid_t lxc_clone(int (*fn)(void *), void *arg, int flags) +pid_t lxc_clone(int (*fn)(void *), void *arg, int flags, int *pidfd) { size_t stack_size; pid_t ret; @@ -66,9 +66,9 @@ pid_t lxc_clone(int (*fn)(void *), void *arg, int flags) stack_size = __LXC_STACK_SIZE; #ifdef __ia64__ - ret = __clone2(do_clone, stack, stack_size, flags | SIGCHLD, &clone_arg); + ret = __clone2(do_clone, stack, stack_size, flags | SIGCHLD, &clone_arg, pidfd); #else - ret = clone(do_clone, stack + stack_size, flags | SIGCHLD, &clone_arg); + ret = clone(do_clone, stack + stack_size, flags | SIGCHLD, &clone_arg, pidfd); #endif if (ret < 0) SYSERROR("Failed to clone (%#x)", flags); diff --git a/src/lxc/namespace.h b/src/lxc/namespace.h index ab583da76a..f2c2ad82c6 100644 --- a/src/lxc/namespace.h +++ b/src/lxc/namespace.h @@ -133,7 +133,7 @@ int clone(int (*fn)(void *), void *child_stack, * - should call lxc_raw_getpid(): * The child should use lxc_raw_getpid() to retrieve its pid. 
*/ -extern pid_t lxc_clone(int (*fn)(void *), void *arg, int flags); +extern pid_t lxc_clone(int (*fn)(void *), void *arg, int flags, int *pidfd); extern int lxc_namespace_2_cloneflag(const char *namespace); extern int lxc_namespace_2_ns_idx(const char *namespace); diff --git a/src/lxc/start.c b/src/lxc/start.c index 34798292cf..48ba2b4240 100644 --- a/src/lxc/start.c +++ b/src/lxc/start.c @@ -1735,7 +1735,7 @@ static int lxc_spawn(struct lxc_handler *handler) pid_t attacher_pid; attacher_pid = lxc_clone(do_share_ns, handler, - CLONE_VFORK | CLONE_VM | CLONE_FILES); + CLONE_VFORK | CLONE_VM | CLONE_FILES, NULL); if (attacher_pid < 0) { SYSERROR(LXC_CLONE_ERROR); goto out_delete_net; diff --git a/src/lxc/storage/nbd.c b/src/lxc/storage/nbd.c index ab4f752c9d..dc68ee623e 100644 --- a/src/lxc/storage/nbd.c +++ b/src/lxc/storage/nbd.c @@ -266,7 +266,7 @@ static bool clone_attach_nbd(const char *nbd, const char *path) data.nbd = nbd; data.path = path; - pid = lxc_clone(do_attach_nbd, &data, CLONE_NEWPID); + pid = lxc_clone(do_attach_nbd, &data, CLONE_NEWPID, NULL); if (pid < 0) return false; diff --git a/src/lxc/tools/lxc_unshare.c b/src/lxc/tools/lxc_unshare.c index 1bc04ce928..421d92c2ad 100644 --- a/src/lxc/tools/lxc_unshare.c +++ b/src/lxc/tools/lxc_unshare.c @@ -388,7 +388,7 @@ int main(int argc, char *argv[]) start_arg.want_hostname = my_args.want_hostname; start_arg.want_default_mounts = my_args.want_default_mounts; - pid = lxc_clone(do_start, &start_arg, my_args.flags); + pid = lxc_clone(do_start, &start_arg, my_args.flags, NULL); if (pid < 0) { ERROR("Failed to clone"); free_ifname_list(); From 33942046c58fe0d4fb74e4fe2896b03f2cb26898 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Thu, 9 May 2019 19:40:23 +0200 Subject: [PATCH 2/2] start: use CLONE_PIDFD Use CLONE_PIDFD when possible. Note the clone() syscall ignores unknown flags which is usually a design mistake. However, for us this bug is a feature since we can just pass the flag along and see whether the kernel has given us a pidfd. 
Signed-off-by: Christian Brauner --- src/lxc/start.c | 49 ++++++++++++++++++++++++++++++++++--------------- src/lxc/start.h | 3 +++ 2 files changed, 37 insertions(+), 15 deletions(-) diff --git a/src/lxc/start.c b/src/lxc/start.c index 48ba2b4240..51969697e7 100644 --- a/src/lxc/start.c +++ b/src/lxc/start.c @@ -406,7 +406,9 @@ static int signal_handler(int fd, uint32_t events, void *data, } if (siginfo.ssi_signo == SIGHUP) { - if (hdlr->proc_pidfd >= 0) + if (hdlr->pidfd >= 0) + lxc_raw_pidfd_send_signal(hdlr->pidfd, SIGTERM, NULL, 0); + else if (hdlr->proc_pidfd >= 0) lxc_raw_pidfd_send_signal(hdlr->proc_pidfd, SIGTERM, NULL, 0); else kill(hdlr->pid, SIGTERM); @@ -416,7 +418,10 @@ static int signal_handler(int fd, uint32_t events, void *data, } if (siginfo.ssi_signo != SIGCHLD) { - if (hdlr->proc_pidfd >= 0) + if (hdlr->pidfd >= 0) + lxc_raw_pidfd_send_signal(hdlr->pidfd, + siginfo.ssi_signo, NULL, 0); + else if (hdlr->proc_pidfd >= 0) lxc_raw_pidfd_send_signal(hdlr->proc_pidfd, siginfo.ssi_signo, NULL, 0); else @@ -665,6 +670,8 @@ void lxc_zero_handler(struct lxc_handler *handler) handler->pinfd = -1; + handler->pidfd = -EBADF; + handler->proc_pidfd = -EBADF; handler->sigfd = -1; @@ -687,6 +694,9 @@ void lxc_free_handler(struct lxc_handler *handler) if (handler->pinfd >= 0) close(handler->pinfd); + if (handler->pidfd >= 0) + close(handler->pidfd); + if (handler->proc_pidfd >= 0) close(handler->proc_pidfd); @@ -734,6 +744,7 @@ struct lxc_handler *lxc_init_handler(const char *name, struct lxc_conf *conf, handler->conf = conf; handler->lxcpath = lxcpath; handler->pinfd = -1; + handler->pidfd = -EBADF; handler->proc_pidfd = -EBADF; handler->sigfd = -EBADF; handler->init_died = false; @@ -1096,19 +1107,23 @@ void lxc_fini(const char *name, struct lxc_handler *handler) void lxc_abort(const char *name, struct lxc_handler *handler) { - int ret, status; + int ret = 0; + int status; lxc_set_state(name, handler, ABORTING); - if (handler->pid > 0) { + if (handler->pidfd > 0) + ret = lxc_raw_pidfd_send_signal(handler->pidfd, SIGKILL, NULL, 0); + else if (handler->proc_pidfd > 0) ret = lxc_raw_pidfd_send_signal(handler->proc_pidfd, SIGKILL, NULL, 0); - if (ret < 0) - SYSERROR("Failed to send SIGKILL to %d", handler->pid); - } + else if (handler->pid > 0) + ret = kill(handler->pid, SIGKILL); + if (ret < 0) + SYSERROR("Failed to send SIGKILL to %d", handler->pid); - while ((ret = waitpid(-1, &status, 0)) > 0) { - ; - } + do { + ret = waitpid(-1, &status, 0); + } while (ret > 0); } static int do_start(void *data) @@ -1601,7 +1616,8 @@ static inline int do_share_ns(void *arg) flags = handler->ns_on_clone_flags; flags |= CLONE_PARENT; - handler->pid = lxc_raw_clone_cb(do_start, handler, flags, NULL); + handler->pid = lxc_raw_clone_cb(do_start, handler, CLONE_PIDFD | flags, + &handler->pidfd); if (handler->pid < 0) return -1; @@ -1748,7 +1764,8 @@ static int lxc_spawn(struct lxc_handler *handler) } } else { handler->pid = lxc_raw_clone_cb(do_start, handler, - handler->ns_on_clone_flags, NULL); + CLONE_PIDFD | handler->ns_on_clone_flags, + &handler->pidfd); } if (handler->pid < 0) { SYSERROR(LXC_CLONE_ERROR); @@ -1756,9 +1773,11 @@ static int lxc_spawn(struct lxc_handler *handler) } TRACE("Cloned child process %d", handler->pid); - handler->proc_pidfd = proc_pidfd_open(handler->pid); - if (handler->proc_pidfd < 0 && (errno != ENOSYS)) - goto out_delete_net; + if (handler->pidfd < 0) { + handler->proc_pidfd = proc_pidfd_open(handler->pid); + if (handler->proc_pidfd < 0 && (errno != ENOSYS)) + goto 
out_delete_net; + } for (i = 0; i < LXC_NS_MAX; i++) if (handler->ns_on_clone_flags & ns_info[i].clone_flag) diff --git a/src/lxc/start.h b/src/lxc/start.h index 305782f272..a3a5b4d540 100644 --- a/src/lxc/start.h +++ b/src/lxc/start.h @@ -102,6 +102,9 @@ struct lxc_handler { /* The child's pid. */ pid_t pid; + /* The child's pidfd. */ + int pidfd; + /* * File descriptor for the /proc/ directory of the container's * init process. From lxc-bot at linuxcontainers.org Thu May 9 18:24:11 2019 From: lxc-bot at linuxcontainers.org (tych0 on Github) Date: Thu, 09 May 2019 11:24:11 -0700 (PDT) Subject: [lxc-devel] [lxc/master] Pass zero to clone Message-ID: <5cd4704b.1c69fb81.fae64.0c13SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 386 bytes Desc: not available URL: -------------- next part -------------- From ae15df583f47177215b03005d62c85e7732ac9bd Mon Sep 17 00:00:00 2001 From: Tycho Andersen Date: Thu, 9 May 2019 13:52:30 -0400 Subject: [PATCH 1/3] lxc_clone: pass 0 as stack and have the kernel allocate it The kernel allows us to pass a NULL stack and have it allocate one, so let's just do that instead of doing it manually, since there are two problems with this code: 1. The math is wrong. We allocate a char *foo[__LXC_STACK_SIZE]; which means it's really sizeof(char *) * __LXC_STACK_SIZE, instead of just __LXC_STACK SIZE. 2. We can't actually allocate it on our stack. When we use CLONE_VM (which we do in the shared ns case) that means that the new thread is just running one page lower on the stack, but anything that allocates a page on the stack may clobber data. This is a pretty short race window since we just do the shared ns stuff and then do a clone without CLONE_VM. However, it does point out an interesting possible privilege escalation if things aren't configured correctly: do_share_ns() sets up namespaces while it shares the address space of the task that spawned it; once it enters the pid ns of the thing it's sharing with, the thing it's sharing with can ptrace it and write stuff into the host's address space. Since the function that does the clone() is lxc_spawn(), it has a struct cgroup_ops* on the stack, which itself has function pointers called later in the function, so it's possible to allocate shellcode in the address space of the host and run it fairly easily. ASLR doesn't mitigate this since we know exactly the stack offsets; however this patch has the kernel allocate a new stack, which will help. Of course, the attacker could just check /proc/pid/maps to find the location of the stack, but they'd still have to guess where to write stuff in. The thing that does prevent this is the default configuration of apparmor. Since the apparmor profile is set in the second clone, and apparmor prevents ptracing things under a different profile, attackers confined by apparmor can't do this. However, if users are using a custom configuration with shared namespaces, care must be taken to avoid this race. Shared namespaces aren't widely used now, so perhaps this isn't a problem, but with the advent of crio-lxc for k8s, this functionality will be used more. 
Signed-off-by: Tycho Andersen --- src/lxc/namespace.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/src/lxc/namespace.c b/src/lxc/namespace.c index e22d9a4bf0..04b7dd3d2d 100644 --- a/src/lxc/namespace.c +++ b/src/lxc/namespace.c @@ -53,22 +53,18 @@ static int do_clone(void *arg) return clone_arg->fn(clone_arg->arg); } -#define __LXC_STACK_SIZE 4096 pid_t lxc_clone(int (*fn)(void *), void *arg, int flags) { - size_t stack_size; pid_t ret; struct clone_arg clone_arg = { .fn = fn, .arg = arg, }; - char *stack[__LXC_STACK_SIZE] = {0}; - stack_size = __LXC_STACK_SIZE; #ifdef __ia64__ - ret = __clone2(do_clone, stack, stack_size, flags | SIGCHLD, &clone_arg); + ret = __clone2(do_clone, NULL, 0, flags | SIGCHLD, &clone_arg); #else - ret = clone(do_clone, stack + stack_size, flags | SIGCHLD, &clone_arg); + ret = clone(do_clone, 0, flags | SIGCHLD, &clone_arg); #endif if (ret < 0) SYSERROR("Failed to clone (%#x)", flags); From b827d5ed02ce861f52f5703cadc6fbf50129f041 Mon Sep 17 00:00:00 2001 From: Tycho Andersen Date: Thu, 9 May 2019 14:13:40 -0400 Subject: [PATCH 2/3] doc: add a little note about shared ns + LSMs We should add a little note about the race in the previous patch. Signed-off-by: Tycho Andersen --- doc/lxc.container.conf.sgml.in | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/doc/lxc.container.conf.sgml.in b/doc/lxc.container.conf.sgml.in index ee78e49a3d..8247e03487 100644 --- a/doc/lxc.container.conf.sgml.in +++ b/doc/lxc.container.conf.sgml.in @@ -1657,6 +1657,12 @@ dev/null proc/kcore none bind,relative 0 0 process wants to inherit the other's network namespace it usually needs to inherit the user namespace as well. + + + Note that without careful additional configuration of an LSM, + sharing user+pid namespaces with a task may allow that task to + escalate privileges to that of the task calling liblxc. +
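
As a standalone illustration of problem 1 from the first patch in this series (plain C, not LXC code): an array of 4096 char pointers occupies sizeof(char *) * 4096 bytes, so the old declaration reserved 32 KiB on 64-bit platforms even though __LXC_STACK_SIZE suggested a 4 KiB stack:

    #include <stdio.h>

    int main(void)
    {
        char *pointers[4096] = {0}; /* what the old code declared */
        char bytes[4096] = {0};     /* what __LXC_STACK_SIZE implied */

        /* Prints 32768 on LP64 platforms, not 4096. */
        printf("char *pointers[4096] -> %zu bytes\n", sizeof(pointers));
        /* Prints 4096. */
        printf("char bytes[4096] -> %zu bytes\n", sizeof(bytes));
        return 0;
    }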
From 0f407cd4367e220bf89c41c18c6995fe13ed50bf Mon Sep 17 00:00:00 2001 From: Tycho Andersen Date: Thu, 9 May 2019 14:18:10 -0400 Subject: [PATCH 3/3] lxc_clone: get rid of some indirection We have a do_clone(), which just calls a void f(void *) that it gets passed. We build up a struct consisting of two args that are just the actual arg and actual function. Let's just have the syscall do this for us. Signed-off-by: Tycho Andersen --- src/lxc/namespace.c | 19 ++----------------- 1 file changed, 2 insertions(+), 17 deletions(-) diff --git a/src/lxc/namespace.c b/src/lxc/namespace.c index 04b7dd3d2d..e5d6836893 100644 --- a/src/lxc/namespace.c +++ b/src/lxc/namespace.c @@ -42,29 +42,14 @@ lxc_log_define(namespace, lxc); -struct clone_arg { - int (*fn)(void *); - void *arg; -}; - -static int do_clone(void *arg) -{ - struct clone_arg *clone_arg = arg; - return clone_arg->fn(clone_arg->arg); -} - pid_t lxc_clone(int (*fn)(void *), void *arg, int flags) { pid_t ret; - struct clone_arg clone_arg = { - .fn = fn, - .arg = arg, - }; #ifdef __ia64__ - ret = __clone2(do_clone, NULL, 0, flags | SIGCHLD, &clone_arg); + ret = __clone2(fn, NULL, 0, flags | SIGCHLD, arg); #else - ret = clone(do_clone, 0, flags | SIGCHLD, &clone_arg); + ret = clone(fn, 0, flags | SIGCHLD, arg); #endif if (ret < 0) SYSERROR("Failed to clone (%#x)", flags); From lxc-bot at linuxcontainers.org Thu May 9 19:15:30 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Thu, 09 May 2019 12:15:30 -0700 (PDT) Subject: [lxc-devel] [lxd/master] Improve performance of setting volatile keys Message-ID: <5cd47c52.1c69fb81.abedc.af2aSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 301 bytes Desc: not available URL: -------------- next part -------------- From 3f024abeae722ffc2fe1f62fed15a4ef1d45e526 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Thu, 9 May 2019 15:13:56 -0400 Subject: [PATCH 1/2] lxd/db: Introduce ContainerConfigUpdate MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/db/containers.go | 48 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/lxd/db/containers.go b/lxd/db/containers.go index 152df5d2df..8b453e7225 100644 --- a/lxd/db/containers.go +++ b/lxd/db/containers.go @@ -524,6 +524,54 @@ func (c *ClusterTx) ContainerConfigInsert(id int, config map[string]string) erro return ContainerConfigInsert(c.tx, id, config) } +// ContainerConfigUpdate inserts/updates/deletes the provided keys +func (c *ClusterTx) ContainerConfigUpdate(id int, values map[string]string) error { + changes := map[string]string{} + deletes := []string{} + + // Figure out which key to set/unset + for key, value := range values { + if value == "" { + deletes = append(deletes, key) + continue + } + changes[key] = value + } + + // Insert/update keys + if len(changes) > 0 { + query := fmt.Sprintf("INSERT OR REPLACE INTO containers_config (container_id, key, value) VALUES") + exprs := []string{} + params := []interface{}{} + for key, value := range changes { + exprs = append(exprs, "(?, ?, ?)") + params = append(params, []interface{}{id, key, value}...) + } + + query += strings.Join(exprs, ",") + _, err := c.tx.Exec(query, params...) 
+ if err != nil { + return err + } + } + + // Delete keys + if len(deletes) > 0 { + query := fmt.Sprintf("DELETE FROM containers_config WHERE key IN %s AND container_id=?", query.Params(len(deletes))) + params := []interface{}{} + for _, key := range deletes { + params = append(params, key) + } + + _, err := c.tx.Exec(query, params...) + if err != nil { + return err + } + } + + return nil +} + // ContainerRemove removes the container with the given name from the database. func (c *Cluster) ContainerRemove(project, name string) error { return c.Transaction(func(tx *ClusterTx) error { From d89d1f20bc4ef969fffd096d0f86afe89511678e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Thu, 9 May 2019 15:14:23 -0400 Subject: [PATCH 2/2] lxd/containers: Replace ConfigKeySet with VolatileSet MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/container.go | 4 ++-- lxd/container_lxc.go | 49 +++++++++++++++++++++++++------------------- lxd/containers.go | 4 ++-- lxd/storage.go | 2 +- 4 files changed, 33 insertions(+), 26 deletions(-) diff --git a/lxd/container.go b/lxd/container.go index 1a6a1b4532..7295d06917 100644 --- a/lxd/container.go +++ b/lxd/container.go @@ -637,7 +637,7 @@ type container interface { // Live configuration CGroupGet(key string) (string, error) CGroupSet(key string, value string) error - ConfigKeySet(key string, value string) error + VolatileSet(changes map[string]string) error // File handling FileExists(path string) error @@ -1349,7 +1349,7 @@ func containerConfigureInternal(c container) error { if rootDiskDevice["size"] != "" { storageTypeName := storage.GetStorageTypeName() if (storageTypeName == "lvm" || storageTypeName == "ceph") && c.IsRunning() { - err = c.ConfigKeySet("volatile.apply_quota", rootDiskDevice["size"]) + err = c.VolatileSet(map[string]string{"volatile.apply_quota": rootDiskDevice["size"]}) if err != nil { return err } diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 1a65e9c47a..481cec94d7 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -456,14 +456,14 @@ func containerLXCCreate(s *state.State, args db.ContainerArgs) (container, error jsonIdmap = "[]" } - err = c.ConfigKeySet("volatile.idmap.next", jsonIdmap) + err = c.VolatileSet(map[string]string{"volatile.idmap.next": jsonIdmap}) if err != nil { c.Delete() logger.Error("Failed creating container", ctxMap) return nil, err } - err = c.ConfigKeySet("volatile.idmap.base", fmt.Sprintf("%v", base)) + err = c.VolatileSet(map[string]string{"volatile.idmap.base": fmt.Sprintf("%v", base)}) if err != nil { c.Delete() logger.Error("Failed creating container", ctxMap) @@ -475,7 +475,7 @@ func containerLXCCreate(s *state.State, args db.ContainerArgs) (container, error // Set last_state if not currently set if c.localConfig["volatile.last_state.idmap"] == "" { - err = c.ConfigKeySet("volatile.last_state.idmap", "[]") + err = c.VolatileSet(map[string]string{"volatile.last_state.idmap": "[]"}) if err != nil { c.Delete() logger.Error("Failed creating container", ctxMap) @@ -2158,7 +2158,7 @@ func (c *containerLXC) startCommon() (string, error) { jsonDiskIdmap = string(idmapBytes) } - err = c.ConfigKeySet("volatile.last_state.idmap", jsonDiskIdmap) + err = c.VolatileSet(map[string]string{"volatile.last_state.idmap": jsonDiskIdmap}) if err != nil { return "", errors.Wrapf(err, "Set volatile.last_state.idmap config key on container %q (id %d)", c.name, c.id) } @@ -2177,7 +2177,7 @@ func (c 
*containerLXC) startCommon() (string, error) { } if c.localConfig["volatile.idmap.current"] != string(idmapBytes) { - err = c.ConfigKeySet("volatile.idmap.current", string(idmapBytes)) + err = c.VolatileSet(map[string]string{"volatile.idmap.current": string(idmapBytes)}) if err != nil { return "", errors.Wrapf(err, "Set volatile.idmap.current config key on container %q (id %d)", c.name, c.id) } @@ -3965,26 +3965,33 @@ func (c *containerLXC) CGroupSet(key string, value string) error { return nil } -func (c *containerLXC) ConfigKeySet(key string, value string) error { - c.localConfig[key] = value - - args := db.ContainerArgs{ - Architecture: c.architecture, - Config: c.localConfig, - Description: c.description, - Devices: c.localDevices, - Ephemeral: c.ephemeral, - Profiles: c.profiles, - Project: c.project, - ExpiryDate: c.expiryDate, +func (c *containerLXC) VolatileSet(changes map[string]string) error { + // Sanity check + for key := range changes { + if !strings.HasPrefix(key, "volatile.") { + return fmt.Errorf("Only volatile keys can be modified with VolatileSet") + } } - err := c.Update(args, false) + // Update the database + err := c.state.Cluster.Transaction(func(tx *db.ClusterTx) error { + return tx.ContainerConfigUpdate(c.id, changes) + }) if err != nil { - errors.Wrap(err, "Failed to update container") + return errors.Wrap(err, "Failed to update database") } - return err + // Apply the change locally + for key, value := range changes { + if value == "" { + delete(c.localConfig, key) + continue + } + + c.localConfig[key] = value + } + + return nil } type backupFile struct { @@ -5730,7 +5737,7 @@ func (c *containerLXC) TemplateApply(trigger string) error { // "create" and "copy" are deferred until next start if shared.StringInSlice(trigger, []string{"create", "copy"}) { // The two events are mutually exclusive so only keep the last one - err := c.ConfigKeySet("volatile.apply_template", trigger) + err := c.VolatileSet(map[string]string{"volatile.apply_template": trigger}) if err != nil { return errors.Wrap(err, "Failed to set apply_template volatile key") } diff --git a/lxd/containers.go b/lxd/containers.go index 49625cfd0b..4f541ba0dd 100644 --- a/lxd/containers.go +++ b/lxd/containers.go @@ -290,12 +290,12 @@ func containersShutdown(s *state.State) error { go func(c container, lastState string) { c.Shutdown(time.Second * time.Duration(timeoutSeconds)) c.Stop(false) - c.ConfigKeySet("volatile.last_state.power", lastState) + c.VolatileSet(map[string]string{"volatile.last_state.power": lastState}) wg.Done() }(c, lastState) } else { - c.ConfigKeySet("volatile.last_state.power", lastState) + c.VolatileSet(map[string]string{"volatile.last_state.power": lastState}) } } wg.Wait() diff --git a/lxd/storage.go b/lxd/storage.go index 2e07d53039..f8c7e70c45 100644 --- a/lxd/storage.go +++ b/lxd/storage.go @@ -777,7 +777,7 @@ func resetContainerDiskIdmap(container container, srcIdmap *idmap.IdmapSet) erro jsonIdmap = "[]" } - err := container.ConfigKeySet("volatile.last_state.idmap", jsonIdmap) + err := container.VolatileSet(map[string]string{"volatile.last_state.idmap": jsonIdmap}) if err != nil { return err } From noreply at github.com Thu May 9 19:20:00 2019 From: noreply at github.com (=?UTF-8?B?U3TDqXBoYW5lIEdyYWJlcg==?=) Date: Thu, 09 May 2019 12:20:00 -0700 Subject: [lxc-devel] [lxc/lxc] 33258b: namespace: support CLONE_PIDFD with lxc_clone() Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 33258b95fc1573b68b3dfae7a1d41696293b828d 
https://github.com/lxc/lxc/commit/33258b95fc1573b68b3dfae7a1d41696293b828d Author: Christian Brauner Date: 2019-05-09 (Thu, 09 May 2019) Changed paths: M src/lxc/conf.c M src/lxc/namespace.c M src/lxc/namespace.h M src/lxc/start.c M src/lxc/storage/nbd.c M src/lxc/tools/lxc_unshare.c Log Message: ----------- namespace: support CLONE_PIDFD with lxc_clone() Signed-off-by: Christian Brauner Commit: 33942046c58fe0d4fb74e4fe2896b03f2cb26898 https://github.com/lxc/lxc/commit/33942046c58fe0d4fb74e4fe2896b03f2cb26898 Author: Christian Brauner Date: 2019-05-09 (Thu, 09 May 2019) Changed paths: M src/lxc/start.c M src/lxc/start.h Log Message: ----------- start: use CLONE_PIDFD Use CLONE_PIDFD when possible. Note the clone() syscall ignores unknown flags which is usually a design mistake. However, for us this bug is a feature since we can just pass the flag along and see whether the kernel has given us a pidfd. Signed-off-by: Christian Brauner Commit: 3e860bdac0e8b9cd5e4a06546c85e3fddc7781cf https://github.com/lxc/lxc/commit/3e860bdac0e8b9cd5e4a06546c85e3fddc7781cf Author: Stéphane Graber Date: 2019-05-09 (Thu, 09 May 2019) Changed paths: M src/lxc/conf.c M src/lxc/namespace.c M src/lxc/namespace.h M src/lxc/start.c M src/lxc/start.h M src/lxc/storage/nbd.c M src/lxc/tools/lxc_unshare.c Log Message: ----------- Merge pull request #2986 from brauner/2019-05-09/clone_pidfd start: use CLONE_PIDFD Compare: https://github.com/lxc/lxc/compare/1ab73d38009d...3e860bdac0e8 From sreeginsree5298 at gmail.com Fri May 10 05:37:23 2019 From: sreeginsree5298 at gmail.com (Sreejin K) Date: Fri, 10 May 2019 11:07:23 +0530 Subject: [lxc-devel] Checking isolation of container Message-ID: Hi, I would like to know whether there is a method to check the isolation of an LXC container. What I have done --------------------------------------------------- I have installed the LXC engine on my Ubuntu machine, then I created two Ubuntu containers on my host OS (Ubuntu). Now I want to check whether these two containers are isolated. Is there any method to check this? Please support. Thanks & Regards Sreejin K -------------- next part -------------- An HTML attachment was scrubbed... URL: From lxc-bot at linuxcontainers.org Fri May 10 05:39:22 2019 From: lxc-bot at linuxcontainers.org (Rachid-Koucha on Github) Date: Thu, 09 May 2019 22:39:22 -0700 (PDT) Subject: [lxc-devel] [lxc/master] Redirect error messages to stderr Message-ID: <5cd50e8a.1c69fb81.a93b8.bf42SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 482 bytes Desc: not available URL: -------------- next part -------------- From 634ad9358e7f43bf87672c51db032cde5e3142fd Mon Sep 17 00:00:00 2001 From: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: Fri, 10 May 2019 07:39:03 +0200 Subject: [PATCH] Redirect error messages to stderr Some error messages were not redirected to stderr. Moreover, do "exit 0" instead of "exit 1" when the "help" option is passed. Signed-off-by: Rachid Koucha --- templates/lxc-busybox.in | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/templates/lxc-busybox.in b/templates/lxc-busybox.in index 7f93ee4077..3601655036 100644 --- a/templates/lxc-busybox.in +++ b/templates/lxc-busybox.in @@ -185,7 +185,7 @@ configure_busybox() # copy busybox in the rootfs if !
cp "${BUSYBOX_EXE}" "${rootfs}/bin"; then - echo "ERROR: Failed to copy busybox binary" + echo "ERROR: Failed to copy busybox binary" 1>&2 return 1 fi @@ -287,7 +287,7 @@ eval set -- "$options" while true do case "$1" in - -h|--help) usage && exit 1;; + -h|--help) usage && exit 0;; -n|--name) name=$2; shift 2;; -p|--path) path=$2; shift 2;; --rootfs) rootfs=$2; shift 2;; @@ -307,7 +307,7 @@ fi # Make sure busybox is present BUSYBOX_EXE=`which busybox` if [ $? -ne 0 ]; then - echo "ERROR: Failed to find busybox binary" + echo "ERROR: Failed to find busybox binary" 1>&2 exit 1 fi @@ -322,21 +322,21 @@ if [ -z "$rootfs" ]; then fi if ! install_busybox "${rootfs}" "${name}"; then - echo "ERROR: Failed to install rootfs" + echo "ERROR: Failed to install rootfs" 1>&2 exit 1 fi if ! configure_busybox "${rootfs}"; then - echo "ERROR: Failed to configure busybox" + echo "ERROR: Failed to configure busybox" 1>&2 exit 1 fi if ! copy_configuration "${path}" "${rootfs}" "${name}"; then - echo "ERROR: Failed to write config file" + echo "ERROR: Failed to write config file" 1>&2 exit 1 fi if ! remap_userns "${path}"; then - echo "ERROR: Failed to change idmappings" + echo "ERROR: Failed to change idmappings" 1>&2 exit 1 fi From noreply at github.com Fri May 10 06:49:01 2019 From: noreply at github.com (Christian Brauner) Date: Thu, 09 May 2019 23:49:01 -0700 Subject: [lxc-devel] [lxc/lxc] 634ad9: Redirect error messages to stderr Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 634ad9358e7f43bf87672c51db032cde5e3142fd https://github.com/lxc/lxc/commit/634ad9358e7f43bf87672c51db032cde5e3142fd Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-10 (Fri, 10 May 2019) Changed paths: M templates/lxc-busybox.in Log Message: ----------- Redirect error messages to stderr Some error messages were not redirected to stderr. Moreover, do "exit 0" instead of "exit 1" when "help" option is passed. Signed-off-by: Rachid Koucha Commit: 70aa3c7f58ab52ebaec807cb3597560df4f131f6 https://github.com/lxc/lxc/commit/70aa3c7f58ab52ebaec807cb3597560df4f131f6 Author: Christian Brauner Date: 2019-05-10 (Fri, 10 May 2019) Changed paths: M templates/lxc-busybox.in Log Message: ----------- Merge pull request #2989 from Rachid-Koucha/patch-8 Redirect error messages to stderr Compare: https://github.com/lxc/lxc/compare/3e860bdac0e8...70aa3c7f58ab From noreply at github.com Fri May 10 07:30:37 2019 From: noreply at github.com (Christian Brauner) Date: Fri, 10 May 2019 00:30:37 -0700 Subject: [lxc-devel] [lxc/lxc] 3bef7b: network: Adds mtu support for phys and macvlan types Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 3bef7b7b500e4750c97f3c0b2c62738a6f818011 https://github.com/lxc/lxc/commit/3bef7b7b500e4750c97f3c0b2c62738a6f818011 Author: Thomas Parrott Date: 2019-05-09 (Thu, 09 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- network: Adds mtu support for phys and macvlan types Signed-off-by: Thomas Parrott Commit: 0b15498976115f4297ba77f94499fc4b8190abf8 https://github.com/lxc/lxc/commit/0b15498976115f4297ba77f94499fc4b8190abf8 Author: Thomas Parrott Date: 2019-05-09 (Thu, 09 May 2019) Changed paths: M src/lxc/network.c M src/lxc/network.h Log Message: ----------- network: Restores phys device MTU on container shutdown The phys devices will now have their original MTUs recorded at start and restored at shutdown. 
This is to protect the original phys device from having any container level MTU customisation being applied to the device once it is restored to the host. Signed-off-by: Thomas Parrott Commit: bc999107589c9246ecfc831d74855244aafc6d41 https://github.com/lxc/lxc/commit/bc999107589c9246ecfc831d74855244aafc6d41 Author: Thomas Parrott Date: 2019-05-09 (Thu, 09 May 2019) Changed paths: M doc/api-extensions.md M src/lxc/api_extensions.h Log Message: ----------- api: Adds the network_phys_macvlan_mtu extension This will allow LXD to check for custom MTU support for phys and macvlan devices. Signed-off-by: Thomas Parrott Commit: 9e195036412a99a2f48731156c61cc85054b37ee https://github.com/lxc/lxc/commit/9e195036412a99a2f48731156c61cc85054b37ee Author: Christian Brauner Date: 2019-05-10 (Fri, 10 May 2019) Changed paths: M doc/api-extensions.md M src/lxc/api_extensions.h M src/lxc/network.c M src/lxc/network.h Log Message: ----------- Merge pull request #2985 from tomponline/tp-mtu network: Adds mtu support for phys and macvlan types Compare: https://github.com/lxc/lxc/compare/70aa3c7f58ab...9e195036412a From lxc-bot at linuxcontainers.org Fri May 10 07:57:20 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Fri, 10 May 2019 00:57:20 -0700 (PDT) Subject: [lxc-devel] [lxd/master] IPVLAN cleanup Message-ID: <5cd52ee0.1c69fb81.143be.e844SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 435 bytes Desc: not available URL: -------------- next part -------------- From c1f7f15ee30b198fffa6d44f6370d334746119b2 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Fri, 10 May 2019 08:47:29 +0100 Subject: [PATCH 1/2] test: ipvlan test activates ipv4 forwarding Signed-off-by: Thomas Parrott --- test/suites/container_devices_nic_ipvlan.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/test/suites/container_devices_nic_ipvlan.sh b/test/suites/container_devices_nic_ipvlan.sh index 2e7d98224f..737bd92005 100644 --- a/test/suites/container_devices_nic_ipvlan.sh +++ b/test/suites/container_devices_nic_ipvlan.sh @@ -16,6 +16,7 @@ test_container_devices_nic_ipvlan() { # Check that starting IPVLAN container. 
sysctl net.ipv6.conf."${ct_name}".proxy_ndp=1 sysctl net.ipv6.conf."${ct_name}".forwarding=1 + sysctl net.ipv4.conf."${ct_name}".forwarding=1 lxc init testimage "${ct_name}" lxc config device add "${ct_name}" eth0 nic \ nictype=ipvlan \ From bdc6452b5428066c8cb26fd14f40e874084c589f Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Fri, 10 May 2019 08:48:22 +0100 Subject: [PATCH 2/2] container/lxc: Moves IPVLAN init code into own function Signed-off-by: Thomas Parrott --- lxd/container_lxc.go | 112 +++++++++++++++++++++++-------------------- 1 file changed, 61 insertions(+), 51 deletions(-) diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 24ae77f413..4955294dbf 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -1672,60 +1672,10 @@ func (c *containerLXC) initLXC(config bool) error { return err } } else if m["nictype"] == "ipvlan" { - err = c.checkIPVLANSupport() + err = c.initLXCIPVLAN(cc, networkKeyPrefix, networkidx, m) if err != nil { return err } - - err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.type", networkKeyPrefix, networkidx), "ipvlan") - if err != nil { - return err - } - - err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.ipvlan.mode", networkKeyPrefix, networkidx), "l3s") - if err != nil { - return err - } - - err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.ipvlan.isolation", networkKeyPrefix, networkidx), "bridge") - if err != nil { - return err - } - - err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.l2proxy", networkKeyPrefix, networkidx), "1") - if err != nil { - return err - } - - if m["ipv4.address"] != "" { - for _, addr := range strings.Split(m["ipv4.address"], ",") { - addr = strings.TrimSpace(addr) - err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.ipv4.address", networkKeyPrefix, networkidx), fmt.Sprintf("%s/32", addr)) - if err != nil { - return err - } - } - - err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.ipv4.gateway", networkKeyPrefix, networkidx), "dev") - if err != nil { - return err - } - } - - if m["ipv6.address"] != "" { - for _, addr := range strings.Split(m["ipv6.address"], ",") { - addr = strings.TrimSpace(addr) - err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.ipv6.address", networkKeyPrefix, networkidx), fmt.Sprintf("%s/128", addr)) - if err != nil { - return err - } - } - - err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.ipv6.gateway", networkKeyPrefix, networkidx), "dev") - if err != nil { - return err - } - } } // Check if the container has network specific keys set to avoid unnecessarily running the network up hook. @@ -1943,6 +1893,66 @@ func (c *containerLXC) initLXC(config bool) error { return nil } +// initLXCIPVLAN runs as part of initLXC function and initialises liblxc with the IPVLAN config. 
+func (c *containerLXC) initLXCIPVLAN(cc *lxc.Container, networkKeyPrefix string, networkidx int, m map[string]string) error { + err := c.checkIPVLANSupport() + if err != nil { + return err + } + + err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.type", networkKeyPrefix, networkidx), "ipvlan") + if err != nil { + return err + } + + err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.ipvlan.mode", networkKeyPrefix, networkidx), "l3s") + if err != nil { + return err + } + + err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.ipvlan.isolation", networkKeyPrefix, networkidx), "bridge") + if err != nil { + return err + } + + err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.l2proxy", networkKeyPrefix, networkidx), "1") + if err != nil { + return err + } + + if m["ipv4.address"] != "" { + for _, addr := range strings.Split(m["ipv4.address"], ",") { + addr = strings.TrimSpace(addr) + err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.ipv4.address", networkKeyPrefix, networkidx), fmt.Sprintf("%s/32", addr)) + if err != nil { + return err + } + } + + err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.ipv4.gateway", networkKeyPrefix, networkidx), "dev") + if err != nil { + return err + } + } + + if m["ipv6.address"] != "" { + for _, addr := range strings.Split(m["ipv6.address"], ",") { + addr = strings.TrimSpace(addr) + err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.ipv6.address", networkKeyPrefix, networkidx), fmt.Sprintf("%s/128", addr)) + if err != nil { + return err + } + } + + err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.ipv6.gateway", networkKeyPrefix, networkidx), "dev") + if err != nil { + return err + } + } + + return nil +} + // Initialize storage interface for this container func (c *containerLXC) initStorage() error { if c.storage != nil { From lxc-bot at linuxcontainers.org Fri May 10 09:46:10 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Fri, 10 May 2019 02:46:10 -0700 (PDT) Subject: [lxc-devel] [lxd/master] container/lxc: Fixes ipvlan support check Message-ID: <5cd54862.1c69fb81.45048.a453SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 433 bytes Desc: not available URL: -------------- next part -------------- From 8dd19e86f13606d9cf5c5938155ee747c7f455d4 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Fri, 10 May 2019 10:44:55 +0100 Subject: [PATCH] container/lxc: Fixes ipvlan support check Also alerts user if they try to add ipvlan nic to running container. Signed-off-by: Thomas Parrott --- lxd/container_lxc.go | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 4955294dbf..466b5c17f2 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -8171,13 +8171,13 @@ func (c *containerLXC) insertNetworkDevice(name string, m types.Device) (types.D return nil, err } + // Alert user if trying to add an ipvlan nic on running container. 
+ if m["nictype"] == "ipvlan" { + return nil, errors.New("Can't insert ipvlan device to running container") + } + // Setup network device if not type ipvlan (which is done via liblxc only) if m["nictype"] != "ipvlan" { - err := c.checkIPVLANSupport() - if err != nil { - return nil, err - } - // Create the interface devName, err := c.createNetworkDevice(name, m) if err != nil { From lxc-bot at linuxcontainers.org Fri May 10 10:19:10 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Fri, 10 May 2019 03:19:10 -0700 (PDT) Subject: [lxc-devel] [lxd/master] Fixes MTU tests for VLAN support in latest LXC Message-ID: <5cd5501e.1c69fb81.456e6.c520SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 301 bytes Desc: not available URL: -------------- next part -------------- From 540ff2956013083f947a832c81a8bed860b3fc54 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Thu, 9 May 2019 17:51:36 +0100 Subject: [PATCH 1/2] test: Updates physical tests to detect MTU support in LXC Signed-off-by: Thomas Parrott --- test/suites/container_devices_nic_physical.sh | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/test/suites/container_devices_nic_physical.sh b/test/suites/container_devices_nic_physical.sh index eae925645c..8a5ce2cd3e 100644 --- a/test/suites/container_devices_nic_physical.sh +++ b/test/suites/container_devices_nic_physical.sh @@ -21,10 +21,12 @@ test_container_devices_nic_physical() { # Lauch container and check it has nic applied correctly. lxc start "${ct_name}" - # Check custom MTU is applied. - if ! lxc exec "${ct_name}" -- ip link show eth0 | grep "mtu 1400" ; then - echo "mtu invalid" - true #We expect this to not apply currently as LXC won't apply it to physical devices. + # Check custom MTU is applied if feature available in LXD. + if lxc info | grep 'network_phys_macvlan_mtu: "true"' ; then + if ! lxc exec "${ct_name}" -- ip link show eth0 | grep "mtu 1400" ; then + echo "mtu invalid" + false + fi fi lxc config device remove "${ct_name}" eth0 @@ -36,10 +38,10 @@ test_container_devices_nic_physical() { parent="${ct_name}" \ name=eth0 \ vlan=10 \ - mtu=1401 + mtu=1399 #This must be less than or equal to the MTU of the parent device (which is not being reset) # Check custom MTU is applied. - if ! lxc exec "${ct_name}" -- ip link show eth0 | grep "mtu 1401" ; then + if ! lxc exec "${ct_name}" -- ip link show eth0 | grep "mtu 1399" ; then echo "mtu invalid" false fi From bb3cf4e3932f137693eff46dbb0650bb8fd0c3e9 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Thu, 9 May 2019 17:50:48 +0100 Subject: [PATCH 2/2] test: Updates macvlan tests to detect MTU support in LXC Signed-off-by: Thomas Parrott --- test/suites/container_devices_nic_macvlan.sh | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/test/suites/container_devices_nic_macvlan.sh b/test/suites/container_devices_nic_macvlan.sh index 934cd6345d..e436eeb4c0 100644 --- a/test/suites/container_devices_nic_macvlan.sh +++ b/test/suites/container_devices_nic_macvlan.sh @@ -19,10 +19,12 @@ test_container_devices_nic_macvlan() { lxc exec "${ct_name}" -- ip addr add "192.0.2.1${ipRand}/24" dev eth0 lxc exec "${ct_name}" -- ip addr add "2001:db8::1${ipRand}/64" dev eth0 - # Check custom MTU is applied on boot. - if ! lxc exec "${ct_name}" -- ip link show eth0 | grep "mtu 1400" ; then - echo "mtu invalid" - true #We expect this to not apply currently as LXC won't apply it to macvlan devices. 
+ # Check custom MTU is applied if feature available in LXD. + if lxc info | grep 'network_phys_macvlan_mtu: "true"' ; then + if ! lxc exec "${ct_name}" -- ip link show eth0 | grep "mtu 1400" ; then + echo "mtu invalid" + false + fi + fi #Spin up another container with multiple IPs. From lxc-bot at linuxcontainers.org Fri May 10 11:16:36 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Fri, 10 May 2019 04:16:36 -0700 (PDT) Subject: [lxc-devel] [lxc/master] coding style: update Message-ID: <5cd55d94.1c69fb81.aa878.e665SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 364 bytes Desc: not available URL: -------------- next part -------------- From 24418a9d45ceb626d42b71106c1c9ce4602500c1 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Fri, 10 May 2019 13:15:25 +0200 Subject: [PATCH] coding style: update Signed-off-by: Christian Brauner --- CODING_STYLE.md | 69 +++++++++++++++++++++++++++++++++++++------------ 1 file changed, 53 insertions(+), 16 deletions(-) diff --git a/CODING_STYLE.md b/CODING_STYLE.md index fb5022c49c..0b0ec08bb0 100644 --- a/CODING_STYLE.md +++ b/CODING_STYLE.md @@ -1,8 +1,8 @@ LXC Coding Style Guide ====================== -In general the LXC project follows the Linux kernel coding style. There are -however are a few differences, these are outlined in this document. +In general the LXC project follows the Linux kernel coding style. However, +there are a few differences. They are outlined in this document. The Linux kernel coding style guide can be found within the kernel tree: @@ -83,15 +83,17 @@ https://www.kernel.org/doc/html/latest/process/coding-style.html ## 3) Only use `/* */` Style Comments - Any comments that are added must use `/* */`. -- All comments should start on the same line as the opening `/*`. +- Single-line comments should start on the same line as the opening `/*`. - Single-line comments should simply be placed between `/* */`. For example: ```C /* Define pivot_root() if missing from the C library */ ``` -- Multi-line comments should end with the closing `*/` on a separate line. For +- Multi-line comments should start on the next line following the opening + `/*` and should end with the closing `*/` on a separate line. For example: ```C - /* At this point the old-root is mounted on top of our new-root + /* + * At this point the old-root is mounted on top of our new-root * To unmounted it we must not be chdir()ed into it, so escape back * to old-root. */ @@ -109,16 +111,49 @@ https://www.kernel.org/doc/html/latest/process/coding-style.html punctuation sign. - They should be descriptive, without being needlessly long. It is best to just use already existing error messages as examples. +- The commit message itself is not subject to rule 4), i.e. it should not be + wrapped at 80 chars. This is to make it easy to grep for it. - Examples of acceptable error messages are: ```C SYSERROR("Failed to create directory \"%s\"", path); WARN("\"/dev\" directory does not exist. Proceeding without autodev being set up"); ``` ## 6) Return Error Codes -- When writing a function that can fail in a non-binary way try to return - meaningful negative error codes (e.g. `return -EINVAL;`). +## 6) Set `errno` -- Functions that can fail in a non-binary way should return `-1` and set + `errno` to a meaningful error code.
+ As a convenience LXC provides the `minus_one_set_errno` macro: + ```C + static int set_config_net_l2proxy(const char *key, const char *value, + struct lxc_conf *lxc_conf, void *data) + { + struct lxc_netdev *netdev = data; + unsigned int val = 0; + int ret; + + if (lxc_config_value_empty(value)) + return clr_config_net_l2proxy(key, lxc_conf, data); + + if (!netdev) + return minus_one_set_errno(EINVAL); + + ret = lxc_safe_uint(value, &val); + if (ret < 0) + return minus_one_set_errno(-ret); + + switch (val) { + case 0: + netdev->l2proxy = false; + return 0; + case 1: + netdev->l2proxy = true; + return 0; + } + + return minus_one_set_errno(EINVAL); + } + ``` ## 7) All Unexported Functions Must Be Declared `static` @@ -133,15 +168,17 @@ https://www.kernel.org/doc/html/latest/process/coding-style.html ## 9) Declaring Variables - variables should be declared at the top of the function or at the beginning - of a new scope but **never** in the middle of a scope -1. uninitialized variables - - put base types before complex types - - put standard types defined by libc before types defined by LXC - - put multiple declarations of the same type on the same line + of a new scope but **never** in the middle of a scope. They should be ordered + in the following way: +1. automatically freed variables + - This specifically references variables cleaned up via the `cleanup` + attribute as supported by `gcc` and `clang`. 2. initialized variables - - put base types before complex types - - put standard types defined by libc before types defined by LXC - - put multiple declarations of the same type on the same line +3. uninitialized variables +General rules are: +- put base types before complex types +- put standard types defined by libc before types defined by LXC +- put multiple declarations of the same type on the same line - Examples of good declarations can be seen in the following function: ```C int lxc_clear_procs(struct lxc_conf *c, const char *key) From noreply at github.com Fri May 10 12:36:58 2019 From: noreply at github.com (=?UTF-8?B?U3TDqXBoYW5lIEdyYWJlcg==?=) Date: Fri, 10 May 2019 05:36:58 -0700 Subject: [lxc-devel] [lxc/lxc] a8e63d: coding style: update Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: a8e63d6904197f2737f477bff116013800b8a05c https://github.com/lxc/lxc/commit/a8e63d6904197f2737f477bff116013800b8a05c Author: Christian Brauner Date: 2019-05-10 (Fri, 10 May 2019) Changed paths: M CODING_STYLE.md Log Message: ----------- coding style: update Signed-off-by: Christian Brauner Commit: 792ea40042390afe2cfb098875942d2e1bb969c5 https://github.com/lxc/lxc/commit/792ea40042390afe2cfb098875942d2e1bb969c5 Author: Stéphane Graber Date: 2019-05-10 (Fri, 10 May 2019) Changed paths: M CODING_STYLE.md Log Message: ----------- Merge pull request #2992 from brauner/2019-05-10/coding_style_update coding style: update Compare: https://github.com/lxc/lxc/compare/9e195036412a...792ea4004239 From builds at travis-ci.org Fri May 10 12:39:19 2019 From: builds at travis-ci.org (Travis CI) Date: Fri, 10 May 2019 12:39:19 +0000 Subject: [lxc-devel] Errored: lxc/lxc#6806 (master - 792ea40) In-Reply-To: Message-ID: <5cd570f718415_43ff3b4c97bb497688@da8a8b29-a0db-4d6c-ad85-197ee78c3c59.mail> Build Update for lxc/lxc ------------------------------------- Build: #6806 Status: Errored Duration: 1 min and 53 secs Commit: 792ea40 (master) Author: Stéphane Graber Message: Merge pull request #2992 from brauner/2019-05-10/coding_style_update coding style: update View the changeset: 
https://github.com/lxc/lxc/compare/9e195036412a...792ea4004239 View the full build log and details: https://travis-ci.org/lxc/lxc/builds/530729746?utm_medium=notification&utm_source=email -------------- next part -------------- An HTML attachment was scrubbed... URL: From lxc-bot at linuxcontainers.org Fri May 10 15:01:26 2019 From: lxc-bot at linuxcontainers.org (Rachid-Koucha on Github) Date: Fri, 10 May 2019 08:01:26 -0700 (PDT) Subject: [lxc-devel] [lxc/master] New --bbpath option and unnecessary --rootfs checks Message-ID: <5cd59246.1c69fb81.b939.c737SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 657 bytes Desc: not available URL: -------------- next part -------------- From e7962394064746793403143de177f09220eb9419 Mon Sep 17 00:00:00 2001 From: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: Fri, 10 May 2019 17:01:13 +0200 Subject: [PATCH] New --bbpath option and unnecessary --rootfs checks . Add the "--bbpath" option to pass an alternate busybox pathname instead of the one found from ${PATH}. . Take this opportunity to add some formatting in the usage display . Since an attempt is made to pick rootfs from the config file, falling back to ${path}/rootfs, it is unnecessary to make it mandatory Signed-off-by: Rachid Koucha --- templates/lxc-busybox.in | 41 +++++++++++++++++++++++++--------------- 1 file changed, 26 insertions(+), 15 deletions(-) diff --git a/templates/lxc-busybox.in b/templates/lxc-busybox.in index 3601655036..22cf27835d 100644 --- a/templates/lxc-busybox.in +++ b/templates/lxc-busybox.in @@ -23,7 +23,7 @@ LXC_MAPPED_UID= LXC_MAPPED_GID= -BUSYBOX_EXE= +BUSYBOX_EXE=`which busybox` # Make sure the usual locations are in PATH export PATH=$PATH:/usr/sbin:/usr/bin:/sbin:/bin @@ -266,19 +266,26 @@ usage() { LXC busybox image builder Special arguments: -[ -h | --help ]: Print this help message and exit. -LXC internal arguments (do not pass manually!): -[ --name ]: The container name -[ --path ]: The path to the container -[ --rootfs ]: The path to the container's rootfs -[ --mapped-uid ]: A uid map (user namespaces) -[ --mapped-gid ]: A gid map (user namespaces) + + [ -h | --help ]: Print this help message and exit. + +LXC internal arguments: + + [ --name ]: The container name + [ --path ]: The path to the container + [ --rootfs ]: The path to the container's rootfs (default: config or /rootfs) + [ --mapped-uid ]: A uid map (user namespaces) + [ --mapped-gid ]: A gid map (user namespaces) + +BUSYBOX template specific arguments: + + [ --bbpath ]: busybox pathname (default: ${BUSYBOX_EXE}) + EOF return 0 } -if !
options=$(getopt -o hp:n: -l help,rootfs:,path:,name:,mapped-uid:,mapped-gid:,bbpath: -- "$@"); then usage exit 1 fi @@ -293,21 +300,25 @@ do --rootfs) rootfs=$2; shift 2;; --mapped-uid) LXC_MAPPED_UID=$2; shift 2;; --mapped-gid) LXC_MAPPED_GID=$2; shift 2;; + --bbpath) BUSYBOX_EXE=$2; shift 2;; --) shift 1; break ;; *) break ;; esac done # Check that we have all variables we need -if [ -z "${name}" ] || [ -z "${path}" ] || [ -z "${rootfs}" ]; then - echo "ERROR: Please pass the name, path, and rootfs for the container" 1>&2 +if [ -z "${name}" ] || [ -z "${path}" ]; then - echo "ERROR: Please pass the name and path for the container" 1>&2 exit 1 fi # Make sure busybox is present -BUSYBOX_EXE=`which busybox` -if [ $? -ne 0 ]; then - echo "ERROR: Failed to find busybox binary" 1>&2 +if [ -z "${BUSYBOX_EXE}" ]; then + echo "ERROR: Please pass a pathname for busybox binary" 1>&2 + exit 1 +fi +if [ ! -x "${BUSYBOX_EXE}" ]; then + echo "ERROR: Failed to find busybox binary (${BUSYBOX_EXE})" 1>&2 exit 1 fi From lxc-bot at linuxcontainers.org Fri May 10 16:47:54 2019 From: lxc-bot at linuxcontainers.org (Rachid-Koucha on Github) Date: Fri, 10 May 2019 09:47:54 -0700 (PDT) Subject: [lxc-devel] [lxc/master] Do not display info if unprivileged (lxc-ls...) Message-ID: <5cd5ab3a.1c69fb81.87bf2.3cd5SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 1336 bytes Desc: not available URL: -------------- next part -------------- From 24c22aee8d1e262fa77d7b0e45b6dc221885f4f7 Mon Sep 17 00:00:00 2001 From: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: Fri, 10 May 2019 18:47:22 +0200 Subject: [PATCH] Do not display info if unprivileged (lxc-ls...) When running lxc-ls without root privileges on privileged containers, some information is still displayed. In lxc_container_new(), ongoing_create()'s result is not checked for all possible returned values. Hence, an unprivileged user can send command messages to the container's monitor. For example:
After this change: $ lxc-ls -P /.../tests -f <-------- No more display without root privileges $ sudo lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr RUNNING 0 - 10.0.3.37 - false $ --- src/lxc/lxccontainer.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/src/lxc/lxccontainer.c b/src/lxc/lxccontainer.c index 98f86a24e4..c653b3e8cd 100644 --- a/src/lxc/lxccontainer.c +++ b/src/lxc/lxccontainer.c @@ -5249,6 +5249,7 @@ struct lxc_container *lxc_container_new(const char *name, const char *configpath { struct lxc_container *c; size_t len; + int rc; if (!name) return NULL; @@ -5302,10 +5303,26 @@ struct lxc_container *lxc_container_new(const char *name, const char *configpath goto err; } - if (ongoing_create(c) == 2) { + rc = ongoing_create(c); + switch(rc) { + // Uncompleted container creation + case 2: ERROR("Failed to complete container creation for %s", c->name); container_destroy(c, NULL); lxcapi_clear_config(c); + break; + // Container creation is on tracks + case 1: + goto err; + break; + // Error + case -1: + // No display if privilege problem + if (EACCES != errno && EPERM != errno) { + ERROR("Failed checking for incomplete container %s creation", c->name); + } + goto err; + break; } c->daemonize = true; From lxc-bot at linuxcontainers.org Fri May 10 16:55:42 2019 From: lxc-bot at linuxcontainers.org (geaaru on Github) Date: Fri, 10 May 2019 09:55:42 -0700 (PDT) Subject: [lxc-devel] [distrobuilder/master] sabayon: Update examples Message-ID: <5cd5ad0e.1c69fb81.c2b2c.d8ddSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 522 bytes Desc: not available URL: -------------- next part -------------- From 60caf978d47b96954f4bfe637481d4b923fa456e Mon Sep 17 00:00:00 2001 From: Daniele Rondina Date: Fri, 10 May 2019 18:54:25 +0200 Subject: [PATCH] sabayon: Update examples * Enable systemd-networkd service * Added ETP_NONINTERACTIVE for avoid blocking installation processes * Disable systemd-journald-audit service Signed-off-by: Daniele Rondina --- doc/examples/sabayon | 32 ++++++++++++++++++++++++++++++++ doc/examples/sabayon-docker | 32 ++++++++++++++++++++++++++++++++ 2 files changed, 64 insertions(+) diff --git a/doc/examples/sabayon b/doc/examples/sabayon index 22d557d..372530a 100644 --- a/doc/examples/sabayon +++ b/doc/examples/sabayon @@ -15,6 +15,8 @@ environment: value: "/bin/bash" - key: "ACCEPT_LICENSE" value: "*" + - key: "ETP_NONINTERACTIVE" + value: "1" targets: lxc: @@ -102,6 +104,13 @@ actions: cd /etc/systemd/system ln -s /dev/null dev-hugepages.mount + # Disable systemd-journald-audit service + - trigger: post-packages + action: |- + #!/bin/bash + cd /etc/systemd/system + ln -s /dev/null systemd-journald-audit.socket + # Disable sabayon-anti-fork-bomb limits # (already apply to host) - trigger: post-packages @@ -110,6 +119,29 @@ actions: sed -i -e 's/^*/#*/g' /etc/security/limits.d/00-sabayon-anti-fork-bomb.conf sed -i -e 's/^root/#root/g' /etc/security/limits.d/00-sabayon-anti-fork-bomb.conf + # Configure DHCP for interface eth0 by default. + # Avoid to use DHCP for any interface to avoid reset of docker + # interfaces or others custom interfaces. + - trigger: post-packages + action: |- + #!/bin/bash + cat > /etc/systemd/network/default_dhcp.network << "EOF" + [Network] + DHCP=ipv4 + + [Match] + Name=eth0 + + [DHCP] + UseDomains=true + EOF + + # Enable systemd-networkd service by default. 
+ - trigger: post-packages + action: |- + #!/bin/bash + systemctl enable systemd-networkd + # Clean journal directory (to avoid permission errors) - trigger: post-packages action: |- diff --git a/doc/examples/sabayon-docker b/doc/examples/sabayon-docker index ae132ac..23cfda1 100644 --- a/doc/examples/sabayon-docker +++ b/doc/examples/sabayon-docker @@ -16,6 +16,8 @@ environment: value: "/bin/bash" - key: "ACCEPT_LICENSE" value: "*" + - key: "ETP_NONINTERACTIVE" + value: "1" targets: lxc: @@ -103,6 +105,13 @@ actions: cd /etc/systemd/system ln -s /dev/null dev-hugepages.mount + # Disable systemd-journald-audit service + - trigger: post-packages + action: |- + #!/bin/bash + cd /etc/systemd/system + ln -s /dev/null systemd-journald-audit.socket + # Disable sabayon-anti-fork-bomb limits # (already apply to host) - trigger: post-packages @@ -111,6 +120,29 @@ actions: sed -i -e 's/^*/#*/g' /etc/security/limits.d/00-sabayon-anti-fork-bomb.conf sed -i -e 's/^root/#root/g' /etc/security/limits.d/00-sabayon-anti-fork-bomb.conf + # Configure DHCP for interface eth0 by default. + # Avoid to use DHCP for any interface to avoid reset of docker + # interfaces or others custom interfaces. + - trigger: post-packages + action: |- + #!/bin/bash + cat > /etc/systemd/network/default_dhcp.network << "EOF" + [Network] + DHCP=ipv4 + + [Match] + Name=eth0 + + [DHCP] + UseDomains=true + EOF + + # Enable systemd-networkd service by default. + - trigger: post-packages + action: |- + #!/bin/bash + systemctl enable systemd-networkd + # Clean journal directory (to avoid permission errors) - trigger: post-packages action: |- From lxc-bot at linuxcontainers.org Fri May 10 16:56:23 2019 From: lxc-bot at linuxcontainers.org (Rachid-Koucha on Github) Date: Fri, 10 May 2019 09:56:23 -0700 (PDT) Subject: [lxc-devel] [lxc/master] Do not display info without privileges (lxc-ls...) Message-ID: <5cd5ad37.1c69fb81.97ae.6648SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 1336 bytes Desc: not available URL: -------------- next part -------------- From 6457a0940a9a75c3845c7c5a0a632cd8d7728048 Mon Sep 17 00:00:00 2001 From: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: Fri, 10 May 2019 18:56:12 +0200 Subject: [PATCH] Do not display info without privileges (lxc-ls...) lxc-ls without root privileges on privileged containers should not display information. In lxc_container_new(), ongoing_create()'s result is not checked for all possible returned values. Hence, an unprivileged user can send command messages to the container's monitor. 
For example: $ lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr - 0 - - - false $ sudo lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr RUNNING 0 - 10.0.3.51 - false After this change: $ lxc-ls -P /.../tests -f <-------- No more display without root privileges $ sudo lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr RUNNING 0 - 10.0.3.37 - false $ Signed-off-by: Rachid Koucha --- src/lxc/lxccontainer.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/src/lxc/lxccontainer.c b/src/lxc/lxccontainer.c index 98f86a24e4..c653b3e8cd 100644 --- a/src/lxc/lxccontainer.c +++ b/src/lxc/lxccontainer.c @@ -5249,6 +5249,7 @@ struct lxc_container *lxc_container_new(const char *name, const char *configpath { struct lxc_container *c; size_t len; + int rc; if (!name) return NULL; @@ -5302,10 +5303,26 @@ struct lxc_container *lxc_container_new(const char *name, const char *configpath goto err; } - if (ongoing_create(c) == 2) { + rc = ongoing_create(c); + switch(rc) { + // Uncompleted container creation + case 2: ERROR("Failed to complete container creation for %s", c->name); container_destroy(c, NULL); lxcapi_clear_config(c); + break; + // Container creation is on tracks + case 1: + goto err; + break; + // Error + case -1: + // No display if privilege problem + if (EACCES != errno && EPERM != errno) { + ERROR("Failed checking for incomplete container %s creation", c->name); + } + goto err; + break; } c->daemonize = true; From lxc-bot at linuxcontainers.org Fri May 10 19:05:07 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Fri, 10 May 2019 12:05:07 -0700 (PDT) Subject: [lxc-devel] [lxc/master] lxccontainer: do not display if missing privileges Message-ID: <5cd5cb63.1c69fb81.dfb28.92adSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 1238 bytes Desc: not available URL: -------------- next part -------------- From 9fbe07f68da62c90ff849eb1e2d59396d2a9672f Mon Sep 17 00:00:00 2001 From: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: Fri, 10 May 2019 18:56:12 +0200 Subject: [PATCH] lxccontainer: do not display if missing privileges lxc-ls without root privileges on privileged containers should not display information. In lxc_container_new(), ongoing_create()'s result is not checked for all possible returned values. Hence, an unprivileged user can send command messages to the container's monitor. For example: $ lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr - 0 - - - false $ sudo lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr RUNNING 0 - 10.0.3.51 - false After this change: $ lxc-ls -P /.../tests -f <-------- No more display without root privileges $ sudo lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr RUNNING 0 - 10.0.3.37 - false $ Signed-off-by: Rachid Koucha Signed-off-by: Christian Brauner --- src/lxc/lxccontainer.c | 56 ++++++++++++++++++++++++++++++------------ 1 file changed, 40 insertions(+), 16 deletions(-) diff --git a/src/lxc/lxccontainer.c b/src/lxc/lxccontainer.c index 98f86a24e4..cea8aa5d7b 100644 --- a/src/lxc/lxccontainer.c +++ b/src/lxc/lxccontainer.c @@ -135,7 +135,8 @@ static bool config_file_exists(const char *lxcpath, const char *cname) return file_exists(fname); } -/* A few functions to help detect when a container creation failed. 
If a +/* + * A few functions to help detect when a container creation failed. If a * container creation was killed partway through, then trying to actually start * that container could harm the host. We detect this by creating a 'partial' * file under the container directory, and keeping an advisory lock. When @@ -143,30 +144,39 @@ static bool config_file_exists(const char *lxcpath, const char *cname) * start a container, if we find that file, without a flock, we remove the * container. */ +enum { + LXC_CREATE_FAILED = -1, + LXC_CREATE_SUCCESS = 0, + LXC_CREATE_ONGOING = 1, + LXC_CREATE_INCOMPLETE = 2, +}; + static int ongoing_create(struct lxc_container *c) { + __do_close_prot_errno int fd = -EBADF; __do_free char *path = NULL; - int fd, ret; - size_t len; struct flock lk = {0}; + int ret; + size_t len; len = strlen(c->config_path) + strlen(c->name) + 10; path = must_realloc(NULL, len); ret = snprintf(path, len, "%s/%s/partial", c->config_path, c->name); if (ret < 0 || (size_t)ret >= len) - return -1; + return LXC_CREATE_FAILED; fd = open(path, O_RDWR | O_CLOEXEC); if (fd < 0) { if (errno != ENOENT) - return -1; + return LXC_CREATE_FAILED; - return 0; + return LXC_CREATE_SUCCESS; } lk.l_type = F_WRLCK; lk.l_whence = SEEK_SET; - /* F_OFD_GETLK requires that l_pid be set to 0 otherwise the kernel + /* + * F_OFD_GETLK requires that l_pid be set to 0 otherwise the kernel * will EINVAL us. */ lk.l_pid = 0; @@ -178,15 +188,13 @@ static int ongoing_create(struct lxc_container *c) ret = 0; } - close(fd); - /* F_OFD_GETLK will not send us back a pid so don't check it. */ if (ret == 0) /* Create is still ongoing. */ - return 1; + return LXC_CREATE_ONGOING; /* Create completed but partial is still there. */ - return 2; + return LXC_CREATE_INCOMPLETE; } static int create_partial(struct lxc_container *c) @@ -891,13 +899,14 @@ static bool do_lxcapi_start(struct lxc_container *c, int useinit, char * const a return false; ret = ongoing_create(c); - if (ret < 0) { + switch (ret) { + case LXC_CREATE_FAILED: ERROR("Failed checking for incomplete container creation"); return false; - } else if (ret == 1) { + case LXC_CREATE_ONGOING: ERROR("Ongoing container creation detected"); return false; - } else if (ret == 2) { + case LXC_CREATE_INCOMPLETE: ERROR("Failed to create container"); do_lxcapi_destroy(c); return false; @@ -5249,6 +5258,7 @@ struct lxc_container *lxc_container_new(const char *name, const char *configpath { struct lxc_container *c; size_t len; + int rc; if (!name) return NULL; @@ -5302,10 +5312,24 @@ struct lxc_container *lxc_container_new(const char *name, const char *configpath goto err; } - if (ongoing_create(c) == 2) { - ERROR("Failed to complete container creation for %s", c->name); + rc = ongoing_create(c); + switch (rc) { + case LXC_CREATE_INCOMPLETE: + SYSERROR("Failed to complete container creation for %s", c->name); container_destroy(c, NULL); lxcapi_clear_config(c); + break; + case LXC_CREATE_ONGOING: + /* container creation going on */ + break; + case LXC_CREATE_FAILED: + /* container creation failed */ + if (errno != EACCES && errno != EPERM) { + /* insufficient privileges */ + SYSERROR("Failed checking for incomplete container %s creation", c->name); + goto err; + } + break; } c->daemonize = true; From noreply at github.com Fri May 10 19:20:22 2019 From: noreply at github.com (Christian Brauner) Date: Fri, 10 May 2019 12:20:22 -0700 Subject: [lxc-devel] [lxc/lxc] 9fbe07: lxccontainer: do not display if missing privileges Message-ID: Branch: refs/heads/master Home: 
https://github.com/lxc/lxc Commit: 9fbe07f68da62c90ff849eb1e2d59396d2a9672f https://github.com/lxc/lxc/commit/9fbe07f68da62c90ff849eb1e2d59396d2a9672f Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-10 (Fri, 10 May 2019) Changed paths: M src/lxc/lxccontainer.c Log Message: ----------- lxccontainer: do not display if missing privileges lxc-ls without root privileges on privileged containers should not display information. In lxc_container_new(), ongoing_create()'s result is not checked for all possible returned values. Hence, an unprivileged user can send command messages to the container's monitor. For example: $ lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr - 0 - - - false $ sudo lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr RUNNING 0 - 10.0.3.51 - false After this change: $ lxc-ls -P /.../tests -f <-------- No more display without root privileges $ sudo lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr RUNNING 0 - 10.0.3.37 - false $ Signed-off-by: Rachid Koucha Signed-off-by: Christian Brauner Commit: e269d99b026cc400a8b7137c3427d6985b85ae91 https://github.com/lxc/lxc/commit/e269d99b026cc400a8b7137c3427d6985b85ae91 Author: Christian Brauner Date: 2019-05-10 (Fri, 10 May 2019) Changed paths: M src/lxc/lxccontainer.c Log Message: ----------- Merge pull request #2996 from brauner/Rachid-Koucha-patch-10 lxccontainer: do not display if missing privileges Compare: https://github.com/lxc/lxc/compare/792ea4004239...e269d99b026c From builds at travis-ci.org Fri May 10 19:22:58 2019 From: builds at travis-ci.org (Travis CI) Date: Fri, 10 May 2019 19:22:58 +0000 Subject: [lxc-devel] Passed: lxc/lxc#6812 (master - e269d99) In-Reply-To: Message-ID: <5cd5cf92301c2_43fb78ef34cf8218877@a16a5c14-aa42-409d-bccf-4701d985b8c5.mail> Build Update for lxc/lxc ------------------------------------- Build: #6812 Status: Passed Duration: 2 mins and 8 secs Commit: e269d99 (master) Author: Christian Brauner Message: Merge pull request #2996 from brauner/Rachid-Koucha-patch-10 lxccontainer: do not display if missing privileges View the changeset: https://github.com/lxc/lxc/compare/792ea4004239...e269d99b026c View the full build log and details: https://travis-ci.org/lxc/lxc/builds/530897446?utm_medium=notification&utm_source=email -------------- next part -------------- An HTML attachment was scrubbed...
From noreply at github.com Fri May 10 19:35:57 2019 From: noreply at github.com (Christian Brauner) Date: Fri, 10 May 2019 12:35:57 -0700 Subject: [lxc-devel] [lxc/lxc] e79623: New --bbpath option and unnecessary --rootfs checks Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: e7962394064746793403143de177f09220eb9419 https://github.com/lxc/lxc/commit/e7962394064746793403143de177f09220eb9419 Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-10 (Fri, 10 May 2019) Changed paths: M templates/lxc-busybox.in Log Message: ----------- New --bbpath option and unnecessary --rootfs checks . Add the "--bbpath" option to pass an alternate busybox pathname instead of the one found from ${PATH}. . Take this opportunity to add some formatting in the usage display . As an attempt is already made to pick the rootfs from the config file and set it to ${path}/rootfs, it is unnecessary to make --rootfs mandatory Signed-off-by: Rachid Koucha Commit: 5f0fb855f83eb996355b96f25dd3fea0589009c1 https://github.com/lxc/lxc/commit/5f0fb855f83eb996355b96f25dd3fea0589009c1 Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-10 (Fri, 10 May 2019) Changed paths: M templates/lxc-busybox.in Log Message: ----------- Option --busybox-path instead of --bbpath As suggested during the review. Signed-off-by: Rachid Koucha Commit: da161bc1a2ef2ab89fb2e5c22ff8fb64b3e6c28f https://github.com/lxc/lxc/commit/da161bc1a2ef2ab89fb2e5c22ff8fb64b3e6c28f Author: Christian Brauner Date: 2019-05-10 (Fri, 10 May 2019) Changed paths: M templates/lxc-busybox.in Log Message: ----------- Merge pull request #2993 from Rachid-Koucha/patch-9 New --bbpath option and unnecessary --rootfs checks Compare: https://github.com/lxc/lxc/compare/e269d99b026c...da161bc1a2ef From builds at travis-ci.org Fri May 10 21:30:48 2019 From: builds at travis-ci.org (Travis CI) Date: Fri, 10 May 2019 21:30:48 +0000 Subject: [lxc-devel] Passed: rst0git/lxc#3 (criu-v-option - 29bcd47) In-Reply-To: Message-ID: <5cd5ed87c558d_43f9eb88b797848466@051cb6cb-50dc-4139-9d45-084a153c5136.mail> Build Update for rst0git/lxc ------------------------------------- Build: #3 Status: Passed Duration: 2 mins and 7 secs Commit: 29bcd47 (criu-v-option) Author: Radostin Stoyanov Message: criu: Use -v4 instead of -vvvvv Signed-off-by: Radostin Stoyanov View the changeset: https://github.com/rst0git/lxc/compare/c14ea11dccbf^...29bcd47bf413 View the full build log and details: https://travis-ci.org/rst0git/lxc/builds/530944669?utm_medium=notification&utm_source=email From lxc-bot at linuxcontainers.org Fri May 10 21:43:38 2019 From: lxc-bot at linuxcontainers.org (rst0git on Github) Date: Fri, 10 May 2019 14:43:38 -0700 (PDT) Subject: [lxc-devel] [lxc/master] criu: Use -v4 instead of -vvvvvv Message-ID: <5cd5f08a.1c69fb81.79190.07e5SMTPIN_ADDED_MISSING@mx.google.com>
From 582cb4785a827553d10a6b82185763feb353a114 Mon Sep 17 00:00:00 2001 From: Radostin Stoyanov Date: Fri, 10 May 2019 22:25:54 +0100 Subject: [PATCH] criu: Use -v4 instead of -vvvvvv CRIU has only 4 levels of verbosity (errors, warnings, info, debug). Thus, using `-v4` is more appropriate. https://criu.org/Logging Signed-off-by: Radostin Stoyanov --- src/lxc/criu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/lxc/criu.c b/src/lxc/criu.c index 7fd4d338a8..1b74cf8137 100644 --- a/src/lxc/criu.c +++ b/src/lxc/criu.c @@ -375,7 +375,7 @@ static void exec_criu(struct cgroup_ops *cgroup_ops, struct lxc_conf *conf, } if (opts->user->verbose) - DECLARE_ARG("-vvvvvv"); + DECLARE_ARG("-v4"); if (opts->user->action_script) { DECLARE_ARG("--action-script"); From noreply at github.com Fri May 10 21:47:29 2019 From: noreply at github.com (Christian Brauner) Date: Fri, 10 May 2019 14:47:29 -0700 Subject: [lxc-devel] [lxc/lxc] 582cb4: criu: Use -v4 instead of -vvvvvv Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 582cb4785a827553d10a6b82185763feb353a114 https://github.com/lxc/lxc/commit/582cb4785a827553d10a6b82185763feb353a114 Author: Radostin Stoyanov Date: 2019-05-10 (Fri, 10 May 2019) Changed paths: M src/lxc/criu.c Log Message: ----------- criu: Use -v4 instead of -vvvvvv CRIU has only 4 levels of verbosity (errors, warnings, info, debug). Thus, using `-v4` is more appropriate. https://criu.org/Logging Signed-off-by: Radostin Stoyanov Commit: ad4dddd85e9fb1fbb38b411d6041eb0e7bf3f175 https://github.com/lxc/lxc/commit/ad4dddd85e9fb1fbb38b411d6041eb0e7bf3f175 Author: Christian Brauner Date: 2019-05-10 (Fri, 10 May 2019) Changed paths: M src/lxc/criu.c Log Message: ----------- Merge pull request #2997 from rst0git/criu-v-option criu: Use -v4 instead of -vvvvvv Compare: https://github.com/lxc/lxc/compare/da161bc1a2ef...ad4dddd85e9f From lxc-bot at linuxcontainers.org Sun May 12 00:04:04 2019 From: lxc-bot at linuxcontainers.org (rikardfalkeborn on Github) Date: Sat, 11 May 2019 17:04:04 -0700 (PDT) Subject: [lxc-devel] [lxc/master] Fix returning -1 in functions with return type bool Message-ID: <5cd762f4.1c69fb81.c0d07.6214SMTPIN_ADDED_MISSING@mx.google.com> From 17e68c49cf920fba52e937dcf6e0071035ee7927 Mon Sep 17 00:00:00 2001 From: Rikard Falkeborn Date: Sun, 12 May 2019 01:39:51 +0200 Subject: [PATCH 1/3] criu: Remove unnecessary return after _exit() Since _exit() will terminate, the return statement is dead code. Also, returning -1 from a function with bool as return type is confusing. Detected with cppcheck.
Signed-off-by: Rikard Falkeborn --- src/lxc/criu.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/lxc/criu.c b/src/lxc/criu.c index 1b74cf8137..86f6f18367 100644 --- a/src/lxc/criu.c +++ b/src/lxc/criu.c @@ -1273,7 +1273,6 @@ static bool do_dump(struct lxc_container *c, char *mode, struct migrate_opts *op if (!cgroup_ops) { ERROR("failed to cgroup_init()"); _exit(EXIT_FAILURE); - return -1; } os.pipefd = criuout[1]; From 4d927e7f424acc3002531b10af190a947f123ca0 Mon Sep 17 00:00:00 2001 From: Rikard Falkeborn Date: Sun, 12 May 2019 01:46:27 +0200 Subject: [PATCH 2/3] lvm: Fix return value if lvm_create_clone fails Returning -1 in a function with return type bool is the same as returning true. Change to return false to indicate error properly. Detected with cppcheck. Signed-off-by: Rikard Falkeborn --- src/lxc/storage/lvm.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/lxc/storage/lvm.c b/src/lxc/storage/lvm.c index e30f821609..cc267b9528 100644 --- a/src/lxc/storage/lvm.c +++ b/src/lxc/storage/lvm.c @@ -535,13 +535,13 @@ bool lvm_create_clone(struct lxc_conf *conf, struct lxc_storage *orig, if (!newsize && blk_getsize(orig, &size) < 0) { ERROR("Failed to detect size of logical volume \"%s\"", orig->src); - return -1; + return false; } /* detect filesystem */ if (detect_fs(orig, fstype, 100) < 0) { INFO("Failed to detect filesystem type for \"%s\"", orig->src); - return -1; + return false; } } else if (!newsize) { size = DEFAULT_FS_SIZE; @@ -553,7 +553,7 @@ ret = do_lvm_create(src, size, thinpool); if (ret < 0) { ERROR("Failed to create lvm storage volume \"%s\"", src); - return -1; + return false; } cmd_args[0] = fstype; @@ -563,7 +563,7 @@ if (ret < 0) { ERROR("Failed to create new filesystem \"%s\" for lvm storage " "volume \"%s\": %s", fstype, src, cmd_output); - return -1; + return false; } data.orig = orig; From cdcaad486806b9c892fe3c050444e65c593c4c06 Mon Sep 17 00:00:00 2001 From: Rikard Falkeborn Date: Sun, 12 May 2019 01:47:56 +0200 Subject: [PATCH 3/3] zfs: Fix return value on zfs_snapshot error Returning -1 in a function with return type bool is the same as returning true. Change to return false to indicate error properly. Detected with cppcheck. Signed-off-by: Rikard Falkeborn --- src/lxc/storage/zfs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/lxc/storage/zfs.c b/src/lxc/storage/zfs.c index b75708f1c1..fc3b32247e 100644 --- a/src/lxc/storage/zfs.c +++ b/src/lxc/storage/zfs.c @@ -427,7 +427,7 @@ bool zfs_snapshot(struct lxc_conf *conf, struct lxc_storage *orig, if (ret < 0 || ret >= PATH_MAX) { ERROR("Failed to create string"); free(snapshot); - return -1; + return false; } cmd_args.dataset = lxc_storage_get_path(new->src, new->type);
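Since three separate patches in this series correct the same mistake, it may be worth spelling out the C rule they all trip over: in a function declared to return bool, a `return -1` goes through the standard _Bool conversion, so any non-zero value, -1 included, silently becomes true. A tiny self-contained illustration (not lxc code):

#include <stdbool.h>
#include <stdio.h>

static bool create_volume(void)
{
	return -1; /* meant as an error code, but converts to true */
}

int main(void)
{
	/* Prints "success" even though the function "failed". */
	if (create_volume())
		printf("success\n");
	else
		printf("failure\n");
	return 0;
}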
From lxc-bot at linuxcontainers.org Sun May 12 00:32:46 2019 From: lxc-bot at linuxcontainers.org (rikardfalkeborn on Github) Date: Sat, 11 May 2019 17:32:46 -0700 (PDT) Subject: [lxc-devel] [lxc/master] initutils: Fix memleak on realloc failure Message-ID: <5cd769ae.1c69fb81.f05a1.ee8dSMTPIN_ADDED_MISSING@mx.google.com> From 7d07da0e998358620f7ca9600505785dd74f6536 Mon Sep 17 00:00:00 2001 From: Rikard Falkeborn Date: Sun, 12 May 2019 02:22:15 +0200 Subject: [PATCH] initutils: Fix memleak on realloc failure Signed-off-by: Rikard Falkeborn --- src/lxc/initutils.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/src/lxc/initutils.c b/src/lxc/initutils.c index a55d8b2860..8b3c2d8307 100644 --- a/src/lxc/initutils.c +++ b/src/lxc/initutils.c @@ -242,7 +242,7 @@ int setproctitle(char *title) { __do_fclose FILE *f = NULL; int i, fd, len; - char *buf_ptr; + char *buf_ptr, *tmp_proctitle; char buf[LXC_LINELEN]; int ret = 0; ssize_t bytes_read = 0; @@ -305,10 +305,12 @@ int setproctitle(char *title) * want to have room for it. */ len = strlen(title) + 1; - proctitle = realloc(proctitle, len); + tmp_proctitle = realloc(proctitle, len); - if (!proctitle) + if (!tmp_proctitle) return -1; + proctitle = tmp_proctitle; + arg_start = (unsigned long)proctitle; arg_end = arg_start + len; From sreeginsree5298 at gmail.com Mon May 13 09:13:01 2019 From: sreeginsree5298 at gmail.com (Sreejin K) Date: Mon, 13 May 2019 14:43:01 +0530 Subject: [lxc-devel] LXC-[hardware access] Message-ID: Hi, I am doing some research on LXC containers. So far I have tried running an Ubuntu container with a C application written inside the container, and it was successful. Next I need to run frame buffer sample code inside an Ubuntu container. When I tried, it showed an error message: ERROR: cannot open framebuffer device: No such file or directory. Is the error because the container cannot access the host's hardware? And is there a method to access host hardware? Please do support. Thanks & Regards Sreegin K
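A brief pointer on the framebuffer question above: a container's /dev is normally a minimal tmpfs and the device cgroup denies most host device nodes, so /dev/fb0 is simply absent inside the container, which matches the "No such file or directory" error. Host devices usually have to be exposed explicitly in the container configuration. What follows is a hedged sketch of the common approach for a privileged container, not a verified recipe; the 29:0 major/minor pair is the usual numbering for /dev/fb0, but it should be checked with ls -l /dev/fb0 on the host:

# allow the framebuffer device in the device cgroup (cgroup v1 syntax)
lxc.cgroup.devices.allow = c 29:0 rwm
# bind-mount the host device node into the container's /dev
lxc.mount.entry = /dev/fb0 dev/fb0 none bind,optional,create=file

For unprivileged containers this sketch is not the whole story, since the device node must also be owned by an id that maps into the container's user namespace.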
From lxc-bot at linuxcontainers.org Mon May 13 09:47:36 2019 From: lxc-bot at linuxcontainers.org (CajuM on Github) Date: Mon, 13 May 2019 02:47:36 -0700 (PDT) Subject: [lxc-devel] [crio-lxc/master] Add static build support Message-ID: <5cd93d38.1c69fb81.a0c84.9947SMTPIN_ADDED_MISSING@mx.google.com> From f57179a22407eff45829e4a2d3901655667bd740 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mihai-Drosi=20C=C3=A2ju?= Date: Mon, 13 May 2019 12:08:25 +0300 Subject: [PATCH 1/2] use rm -f in clean target --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 92d7d0d..86a92d7 100644 --- a/Makefile +++ b/Makefile @@ -20,4 +20,4 @@ vendorup: .PHONY: clean clean: - -rm -r crio-lxc + -rm -f crio-lxc From 22253c752304912783cecf0f5c96dbe68a487e6a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mihai-Drosi=20C=C3=A2ju?= Date: Mon, 13 May 2019 12:40:34 +0300 Subject: [PATCH 2/2] Add dockerfile and static build make target --- Makefile | 10 ++++++++++ docker/Dockerfile | 11 +++++++++++ 2 files changed, 21 insertions(+) create mode 100644 docker/Dockerfile diff --git a/Makefile b/Makefile index 86a92d7..b3163d9 100644 --- a/Makefile +++ b/Makefile @@ -7,6 +7,16 @@ CRIO_REPO?=~/packages/cri-o crio-lxc: $(GO_SRC) go build -ldflags "-X main.version=$(COMMIT)" -o crio-lxc ./cmd +.PHONY: crio-lxc-static +crio-lxc-static: $(GO_SRC) + docker build -t build:crio-lxc docker + docker run --rm -i -t \ -v "$$PWD:/go/src/crio-lxc" \ -u $$UID:$$GID \ -e HOME=/var/tmp \ -w /go/src/crio-lxc \ build:crio-lxc make + # make test TEST=basic will run only the basic test. .PHONY: check check: crio-lxc diff --git a/docker/Dockerfile b/docker/Dockerfile new file mode 100644 index 0000000..ff88a4a --- /dev/null +++ b/docker/Dockerfile @@ -0,0 +1,11 @@ +FROM golang:alpine + +RUN \ + apk update && \ + apk upgrade && \ + apk add libcap-dev lxc-dev libseccomp-dev \ + make git gcc musl-dev && \ + sed -i 's/^Libs: .*$/Libs: -L${libdir} -static -llxc -lseccomp -lcap -lutil/g' \ + /usr/lib/pkgconfig/lxc.pc + +ENV GO111MODULE on From lxc-bot at linuxcontainers.org Mon May 13 11:13:39 2019 From: lxc-bot at linuxcontainers.org (Rachid-Koucha on Github) Date: Mon, 13 May 2019 04:13:39 -0700 (PDT) Subject: [lxc-devel] [lxc/master] Config: check for %m availability Message-ID: <5cd95163.1c69fb81.d954c.f853SMTPIN_ADDED_MISSING@mx.google.com> From 720bbb3118e8eb094be2d17aca841046f581bc7e Mon Sep 17 00:00:00 2001 From: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: Mon, 13 May 2019 13:13:18 +0200 Subject: [PATCH] Config: check for %m availability GLIBC supports %m to avoid calling strerror(). Using it saves some code space. ==> This check will define HAVE_M_FORMAT to be used wherever possible (e.g.
log.h) Signed-off-by: Rachid Koucha --- configure.ac | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/configure.ac b/configure.ac index 8d6774f236..029953e818 100644 --- a/configure.ac +++ b/configure.ac @@ -637,6 +637,33 @@ AC_CHECK_FUNCS([setns pivot_root sethostname unshare rand_r confstr faccessat ge # - STRERROR_R_CHAR_P if it returns char * AC_FUNC_STRERROR_R +# Check if "%m" is supported by printf and Co +AC_MSG_CHECKING([%m format]) +AC_TRY_RUN([ +#include <stdio.h> +int main(void) +{ + char msg[256]; + int rc; + + rc = snprintf(msg, sizeof(msg), "%m\n"); + if ((rc > 1) && (msg[0] != '%')) + { + return 0; + } + else + { + return 1; + } +}], +[fmt_m=yes], [fmt_m=no]) +if test "x$fmt_m" = "xyes"; then + AC_DEFINE([HAVE_M_FORMAT], 1, [Have %m format]) + AC_MSG_RESULT([yes]) +else + AC_MSG_RESULT([no]) +fi + # Check for some functions AC_CHECK_LIB(pthread, main) AC_CHECK_FUNCS(statvfs) From noreply at github.com Mon May 13 11:18:56 2019 From: noreply at github.com (Christian Brauner) Date: Mon, 13 May 2019 04:18:56 -0700 Subject: [lxc-devel] [lxc/lxc] 720bbb: Config: check for %m availability Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 720bbb3118e8eb094be2d17aca841046f581bc7e https://github.com/lxc/lxc/commit/720bbb3118e8eb094be2d17aca841046f581bc7e Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-13 (Mon, 13 May 2019) Changed paths: M configure.ac Log Message: ----------- Config: check for %m availability GLIBC supports %m to avoid calling strerror(). Using it saves some code space. ==> This check will define HAVE_M_FORMAT to be used wherever possible (e.g. log.h) Signed-off-by: Rachid Koucha Commit: fa9aa1fabb77e9ee7759ea2daa2aa65f3525248c https://github.com/lxc/lxc/commit/fa9aa1fabb77e9ee7759ea2daa2aa65f3525248c Author: Christian Brauner Date: 2019-05-13 (Mon, 13 May 2019) Changed paths: M configure.ac Log Message: ----------- Merge pull request #3000 from Rachid-Koucha/patch-11 Config: check for %m availability Compare: https://github.com/lxc/lxc/compare/ad4dddd85e9f...fa9aa1fabb77
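The configure check above can be summarized in two lines of C: with glibc, the printf family expands %m to strerror(errno), so no explicit strerror() call and no extra argument are needed. A minimal sketch; note that %m is a glibc extension rather than standard C, which is precisely why the feature test exists:

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	errno = EACCES;
	printf("strerror: %s\n", strerror(errno));
	errno = EACCES;
	printf("%%m: %m\n"); /* same text on glibc, no strerror() call */
	return 0;
}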
From noreply at github.com Mon May 13 11:19:24 2019 From: noreply at github.com (Christian Brauner) Date: Mon, 13 May 2019 04:19:24 -0700 Subject: [lxc-devel] [lxc/lxc] 17e68c: criu: Remove unnecessary return after _exit() Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 17e68c49cf920fba52e937dcf6e0071035ee7927 https://github.com/lxc/lxc/commit/17e68c49cf920fba52e937dcf6e0071035ee7927 Author: Rikard Falkeborn Date: 2019-05-12 (Sun, 12 May 2019) Changed paths: M src/lxc/criu.c Log Message: ----------- criu: Remove unnecessary return after _exit() Since _exit() will terminate, the return statement is dead code. Also, returning -1 from a function with bool as return type is confusing. Detected with cppcheck. Signed-off-by: Rikard Falkeborn Commit: 4d927e7f424acc3002531b10af190a947f123ca0 https://github.com/lxc/lxc/commit/4d927e7f424acc3002531b10af190a947f123ca0 Author: Rikard Falkeborn Date: 2019-05-12 (Sun, 12 May 2019) Changed paths: M src/lxc/storage/lvm.c Log Message: ----------- lvm: Fix return value if lvm_create_clone fails Returning -1 in a function with return type bool is the same as returning true. Change to return false to indicate error properly. Detected with cppcheck. Signed-off-by: Rikard Falkeborn Commit: cdcaad486806b9c892fe3c050444e65c593c4c06 https://github.com/lxc/lxc/commit/cdcaad486806b9c892fe3c050444e65c593c4c06 Author: Rikard Falkeborn Date: 2019-05-12 (Sun, 12 May 2019) Changed paths: M src/lxc/storage/zfs.c Log Message: ----------- zfs: Fix return value on zfs_snapshot error Returning -1 in a function with return type bool is the same as returning true. Change to return false to indicate error properly. Detected with cppcheck. Signed-off-by: Rikard Falkeborn Commit: 7d4188ce7168d0a7f595590c992961b5dfdb6e39 https://github.com/lxc/lxc/commit/7d4188ce7168d0a7f595590c992961b5dfdb6e39 Author: Christian Brauner Date: 2019-05-13 (Mon, 13 May 2019) Changed paths: M src/lxc/criu.c M src/lxc/storage/lvm.c M src/lxc/storage/zfs.c Log Message: ----------- Merge pull request #2998 from rikardfalkeborn/fix-returning-non-bool Fix returning -1 in functions with return type bool Compare: https://github.com/lxc/lxc/compare/fa9aa1fabb77...7d4188ce7168 From noreply at github.com Mon May 13 11:19:57 2019 From: noreply at github.com (Christian Brauner) Date: Mon, 13 May 2019 04:19:57 -0700 Subject: [lxc-devel] [lxc/lxc] e1d430: initutils: Fix memleak on realloc failure Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: e1d43053849d62f15925b343c67cbac2ed7218ba https://github.com/lxc/lxc/commit/e1d43053849d62f15925b343c67cbac2ed7218ba Author: Rikard Falkeborn Date: 2019-05-12 (Sun, 12 May 2019) Changed paths: M src/lxc/initutils.c Log Message: ----------- initutils: Fix memleak on realloc failure Signed-off-by: Rikard Falkeborn Commit: 612e48a364e653418855295232fd7f73cc9b144d https://github.com/lxc/lxc/commit/612e48a364e653418855295232fd7f73cc9b144d Author: Christian Brauner Date: 2019-05-13 (Mon, 13 May 2019) Changed paths: M src/lxc/initutils.c Log Message: ----------- Merge pull request #2999 from rikardfalkeborn/fix-realloc-memleak-proctitle initutils: Fix memleak on realloc failure Compare: https://github.com/lxc/lxc/compare/7d4188ce7168...612e48a364e6 From builds at travis-ci.org Mon May 13 11:22:22 2019 From: builds at travis-ci.org (Travis CI) Date: Mon, 13 May 2019 11:22:22 +0000 Subject: [lxc-devel] Passed: lxc/lxc#6823 (master - 612e48a) In-Reply-To: Message-ID: <5cd9536e7b2d4_43f851ef1035051093b@ff7853b1-c21e-4f6c-b817-8408af1cbe8e.mail> Build Update for lxc/lxc ------------------------------------- Build: #6823 Status: Passed Duration: 1 min and 53 secs Commit: 612e48a (master) Author: Christian Brauner Message: Merge pull request #2999 from rikardfalkeborn/fix-realloc-memleak-proctitle initutils: Fix memleak on realloc failure View the changeset: https://github.com/lxc/lxc/compare/7d4188ce7168...612e48a364e6 View the full build log and details: https://travis-ci.org/lxc/lxc/builds/531730809?utm_medium=notification&utm_source=email
From lxc-bot at linuxcontainers.org Mon May 13 12:47:38 2019 From: lxc-bot at linuxcontainers.org (Rachid-Koucha on Github) Date: Mon, 13 May 2019 05:47:38 -0700 (PDT) Subject: [lxc-devel] [lxc/master] Use %m instead of strerror() when available Message-ID: <5cd9676a.1c69fb81.b939.e710SMTPIN_ADDED_MISSING@mx.google.com> From a1d652c25b11e3eb51001bb9d9605d75537d9f40 Mon Sep 17 00:00:00 2001 From: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: Mon, 13 May 2019 13:21:14 +0200 Subject: [PATCH] Use %m instead of strerror() when available Use %m under HAVE_M_FORMAT instead of strerror() Signed-off-by: Rachid Koucha --- src/lxc/log.h | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/src/lxc/log.h b/src/lxc/log.h index 3b7557edbd..4ced2d7506 100644 --- a/src/lxc/log.h +++ b/src/lxc/log.h @@ -416,53 +416,94 @@ ATTR_UNUSED static inline void LXC_##LEVEL(struct lxc_log_locinfo* locinfo, \ LXC_FATAL(&locinfo, format, ##__VA_ARGS__); \ } while (0) +#if HAVE_M_FORMAT +#define SYSTRACE(format, ...) \ + TRACE("%m - " format, ##__VA_ARGS__); +#else #define SYSTRACE(format, ...) \ do { \ lxc_log_strerror_r; \ TRACE("%s - " format, ptr, ##__VA_ARGS__); \ } while (0) +#endif +#if HAVE_M_FORMAT +#define SYSDEBUG(format, ...) \ + DEBUG("%m - " format, ##__VA_ARGS__) +#else #define SYSDEBUG(format, ...) \ do { \ lxc_log_strerror_r; \ DEBUG("%s - " format, ptr, ##__VA_ARGS__); \ } while (0) +#endif + +#if HAVE_M_FORMAT +#define SYSINFO(format, ...) \ + INFO("%m - " format, ##__VA_ARGS__) +#else #define SYSINFO(format, ...) \ do { \ lxc_log_strerror_r; \ INFO("%s - " format, ptr, ##__VA_ARGS__); \ } while (0) +#endif +#if HAVE_M_FORMAT +#define SYSNOTICE(format, ...) \ + NOTICE("%m - " format, ##__VA_ARGS__) +#else #define SYSNOTICE(format, ...) \ do { \ lxc_log_strerror_r; \ NOTICE("%s - " format, ptr, ##__VA_ARGS__); \ } while (0) +#endif +#if HAVE_M_FORMAT +#define SYSWARN(format, ...) \ + WARN("%m - " format, ##__VA_ARGS__) +#else #define SYSWARN(format, ...) \ do { \ lxc_log_strerror_r; \ WARN("%s - " format, ptr, ##__VA_ARGS__); \ } while (0) +#endif +#if HAVE_M_FORMAT +#define SYSERROR(format, ...) \ + ERROR("%m - " format, ##__VA_ARGS__) +#else #define SYSERROR(format, ...) \ do { \ lxc_log_strerror_r; \ ERROR("%s - " format, ptr, ##__VA_ARGS__); \ } while (0) +#endif +#if HAVE_M_FORMAT +#define CMD_SYSERROR(format, ...) \ + fprintf(stderr, "%m - " format, ##__VA_ARGS__) +#else #define CMD_SYSERROR(format, ...) \ do { \ lxc_log_strerror_r; \ fprintf(stderr, "%s - " format, ptr, ##__VA_ARGS__); \ } while (0) +#endif +#if HAVE_M_FORMAT +#define CMD_SYSINFO(format, ...) \ + printf("%m - " format, ##__VA_ARGS__) +#else #define CMD_SYSINFO(format, ...)
\ do { \ lxc_log_strerror_r; \ printf("%s - " format, ptr, ##__VA_ARGS__); \ } while (0) +#endif extern int lxc_log_fd; From noreply at github.com Mon May 13 13:57:32 2019 From: noreply at github.com (Christian Brauner) Date: Mon, 13 May 2019 06:57:32 -0700 Subject: [lxc-devel] [lxc/lxc] a1d652: Use %m instead of strerror() when available Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: a1d652c25b11e3eb51001bb9d9605d75537d9f40 https://github.com/lxc/lxc/commit/a1d652c25b11e3eb51001bb9d9605d75537d9f40 Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-13 (Mon, 13 May 2019) Changed paths: M src/lxc/log.h Log Message: ----------- Use %m instead of strerror() when available Use %m under HAVE_M_FORMAT instead of strerror() Signed-off-by: Rachid Koucha Commit: 9a719a64e59948f72104f14eab8eeb121bd8bb83 https://github.com/lxc/lxc/commit/9a719a64e59948f72104f14eab8eeb121bd8bb83 Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-13 (Mon, 13 May 2019) Changed paths: M src/lxc/log.h Log Message: ----------- Error-prone semicolon Suppressed the error-prone trailing semicolon in the SYSTRACE() macro definition (the HAVE_M_FORMAT variant of SYSTRACE in the previous commit ended in a stray semicolon). Signed-off-by: Rachid Koucha Commit: 7aea50feb954bdd2097d7bf8a252fb13878fe0b4 https://github.com/lxc/lxc/commit/7aea50feb954bdd2097d7bf8a252fb13878fe0b4 Author: Christian Brauner Date: 2019-05-13 (Mon, 13 May 2019) Changed paths: M src/lxc/log.h Log Message: ----------- Merge pull request #3001 from Rachid-Koucha/patch-11 Use %m instead of strerror() when available Compare: https://github.com/lxc/lxc/compare/612e48a364e6...7aea50feb954 From lxc-bot at linuxcontainers.org Mon May 13 14:05:42 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Mon, 13 May 2019 07:05:42 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/storage/cephfs: Initial support Message-ID: <5cd979b6.1c69fb81.d0c79.c446SMTPIN_ADDED_MISSING@mx.google.com> From a4c7b64add8084f13cb55fe494887a0743118448 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Sun, 12 May 2019 22:44:45 +0200 Subject: [PATCH] lxd/storage/cephfs: Initial support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- doc/storage.md | 8 + lxd/storage.go | 25 +- lxd/storage_cephfs.go | 956 ++++++++++++++++++++++++++++++++++ lxd/storage_pools_config.go | 10 +- lxd/storage_pools_utils.go | 2 +- lxd/storage_volumes_config.go | 11 +- 6 files changed, 1006 insertions(+), 6 deletions(-) create mode 100644 lxd/storage_cephfs.go diff --git a/doc/storage.md b/doc/storage.md index 2b4f287133..e00c3289f5 100644 --- a/doc/storage.md +++ b/doc/storage.md @@ -16,6 +16,9 @@ ceph.osd.pg\_num | string | ceph driver ceph.osd.pool\_name | string | ceph driver | name of the pool | storage\_driver\_ceph | Name of the osd storage pool. ceph.rbd.clone\_copy | string | ceph driver | true | storage\_driver\_ceph | Whether to use RBD lightweight clones rather than full dataset copies. ceph.user.name | string | ceph driver | admin | storage\_ceph\_user\_name | The ceph user to use when creating storage pools and volumes. +cephfs.cluster\_name | string | cephfs driver | ceph | storage\_driver\_cephfs | Name of the ceph cluster in which to create new storage pools.
+cephfs.path | string | cephfs driver | / | storage\_driver\_cephfs | The base path for the CEPH fs mount +cephfs.user.name | string | cephfs driver | admin | storage\_driver\_cephfs | The ceph user to use when creating storage pools and volumes. lvm.thinpool\_name | string | lvm driver | LXDThinPool | storage | Thin pool where images and containers are created. lvm.use\_thinpool | bool | lvm driver | true | storage\_lvm\_use\_thinpool | Whether the storage pool uses a thinpool for logical volumes. lvm.vg\_name | string | lvm driver | name of the pool | storage | Name of the volume group to create. @@ -228,6 +231,11 @@ lxc storage create pool1 ceph ceph.osd.pool\_name=my-osd lxc storage create pool1 ceph source=my-already-existing-osd ``` +### CEPHFS + + - Can only be used for custom storage volumes + - Supports snapshots if enabled on the server side + ### Btrfs - Uses a subvolume per container, image and snapshot, creating btrfs snapshots when creating a new object. diff --git a/lxd/storage.go b/lxd/storage.go index f8c7e70c45..e67d3ee9db 100644 --- a/lxd/storage.go +++ b/lxd/storage.go @@ -83,13 +83,14 @@ type storageType int const ( storageTypeBtrfs storageType = iota storageTypeCeph + storageTypeCephFs storageTypeDir storageTypeLvm storageTypeMock storageTypeZfs ) -var supportedStoragePoolDrivers = []string{"btrfs", "ceph", "dir", "lvm", "zfs"} +var supportedStoragePoolDrivers = []string{"btrfs", "ceph", "cephfs", "dir", "lvm", "zfs"} func storageTypeToString(sType storageType) (string, error) { switch sType { @@ -97,6 +98,8 @@ return "btrfs", nil case storageTypeCeph: return "ceph", nil + case storageTypeCephFs: + return "cephfs", nil case storageTypeDir: return "dir", nil case storageTypeLvm: @@ -116,6 +119,8 @@ return storageTypeBtrfs, nil case "ceph": return storageTypeCeph, nil + case "cephfs": + return storageTypeCephFs, nil case "dir": return storageTypeDir, nil case "lvm": @@ -266,6 +271,13 @@ return nil, err } return &ceph, nil + case storageTypeCephFs: + cephfs := storageCephFs{} + err = cephfs.StorageCoreInit() + if err != nil { + return nil, err + } + return &cephfs, nil case storageTypeLvm: lvm := storageLvm{} err = lvm.StorageCoreInit() @@ -356,6 +368,17 @@ return nil, err } return &ceph, nil + case storageTypeCephFs: + cephfs := storageCephFs{} + cephfs.poolID = poolID + cephfs.pool = pool + cephfs.volume = volume + cephfs.s = s + err = cephfs.StoragePoolInit() + if err != nil { + return nil, err + } + return &cephfs, nil case storageTypeLvm: lvm := storageLvm{} lvm.poolID = poolID diff --git a/lxd/storage_cephfs.go b/lxd/storage_cephfs.go new file mode 100644 index 0000000000..66cc52dc48 --- /dev/null +++ b/lxd/storage_cephfs.go @@ -0,0 +1,956 @@ +package main + +import ( + "bufio" + "fmt" + "io" + "io/ioutil" + "os" + "path/filepath" + "strings" + "syscall" + + "github.com/gorilla/websocket" + "github.com/pkg/errors" + + "github.com/lxc/lxd/lxd/migration" + "github.com/lxc/lxd/lxd/state" + "github.com/lxc/lxd/shared" + "github.com/lxc/lxd/shared/api" + "github.com/lxc/lxd/shared/ioprogress" + "github.com/lxc/lxd/shared/logger" +) + +type storageCephFs struct { + ClusterName string + FsName string + UserName string + storageShared +} + +func (s *storageCephFs) StorageCoreInit() error { + s.sType = storageTypeCephFs
+ typeName, err := storageTypeToString(s.sType) + if err != nil { + return err + } + s.sTypeName = typeName + + if cephVersion != "" { + s.sTypeVersion = cephVersion + return nil + } + + msg, err := shared.RunCommand("rbd", "--version") + if err != nil { + return fmt.Errorf("Error getting CEPH version: %s", err) + } + s.sTypeVersion = strings.TrimSpace(msg) + cephVersion = s.sTypeVersion + + return nil +} + +func (s *storageCephFs) StoragePoolInit() error { + var err error + + err = s.StorageCoreInit() + if err != nil { + return errors.Wrap(err, "Storage pool init") + } + + // set cluster name + if s.pool.Config["cephfs.cluster_name"] != "" { + s.ClusterName = s.pool.Config["cephfs.cluster_name"] + } else { + s.ClusterName = "ceph" + } + + // set ceph user name + if s.pool.Config["cephfs.user.name"] != "" { + s.UserName = s.pool.Config["cephfs.user.name"] + } else { + s.UserName = "admin" + } + + // set ceph fs name + if s.pool.Config["ceph.fs.name"] != "" { + s.FsName = s.pool.Config["ceph.fs.name"] + } + + return nil +} + +// Initialize a full storage interface. +func (s *storageCephFs) StoragePoolCheck() error { + return nil +} + +func (s *storageCephFs) StoragePoolCreate() error { + logger.Infof(`Creating CEPHFS storage pool "%s" in cluster "%s"`, s.pool.Name, s.ClusterName) + + // Setup config + s.pool.Config["volatile.initial_source"] = s.pool.Config["source"] + + if s.pool.Config["source"] == "" { + return fmt.Errorf("A ceph fs name OR name/path source is required") + } + + if s.pool.Config["ceph.fs.name"] != "" && s.pool.Config["ceph.fs.name"] != s.pool.Config["source"] { + return fmt.Errorf("ceph.fs.name must match the source") + } + + if s.pool.Config["cephfs.cluster_name"] == "" { + s.pool.Config["cephfs.cluster_name"] = "ceph" + } + + if s.pool.Config["cephfs.user.name"] == "" { + s.pool.Config["cephfs.user.name"] = "admin" + } + + s.pool.Config["ceph.fs.name"] = s.pool.Config["source"] + s.FsName = s.pool.Config["source"] + + // Parse the namespace / path + fields := strings.SplitN(s.FsName, "/", 2) + fsName := fields[0] + fsPath := "/" + if len(fields) > 1 { + fsPath = fields[1] + } + + // Check that the filesystem exists + if !cephFsExists(s.ClusterName, s.UserName, fsName) { + return fmt.Errorf("The requested '%v' CEPH fs doesn't exist", fsName) + } + + // Create the path if needed + mountPath, err := ioutil.TempDir("", "lxd_cephfs_") + if err != nil { + return err + } + defer os.RemoveAll(mountPath) + + err = os.Chmod(mountPath, 0700) + if err != nil { + return err + } + + mountPoint := filepath.Join(mountPath, "mount") + err = os.Mkdir(mountPoint, 0700) + if err != nil { + return err + } + + // Get the credentials and host + monAddress, userSecret, err := cephFsConfig(s.ClusterName, s.UserName) + if err != nil { + return err + } + + uri := fmt.Sprintf("%s:/", monAddress) + err = tryMount(uri, mountPoint, "ceph", 0, fmt.Sprintf("name=%v,secret=%v,mds_namespace=%v", s.UserName, userSecret, fsName)) + if err != nil { + return err + } + defer tryUnmount(mountPoint, syscall.MNT_DETACH) + + // Check that the existing path is empty + err = os.MkdirAll(fmt.Sprintf("%s%s", mountPoint, fsPath), 0755) + if err != nil { + return err + } + + ok, _ := shared.PathIsEmpty(fmt.Sprintf("%s%s", mountPoint, fsPath)) + if !ok { + return fmt.Errorf("Only empty CEPH fs paths can be used as a LXD storage pool") + } + + // Create the mountpoint for the storage pool.
+ poolMntPoint := getStoragePoolMountPoint(s.pool.Name) + err = os.MkdirAll(poolMntPoint, 0711) + if err != nil { + logger.Errorf(`Failed to create mountpoint "%s" for CEPH fs storage pool "%s" in cluster "%s": %s`, poolMntPoint, s.FsName, s.ClusterName, err) + return err + } + logger.Debugf(`Created mountpoint "%s" for CEPH fs storage pool "%s" in cluster "%s"`, poolMntPoint, s.FsName, s.ClusterName) + logger.Infof(`Created CEPH fs storage pool "%s" in cluster "%s"`, s.pool.Name, s.ClusterName) + + return nil +} + +func (s *storageCephFs) StoragePoolDelete() error { + logger.Infof(`Deleting CEPH fs storage pool "%s" in cluster "%s"`, s.pool.Name, s.ClusterName) + + // Mount the storage pool + // Delete the content + // Umount the storage pool + + // Delete the mountpoint for the storage pool + poolMntPoint := getStoragePoolMountPoint(s.pool.Name) + if shared.PathExists(poolMntPoint) { + err := os.RemoveAll(poolMntPoint) + if err != nil { + logger.Errorf(`Failed to delete mountpoint "%s" for CEPH fs storage pool "%s" in cluster "%s": %s`, poolMntPoint, s.FsName, s.ClusterName, err) + return err + } + logger.Debugf(`Deleted mountpoint "%s" for CEPH fs storage pool "%s" in cluster "%s"`, poolMntPoint, s.FsName, s.ClusterName) + } + + logger.Infof(`Deleted CEPH fs storage pool "%s" in cluster "%s"`, s.pool.Name, s.ClusterName) + return nil +} + +func (s *storageCephFs) StoragePoolMount() (bool, error) { + logger.Debugf("Mounting CEPHFS storage pool \"%s\"", s.pool.Name) + + poolMountLockID := getPoolMountLockID(s.pool.Name) + lxdStorageMapLock.Lock() + if waitChannel, ok := lxdStorageOngoingOperationMap[poolMountLockID]; ok { + lxdStorageMapLock.Unlock() + if _, ok := <-waitChannel; ok { + logger.Warnf("Received value over semaphore, this should not have happened") + } + // Give the benefit of the doubt and assume that the other + // thread actually succeeded in mounting the storage pool. 
+ return false, nil + } + + lxdStorageOngoingOperationMap[poolMountLockID] = make(chan bool) + lxdStorageMapLock.Unlock() + + removeLockFromMap := func() { + lxdStorageMapLock.Lock() + if waitChannel, ok := lxdStorageOngoingOperationMap[poolMountLockID]; ok { + close(waitChannel) + delete(lxdStorageOngoingOperationMap, poolMountLockID) + } + lxdStorageMapLock.Unlock() + } + defer removeLockFromMap() + + // Check if already mounted + poolMntPoint := getStoragePoolMountPoint(s.pool.Name) + if shared.IsMountPoint(poolMntPoint) { + return false, nil + } + + // Parse the namespace / path + fields := strings.SplitN(s.FsName, "/", 2) + fsName := fields[0] + fsPath := "/" + if len(fields) > 1 { + fsPath = fields[1] + } + logger.Debugf("s.FsName=%v fields=%v", s.FsName, fields) + + // Get the credentials and host + monAddress, secret, err := cephFsConfig(s.ClusterName, s.UserName) + if err != nil { + return false, err + } + + // Do the actual mount + uri := fmt.Sprintf("%s:%s", monAddress, fsPath) + err = tryMount(uri, poolMntPoint, "ceph", 0, fmt.Sprintf("name=%v,secret=%v,mds_namespace=%v", s.UserName, secret, fsName)) + if err != nil { + return false, err + } + + logger.Debugf("Mounted CEPHFS storage pool \"%s\"", s.pool.Name) + + return true, nil +} + +func (s *storageCephFs) StoragePoolUmount() (bool, error) { + source := s.pool.Config["source"] + if source == "" { + return false, fmt.Errorf("no \"source\" property found for the storage pool") + } + cleanSource := filepath.Clean(source) + poolMntPoint := getStoragePoolMountPoint(s.pool.Name) + if cleanSource == poolMntPoint { + return true, nil + } + + logger.Debugf("Unmounting CEPHFS storage pool \"%s\"", s.pool.Name) + + poolUmountLockID := getPoolUmountLockID(s.pool.Name) + lxdStorageMapLock.Lock() + if waitChannel, ok := lxdStorageOngoingOperationMap[poolUmountLockID]; ok { + lxdStorageMapLock.Unlock() + if _, ok := <-waitChannel; ok { + logger.Warnf("Received value over semaphore, this should not have happened") + } + // Give the benefit of the doubt and assume that the other + // thread actually succeeded in unmounting the storage pool.
+ return false, nil + } + + lxdStorageOngoingOperationMap[poolUmountLockID] = make(chan bool) + lxdStorageMapLock.Unlock() + + removeLockFromMap := func() { + lxdStorageMapLock.Lock() + if waitChannel, ok := lxdStorageOngoingOperationMap[poolUmountLockID]; ok { + close(waitChannel) + delete(lxdStorageOngoingOperationMap, poolUmountLockID) + } + lxdStorageMapLock.Unlock() + } + + defer removeLockFromMap() + + if !shared.IsMountPoint(poolMntPoint) { + return false, nil + } + + err := syscall.Unmount(poolMntPoint, 0) + if err != nil { + return false, err + } + + logger.Debugf("Unmounted CEPHFS pool \"%s\"", s.pool.Name) + return true, nil +} + +func (s *storageCephFs) GetStoragePoolWritable() api.StoragePoolPut { + return s.pool.Writable() +} + +func (s *storageCephFs) GetStoragePoolVolumeWritable() api.StorageVolumePut { + return s.volume.Writable() +} + +func (s *storageCephFs) SetStoragePoolWritable(writable *api.StoragePoolPut) { + s.pool.StoragePoolPut = *writable +} + +func (s *storageCephFs) SetStoragePoolVolumeWritable(writable *api.StorageVolumePut) { + s.volume.StorageVolumePut = *writable +} + +func (s *storageCephFs) GetContainerPoolInfo() (int64, string, string) { + return s.poolID, s.pool.Name, s.pool.Name +} + +func (s *storageCephFs) StoragePoolUpdate(writable *api.StoragePoolPut, changedConfig []string) error { + logger.Infof(`Updating CEPHFS storage pool "%s"`, s.pool.Name) + + _, err := s.StoragePoolMount() + if err != nil { + return err + } + + changeable := changeableStoragePoolProperties["cephfs"] + unchangeable := []string{} + for _, change := range changedConfig { + if !shared.StringInSlice(change, changeable) { + unchangeable = append(unchangeable, change) + } + } + + if len(unchangeable) > 0 { + return updateStoragePoolError(unchangeable, "cephfs") + } + + // "rsync.bwlimit" requires no on-disk modifications. + + logger.Infof(`Updated CEPHFS storage pool "%s"`, s.pool.Name) + return nil +} + +// Functions dealing with storage pools. 
+func (s *storageCephFs) StoragePoolVolumeCreate() error { + logger.Infof("Creating CEPHFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) + + _, err := s.StoragePoolMount() + if err != nil { + return err + } + + source := s.pool.Config["source"] + if source == "" { + return fmt.Errorf("no \"source\" property found for the storage pool") + } + + isSnapshot := shared.IsSnapshot(s.volume.Name) + + var storageVolumePath string + + if isSnapshot { + storageVolumePath = getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) + } else { + storageVolumePath = getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) + } + + err = os.MkdirAll(storageVolumePath, 0711) + if err != nil { + return err + } + + logger.Infof("Created CEPHFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) + return nil +} + +func (s *storageCephFs) StoragePoolVolumeDelete() error { + logger.Infof("Deleting CEPHFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) + + source := s.pool.Config["source"] + if source == "" { + return fmt.Errorf("no \"source\" property found for the storage pool") + } + + storageVolumePath := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) + if !shared.PathExists(storageVolumePath) { + return nil + } + + err := os.RemoveAll(storageVolumePath) + if err != nil { + return err + } + + err = s.s.Cluster.StoragePoolVolumeDelete( + "default", + s.volume.Name, + storagePoolVolumeTypeCustom, + s.poolID) + if err != nil { + logger.Errorf(`Failed to delete database entry for CEPHFS storage volume "%s" on storage pool "%s"`, + s.volume.Name, s.pool.Name) + } + + logger.Infof("Deleted CEPHFS storage volume \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) + return nil +} + +func (s *storageCephFs) StoragePoolVolumeMount() (bool, error) { + return true, nil +} + +func (s *storageCephFs) StoragePoolVolumeUmount() (bool, error) { + return true, nil +} + +func (s *storageCephFs) StoragePoolVolumeUpdate(writable *api.StorageVolumePut, changedConfig []string) error { + if writable.Restore == "" { + logger.Infof(`Updating CEPHFS storage volume "%s"`, s.volume.Name) + } + + _, err := s.StoragePoolMount() + if err != nil { + return err + } + + if writable.Restore != "" { + logger.Infof(`Restoring CEPHFS storage volume "%s" from snapshot "%s"`, + s.volume.Name, writable.Restore) + + sourcePath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, + fmt.Sprintf("%s/%s", s.volume.Name, writable.Restore)) + targetPath := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) + + // Restore using rsync + bwlimit := s.pool.Config["rsync.bwlimit"] + output, err := rsyncLocalCopy(sourcePath, targetPath, bwlimit) + if err != nil { + return fmt.Errorf("failed to rsync container: %s: %s", string(output), err) + } + + logger.Infof(`Restored CEPHFS storage volume "%s" from snapshot "%s"`, + s.volume.Name, writable.Restore) + return nil + } + + changeable := changeableStoragePoolVolumeProperties["cephfs"] + unchangeable := []string{} + for _, change := range changedConfig { + if !shared.StringInSlice(change, changeable) { + unchangeable = append(unchangeable, change) + } + } + + if len(unchangeable) > 0 { + return updateStoragePoolVolumeError(unchangeable, "cephfs") + } + + logger.Infof(`Updated CEPHFS storage volume "%s"`, s.volume.Name) + return nil +} + +func (s *storageCephFs) StoragePoolVolumeRename(newName string) error { + logger.Infof(`Renaming CEPHFS storage volume on storage pool "%s" from "%s" to "%s`, + 
s.pool.Name, s.volume.Name, newName) + + _, err := s.StoragePoolMount() + if err != nil { + return err + } + + usedBy, err := storagePoolVolumeUsedByContainersGet(s.s, "default", s.volume.Name, storagePoolVolumeTypeNameCustom) + if err != nil { + return err + } + if len(usedBy) > 0 { + return fmt.Errorf(`CEPHFS storage volume "%s" on storage pool "%s" is attached to containers`, + s.volume.Name, s.pool.Name) + } + + oldPath := getStoragePoolVolumeMountPoint(s.pool.Name, s.volume.Name) + newPath := getStoragePoolVolumeMountPoint(s.pool.Name, newName) + err = os.Rename(oldPath, newPath) + if err != nil { + return err + } + + logger.Infof(`Renamed CEPHFS storage volume on storage pool "%s" from "%s" to "%s`, + s.pool.Name, s.volume.Name, newName) + + return s.s.Cluster.StoragePoolVolumeRename("default", s.volume.Name, newName, + storagePoolVolumeTypeCustom, s.poolID) +} + +func (s *storageCephFs) ContainerStorageReady(container container) bool { + containerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) + ok, _ := shared.PathIsEmpty(containerMntPoint) + return !ok +} + +func (s *storageCephFs) ContainerCreate(container container) error { + logger.Debugf("Creating empty CEPHFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) + + _, err := s.StoragePoolMount() + if err != nil { + return err + } + + source := s.pool.Config["source"] + if source == "" { + return fmt.Errorf("no \"source\" property found for the storage pool") + } + + containerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, container.Name()) + err = createContainerMountpoint(containerMntPoint, container.Path(), container.IsPrivileged()) + if err != nil { + return err + } + revert := true + defer func() { + if !revert { + return + } + deleteContainerMountpoint(containerMntPoint, container.Path(), s.GetStorageTypeName()) + }() + + err = container.TemplateApply("create") + if err != nil { + return errors.Wrap(err, "Apply template") + } + + revert = false + + logger.Debugf("Created empty CEPHFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) + return nil +} + +func (s *storageCephFs) ContainerCreateFromImage(container container, imageFingerprint string, tracker *ioprogress.ProgressTracker) error { + logger.Debugf("Creating CEPHFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, s.pool.Name) + + _, err := s.StoragePoolMount() + if err != nil { + return err + } + + source := s.pool.Config["source"] + if source == "" { + return fmt.Errorf("no \"source\" property found for the storage pool") + } + + privileged := container.IsPrivileged() + containerName := container.Name() + containerMntPoint := getContainerMountPoint(container.Project(), s.pool.Name, containerName) + err = createContainerMountpoint(containerMntPoint, container.Path(), privileged) + if err != nil { + return errors.Wrap(err, "Create container mount point") + } + revert := true + defer func() { + if !revert { + return + } + s.ContainerDelete(container) + }() + + imagePath := shared.VarPath("images", imageFingerprint) + err = unpackImage(imagePath, containerMntPoint, storageTypeCephFs, s.s.OS.RunningInUserNS, nil) + if err != nil { + return errors.Wrap(err, "Unpack image") + } + + err = container.TemplateApply("create") + if err != nil { + return errors.Wrap(err, "Apply template") + } + + revert = false + + logger.Debugf("Created CEPHFS storage volume for container \"%s\" on storage pool \"%s\"", s.volume.Name, 
s.pool.Name) + return nil +} + +func (s *storageCephFs) ContainerCanRestore(container container, sourceContainer container) error { + return fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerDelete(container container) error { + return fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerCopy(target container, source container, containerOnly bool) error { + return fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerRefresh(target container, source container, snapshots []container) error { + return fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerMount(c container) (bool, error) { + return false, fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerUmount(c container, path string) (bool, error) { + return false, fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerRename(container container, newName string) error { + return fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerRestore(container container, sourceContainer container) error { + return fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerGetUsage(c container) (int64, error) { + return -1, fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerSnapshotCreate(snapshotContainer container, sourceContainer container) error { + return fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerSnapshotCreateEmpty(snapshotContainer container) error { + return fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerSnapshotDelete(snapshotContainer container) error { + return fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerSnapshotRename(snapshotContainer container, newName string) error { + return fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerSnapshotStart(container container) (bool, error) { + return false, fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerSnapshotStop(container container) (bool, error) { + return false, fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerBackupCreate(backup backup, source container) error { + return fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ContainerBackupLoad(info backupInfo, data io.ReadSeeker, tarArgs []string) error { + return fmt.Errorf("CEPHfs cannot be used for containers") +} + +func (s *storageCephFs) ImageCreate(fingerprint string, tracker *ioprogress.ProgressTracker) error { + return fmt.Errorf("CEPHfs cannot be used for images") +} + +func (s *storageCephFs) ImageDelete(fingerprint string) error { + return fmt.Errorf("CEPHfs cannot be used for images") +} + +func (s *storageCephFs) ImageMount(fingerprint string) (bool, error) { + return false, fmt.Errorf("CEPHfs cannot be used for images") +} + +func (s *storageCephFs) ImageUmount(fingerprint string) (bool, error) { + return false, fmt.Errorf("CEPHfs cannot be used for images") +} + +func (s *storageCephFs) MigrationType() migration.MigrationFSType { + return migration.MigrationFSType_RSYNC +} + +func (s *storageCephFs) PreservesInodes() bool { + return false +} + +func (s *storageCephFs) MigrationSource(args MigrationSourceArgs) 
(MigrationStorageSourceDriver, error) { + return rsyncMigrationSource(args) +} + +func (s *storageCephFs) MigrationSink(conn *websocket.Conn, op *operation, args MigrationSinkArgs) error { + return rsyncMigrationSink(conn, op, args) +} + +func (s *storageCephFs) StorageEntitySetQuota(volumeType int, size int64, data interface{}) error { + // FIXME this may be possible to do with + // setfattr -n ceph.quota.max_bytes -v 100000000 /some/dir # 100 MB + return fmt.Errorf("TODO") +} + +func (s *storageCephFs) StoragePoolResources() (*api.ResourcesStoragePool, error) { + _, err := s.StoragePoolMount() + if err != nil { + return nil, err + } + + poolMntPoint := getStoragePoolMountPoint(s.pool.Name) + + return storageResource(poolMntPoint) +} + +func (s *storageCephFs) StoragePoolVolumeCopy(source *api.StorageVolumeSource) error { + logger.Infof("Copying CEPHFS storage volume \"%s\" on storage pool \"%s\" as \"%s\" to storage pool \"%s\"", source.Name, source.Pool, s.volume.Name, s.pool.Name) + successMsg := fmt.Sprintf("Copied CEPHFS storage volume \"%s\" on storage pool \"%s\" as \"%s\" to storage pool \"%s\"", source.Name, source.Pool, s.volume.Name, s.pool.Name) + + if s.pool.Name != source.Pool { + // setup storage for the source volume + srcStorage, err := storagePoolVolumeInit(s.s, "default", source.Pool, source.Name, storagePoolVolumeTypeCustom) + if err != nil { + logger.Errorf("Failed to initialize CEPHFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) + return err + } + + ourMount, err := srcStorage.StoragePoolMount() + if err != nil { + logger.Errorf("Failed to mount CEPHFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) + return err + } + if ourMount { + defer srcStorage.StoragePoolUmount() + } + } + + err := s.copyVolume(source.Pool, source.Name, s.volume.Name) + if err != nil { + return err + } + + if source.VolumeOnly { + logger.Infof(successMsg) + return nil + } + + snapshots, err := storagePoolVolumeSnapshotsGet(s.s, source.Pool, source.Name, storagePoolVolumeTypeCustom) + if err != nil { + return err + } + + for _, snap := range snapshots { + _, snapOnlyName, _ := containerGetParentAndSnapshotName(snap) + err = s.copyVolumeSnapshot(source.Pool, snap, fmt.Sprintf("%s/%s", s.volume.Name, snapOnlyName)) + if err != nil { + return err + } + } + + logger.Infof(successMsg) + return nil +} + +func (s *storageCephFs) StorageMigrationSource(args MigrationSourceArgs) (MigrationStorageSourceDriver, error) { + return rsyncStorageMigrationSource(args) +} + +func (s *storageCephFs) StorageMigrationSink(conn *websocket.Conn, op *operation, args MigrationSinkArgs) error { + return rsyncStorageMigrationSink(conn, op, args) +} + +func (s *storageCephFs) GetStoragePool() *api.StoragePool { + return s.pool +} + +func (s *storageCephFs) GetStoragePoolVolume() *api.StorageVolume { + return s.volume +} + +func (s *storageCephFs) GetState() *state.State { + return s.s +} + +func (s *storageCephFs) StoragePoolVolumeSnapshotCreate(target *api.StorageVolumeSnapshotsPost) error { + return fmt.Errorf("TODO") +} + +func (s *storageCephFs) StoragePoolVolumeSnapshotDelete() error { + return fmt.Errorf("TODO") +} + +func (s *storageCephFs) StoragePoolVolumeSnapshotRename(newName string) error { + logger.Infof("Renaming CEPHFS storage volume on storage pool \"%s\" from \"%s\" to \"%s\"", s.pool.Name, s.volume.Name, newName) + var fullSnapshotName string + + if shared.IsSnapshot(newName) { + // When renaming volume snapshots, newName will 
contain the full snapshot name + fullSnapshotName = newName + } else { + sourceName, _, ok := containerGetParentAndSnapshotName(s.volume.Name) + if !ok { + return fmt.Errorf("Not a snapshot name") + } + + fullSnapshotName = fmt.Sprintf("%s%s%s", sourceName, shared.SnapshotDelimiter, newName) + } + + oldPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, s.volume.Name) + newPath := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, fullSnapshotName) + + if !shared.PathExists(newPath) { + err := os.MkdirAll(newPath, customDirMode) + if err != nil { + return err + } + } + + err := os.Rename(oldPath, newPath) + if err != nil { + return err + } + + logger.Infof("Renamed CEPHFS storage volume on storage pool \"%s\" from \"%s\" to \"%s\"", s.pool.Name, s.volume.Name, newName) + return s.s.Cluster.StoragePoolVolumeRename("default", s.volume.Name, fullSnapshotName, storagePoolVolumeTypeCustom, s.poolID) +} + +func (s *storageCephFs) copyVolume(sourcePool string, source string, target string) error { + var srcMountPoint string + + if shared.IsSnapshot(source) { + srcMountPoint = getStoragePoolVolumeSnapshotMountPoint(sourcePool, source) + } else { + srcMountPoint = getStoragePoolVolumeMountPoint(sourcePool, source) + } + + dstMountPoint := getStoragePoolVolumeMountPoint(s.pool.Name, target) + + err := os.MkdirAll(dstMountPoint, 0711) + if err != nil { + return err + } + + bwlimit := s.pool.Config["rsync.bwlimit"] + + _, err = rsyncLocalCopy(srcMountPoint, dstMountPoint, bwlimit) + if err != nil { + os.RemoveAll(dstMountPoint) + logger.Errorf("Failed to rsync into CEPHFS storage volume \"%s\" on storage pool \"%s\": %s", s.volume.Name, s.pool.Name, err) + return err + } + + return nil +} + +func (s *storageCephFs) copyVolumeSnapshot(sourcePool string, source string, target string) error { + srcMountPoint := getStoragePoolVolumeSnapshotMountPoint(sourcePool, source) + dstMountPoint := getStoragePoolVolumeSnapshotMountPoint(s.pool.Name, target) + + err := os.MkdirAll(dstMountPoint, 0711) + if err != nil { + return err + } + + bwlimit := s.pool.Config["rsync.bwlimit"] + + _, err = rsyncLocalCopy(srcMountPoint, dstMountPoint, bwlimit) + if err != nil { + os.RemoveAll(dstMountPoint) + logger.Errorf("Failed to rsync into CEPHFS storage volume \"%s\" on storage pool \"%s\": %s", target, s.pool.Name, err) + return err + } + + return nil +} + +func cephFsExists(clusterName string, userName string, fsName string) bool { + _, err := shared.RunCommand("ceph", "--name", fmt.Sprintf("client.%s", userName), "--cluster", clusterName, "fs", "get", fsName) + if err != nil { + return false + } + + return true +} + +func cephFsConfig(clusterName string, userName string) (string, string, error) { + // Parse the CEPH configuration + cephConf, err := os.Open(fmt.Sprintf("/etc/ceph/%s.conf", clusterName)) + if err != nil { + return "", "", err + } + + var cephMon string + + scan := bufio.NewScanner(cephConf) + for scan.Scan() { + line := scan.Text() + line = strings.TrimSpace(line) + + if line == "" { + continue + } + + if strings.HasPrefix(line, "mon_host") { + fields := strings.SplitN(line, "=", 2) + if len(fields) < 2 { + continue + } + + cephMon = strings.TrimSpace(fields[1]) + break + } + } + + if cephMon == "" { + return "", "", fmt.Errorf("Couldn't find a CEPH mon") + } + + // Parse the CEPH keyring + cephKeyring, err := os.Open(fmt.Sprintf("/etc/ceph/%v.client.%v.keyring", clusterName, userName)) + if err != nil { + return "", "", err + }
for scan.Scan() { + line := scan.Text() + line = strings.TrimSpace(line) + + if line == "" { + continue + } + + if strings.HasPrefix(line, "key") { + fields := strings.SplitN(line, "=", 2) + if len(fields) < 2 { + continue + } + + cephSecret = strings.TrimSpace(fields[1]) + break + } + } + + if cephSecret == "" { + return "", "", fmt.Errorf("Couldn't find a keyring entry") + } + + + return cephMon, cephSecret, nil +} diff --git a/lxd/storage_pools_config.go b/lxd/storage_pools_config.go index 228b2abb45..4e654ec01c 100644 --- a/lxd/storage_pools_config.go +++ b/lxd/storage_pools_config.go @@ -24,6 +24,9 @@ var changeableStoragePoolProperties = map[string][]string{ "volume.block.mount_options", "volume.size"}, + "cephfs": { + "rsync.bwlimit"}, + "dir": { "rsync.bwlimit"}, @@ -64,6 +67,11 @@ var storagePoolConfigKeys = map[string]func(value string) error{ "ceph.rbd.clone_copy": shared.IsBool, "ceph.user.name": shared.IsAny, + // valid drivers: cephfs + "cephfs.cluster_name": shared.IsAny, + "cephfs.path": shared.IsAny, + "cephfs.user.name": shared.IsAny, + // valid drivers: lvm "lvm.thinpool_name": shared.IsAny, "lvm.use_thinpool": shared.IsBool, @@ -236,7 +244,7 @@ func storagePoolFillDefault(name string, driver string, config map[string]string } } - if driver == "btrfs" || driver == "ceph" || driver == "lvm" || driver == "zfs" { + if driver == "btrfs" || driver == "ceph" || driver == "cephfs" || driver == "lvm" || driver == "zfs" { if config["volume.size"] != "" { _, err := shared.ParseByteSizeString(config["volume.size"]) if err != nil { diff --git a/lxd/storage_pools_utils.go b/lxd/storage_pools_utils.go index e2dd28144b..5b52dc5aa9 100644 --- a/lxd/storage_pools_utils.go +++ b/lxd/storage_pools_utils.go @@ -11,7 +11,7 @@ import ( "github.com/lxc/lxd/shared/version" ) -var supportedPoolTypes = []string{"btrfs", "ceph", "dir", "lvm", "zfs"} +var supportedPoolTypes = []string{"btrfs", "ceph", "cephfs", "dir", "lvm", "zfs"} func storagePoolUpdate(state *state.State, name, newDescription string, newConfig map[string]string, withDB bool) error { s, err := storagePoolInit(state, name) diff --git a/lxd/storage_volumes_config.go b/lxd/storage_volumes_config.go index f45309063c..d834a2efe4 100644 --- a/lxd/storage_volumes_config.go +++ b/lxd/storage_volumes_config.go @@ -59,6 +59,11 @@ var changeableStoragePoolVolumeProperties = map[string][]string{ "security.unmapped", "size"}, + "cephfs": { + "security.unmapped", + "size", + }, + "dir": { "security.unmapped", }, @@ -75,7 +80,7 @@ var changeableStoragePoolVolumeProperties = map[string][]string{ "zfs.use_refquota"}, } -// btrfs, ceph, dir, lvm, zfs +// btrfs, ceph, cephfs, dir, lvm, zfs var storageVolumeConfigKeys = map[string]func(value string) ([]string, error){ "block.filesystem": func(value string) ([]string, error) { err := shared.IsOneOf(value, []string{"btrfs", "ext4", "xfs"}) @@ -93,7 +98,7 @@ var storageVolumeConfigKeys = map[string]func(value string) ([]string, error){ }, "size": func(value string) ([]string, error) { if value == "" { - return []string{"btrfs", "ceph", "lvm", "zfs"}, nil + return []string{"btrfs", "ceph", "cephfs", "lvm", "zfs"}, nil } _, err := shared.ParseByteSizeString(value) @@ -101,7 +106,7 @@ var storageVolumeConfigKeys = map[string]func(value string) ([]string, error){ return nil, err } - return []string{"btrfs", "ceph", "lvm", "zfs"}, nil + return []string{"btrfs", "ceph", "cephfs", "lvm", "zfs"}, nil }, "volatile.idmap.last": func(value string) ([]string, error) { return supportedPoolTypes, shared.IsAny(value) 
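For orientation, here is a rough sketch of how the new `cephfs` pool keys added by this patch would be exercised from the client side once the driver lands. This is a hedged example, not taken from the patch: the pool name "persist", filesystem name "lxd_fs", cluster name "my-cluster" and user "admin" are all placeholders.

    # Hypothetical usage sketch of the new cephfs storage driver.
    # "persist", "lxd_fs", "my-cluster" and "admin" are made-up names.
    lxc storage create persist cephfs source=lxd_fs \
        cephfs.cluster_name=my-cluster cephfs.user.name=admin

    # rsync.bwlimit (pool) and size (volume) are the keys the patch
    # marks as changeable for the cephfs driver.
    lxc storage set persist rsync.bwlimit 50MB
    lxc storage volume create persist vol1 size=10GB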
From lxc-bot at linuxcontainers.org Mon May 13 14:29:18 2019 From: lxc-bot at linuxcontainers.org (lxc-jp on Github) Date: Mon, 13 May 2019 07:29:18 -0700 (PDT) Subject: [lxc-devel] [linuxcontainers.org/master] Add Japanese release announcement of LXD 3.13 Message-ID: <5cd97f3e.1c69fb81.ab18.2113SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 316 bytes Desc: not available URL: -------------- next part -------------- From 0b4d8cd6edc3a19e72e7e9c1d91185f1acb5dc20 Mon Sep 17 00:00:00 2001 From: KATOH Yasufumi Date: Sun, 12 May 2019 23:43:33 +0900 Subject: [PATCH 1/2] Add Japanese release announcement of LXD 3.13 Only translate "Introduction" and "New features" sections. Signed-off-by: KATOH Yasufumi --- content/lxd/news.ja/lxd-3.13.yaml | 435 ++++++++++++++++++++++++++++++ 1 file changed, 435 insertions(+) create mode 100644 content/lxd/news.ja/lxd-3.13.yaml diff --git a/content/lxd/news.ja/lxd-3.13.yaml b/content/lxd/news.ja/lxd-3.13.yaml new file mode 100644 index 0000000..dc87f92 --- /dev/null +++ b/content/lxd/news.ja/lxd-3.13.yaml @@ -0,0 +1,435 @@
+title: LXD 3.13 release announcement
+date: 2019/05/09 03:05
+origin: https://discuss.linuxcontainers.org/t/lxd-3-13-has-been-released/4738
+content: |-
+  ### Introduction
+
+  The LXD team is very excited to announce the release of LXD 3.13!
+
+  As with past releases, this is another very exciting LXD release, featuring useful new functionality along with a large number of bugfixes and improvements!
+
+  @tomp, a recent addition to the LXD team, has been hard at work getting new features and bugfixes into this release, improving the LXD networking experience.
+
+  This release also completes all the plumbing needed for system call interception, now handling `mknod` on supported systems.
+
+  Cluster users will enjoy this release too, with scaling improvements, reduced load on the leader, and improved container copy and migration, especially on CEPH clusters.
+
+  Enterprise users will want to add Role Based Access Control through the external Canonical RBAC service. It lets you control per-project permissions on the LXD server and assign roles to users and groups.
+
+  And thanks to the recent addition of filesystem project quotas to the kernel, quotas are finally available on the `dir` storage backend.
+
+  Enjoy!
+
+  ### New features
+  #### Clustering: improved heartbeat interval
+
+  In a LXD cluster, the current leader periodically sends a heartbeat to every cluster member. Its main purpose is to detect offline cluster members, record them as offline in the database and avoid having queries block on offline members. A second purpose of the heartbeat is to refresh the list of database nodes.
+
+  Previously this happened every 4 seconds, hitting all cluster members at the same time, which caused CPU and network traffic spikes, especially on the host currently acting as cluster leader.
+
+  LXD 3.13 increases that interval to 10 seconds and adds randomization to the heartbeat timing so that not all cluster members are hit at once. Logic was also added to detect cluster members that were added while a heartbeat run was in progress.
+
+  #### Internal handling of container copies in clusters
+
+  LXD 3.13 now implements proper one-step container copies on a cluster, just like a copy would normally happen on a standalone LXD instance. Previously the client had to know whether to perform a copy (when staying on the same cluster member) or a migration (when moving to another cluster member); all of this is now handled internally.
+
+  An unexpected effect of this fix is that all CEPH copies on a cluster are now almost instantaneous, as no migration has to happen at all.
+
+  #### Initial support for system call interception
+
+  Combining LXD 3.13 with a 5.0 or newer kernel and very recent libseccomp and liblxc makes it possible to intercept and mediate system calls in userspace.
+
+  The initial focus for this feature is `mknod`, implementing a basic allow-list of devices that unprivileged containers may create.
+
+  Since this needs upstream releases of both libseccomp and liblxc, and further improvements to the feature are still pending on the kernel side, it will take a little while before this feature sees general use.
+
+  In the future we intend to use this to implement things like mounting certain filesystems or loading kernel modules from unprivileged containers (all of which will require administrator opt-in).
+
+  #### Role Based Access Control (RBAC)
+
+  Users of the Canonical RBAC service can now integrate it with LXD.
+
+  LXD registers all of its projects with RBAC, and administrators can then assign roles to users and groups, either on a specific project or on the entire LXD instance.
+
+  At this point, the permissions include:
+
+  - Full administrative access to LXD
+  - Managing containers (create, delete, reconfigure, ...)
+  - Interacting with containers (start/stop/restart, exec, console, ...)
+  - Managing images (create, delete, alias, ...)
+  - Managing profiles (create, delete, reconfigure, ...)
+  - Managing the project itself (reconfigure)
+  - Read-only access (view everything about the project)
+
+  This brings us one step closer to running shared LXD clusters, where unprivileged users can run containers on the cluster without any way to escalate their privileges.
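A minimal sketch of what the server-side setup might look like follows. The exact `rbac.*` configuration key names and the URLs here are assumptions for illustration; they are not spelled out in this announcement.

    # Hypothetical sketch: point LXD at an external Candid/RBAC service.
    # Key names and URLs are assumptions, not confirmed by this post.
    lxc config set rbac.api.url https://rbac.example.com
    lxc config set rbac.agent.url https://candid.example.com
    # Role assignment (auditor, user, operator, admin) then happens per
    # user/group and per project in the RBAC service itself, not in LXD.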
+
+  #### IPVLAN support
+
+  LXD can now use the `ipvlan` support that was recently implemented in LXC.
+  When running a recent enough version of LXC that supports the feature, IPVLAN can be configured on LXD `nic` devices by:
+
+  - Setting the `nictype` property to `ipvlan`
+  - Setting the `parent` property to the desired egress device
+  - For IPv4, setting `ipv4.address` to the desired addresses
+  - For IPv6, setting `ipv6.address` to the desired addresses
+
+  Here is an example of it in action:
+
+      stgraber at castiana:~$ lxc init ubuntu:18.04 ipvlan
+      Creating ipvlan
+      stgraber at castiana:~$ lxc config device add ipvlan eth0 nic nictype=ipvlan parent=wlan0 ipv4.address=172.17.0.100 ipv6.address=2001:470:b0f8:1000:1::100
+      Device eth0 added to ipvlan
+      stgraber at castiana:~$ lxc start ipvlan
+      stgraber at castiana:~$ lxc exec ipvlan bash
+      root at ipvlan:~# ifconfig
+      eth0: flags=4291 mtu 1500
+          inet 172.17.0.100 netmask 255.255.255.255 broadcast 255.255.255.255
+          inet6 2001:470:b0f8:1000:1::100 prefixlen 128 scopeid 0x0
+          inet6 fe80::28:f800:12b:bdf8 prefixlen 64 scopeid 0x20
+          ether 00:28:f8:2b:bd:f8 txqueuelen 1000 (Ethernet)
+          RX packets 0 bytes 0 (0.0 B)
+          RX errors 0 dropped 0 overruns 0 frame 0
+          TX packets 0 bytes 0 (0.0 B)
+          TX errors 0 dropped 5 overruns 0 carrier 0 collisions 0
+
+      lo: flags=73 mtu 65536
+          inet 127.0.0.1 netmask 255.0.0.0
+          inet6 ::1 prefixlen 128 scopeid 0x10
+          loop txqueuelen 1000 (Local Loopback)
+          RX packets 0 bytes 0 (0.0 B)
+          RX errors 0 dropped 0 overruns 0 frame 0
+          TX packets 0 bytes 0 (0.0 B)
+          TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
+
+      root at ipvlan:~# ip -4 route show
+      default dev eth0
+
+      root at ipvlan:~# ip -6 route show
+      2001:470:b0f8:1000:1::100 dev eth0 proto kernel metric 256 pref medium
+      fe80::/64 dev eth0 proto kernel metric 256 pref medium
+      default dev eth0 metric 1024 pref medium
+
+      root at ipvlan:~# ping 8.8.8.8
+      PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
+      64 bytes from 8.8.8.8: icmp_seq=1 ttl=57 time=14.4 ms
+      --- 8.8.8.8 ping statistics ---
+      1 packets transmitted, 1 received, 0% packet loss, time 0ms
+      rtt min/avg/max/mdev = 14.476/14.476/14.476/0.000 ms
+
+      root at ipvlan:~# ping6 -n 2607:f8b0:400b:800::2004
+      PING 2607:f8b0:400b:800::2004(2607:f8b0:400b:800::2004) 56 data bytes
+      64 bytes from 2607:f8b0:400b:800::2004: icmp_seq=1 ttl=57 time=21.2 ms
+      --- 2607:f8b0:400b:800::2004 ping statistics ---
+      1 packets transmitted, 1 received, 0% packet loss, time 0ms
+      rtt min/avg/max/mdev = 21.245/21.245/21.245/0.000 ms
+      root at ipvlan:~#
+
+  #### Quota support on the `dir` storage backend
+
+  The `project quota` feature was recently added to the Linux kernel.
+
+  When the filesystem backing a `dir` type storage pool is configured appropriately, container quotas can now be set the same way as on the other storage backends, and disk usage is reported properly too.
+
+      stgraber at castiana:~$ sudo truncate -s 10G /tmp/ext4.img
+      stgraber at castiana:~$ sudo mkfs.ext4 /tmp/ext4.img
+      mke2fs 1.44.6 (5-Mar-2019)
+      Discarding device blocks: done
+      Creating filesystem with 2621440 4k blocks and 655360 inodes
+      Filesystem UUID: d8ab56d9-1e84-40ee-921a-c68c06ad6625
+      Superblock backups stored on blocks:
+          32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
+
+      Allocating group tables: done
+      Writing inode tables: done
+      Creating journal (16384 blocks): done
+      Writing superblocks and filesystem accounting information: done
+      stgraber at castiana:~$ sudo tune2fs -O project -Q prjquota /tmp/ext4.img
+      tune2fs 1.44.6 (5-Mar-2019)
+
+      stgraber at castiana:~$ sudo mount -o prjquota /tmp/ext4.img /mnt/
+      stgraber at castiana:~$ sudo rmdir /mnt/lost+found/
+      stgraber at castiana:~$ lxc storage create mnt dir source=/mnt
+      Storage pool mnt created
+
+      stgraber at castiana:~$ lxc launch ubuntu:18.04 c1 -s mnt
+      Creating c1
+      Starting c1
+      stgraber at castiana:~$ lxc exec c1 -- df -h
+      Filesystem                                           Size  Used Avail Use% Mounted on
+      /var/lib/lxd/storage-pools/mnt/containers/c1/rootfs  9.8G  742M  8.6G   8% /
+      none                                                 492K     0  492K   0% /dev
+      udev                                                 7.7G     0  7.7G   0% /dev/tty
+      tmpfs                                                100K     0  100K   0% /dev/lxd
+      tmpfs                                                100K     0  100K   0% /dev/.lxd-mounts
+      tmpfs                                                7.8G     0  7.8G   0% /dev/shm
+      tmpfs                                                7.8G  152K  7.8G   1% /run
+      tmpfs                                                5.0M     0  5.0M   0% /run/lock
+      tmpfs                                                7.8G     0  7.8G   0% /sys/fs/cgroup
+
+      stgraber at castiana:~$ lxc config device set c1 root size 1GB
+      stgraber at castiana:~$ lxc exec c1 -- df -h
+      Filesystem                                           Size  Used Avail Use% Mounted on
+      /var/lib/lxd/storage-pools/mnt/containers/c1/rootfs  954M  706M  249M  74% /
+      none                                                 492K     0  492K   0% /dev
+      udev                                                 7.7G     0  7.7G   0% /dev/tty
+      tmpfs                                                100K     0  100K   0% /dev/lxd
+      tmpfs                                                100K     0  100K   0% /dev/.lxd-mounts
+      tmpfs                                                7.8G     0  7.8G   0% /dev/shm
+      tmpfs                                                7.8G  152K  7.8G   1% /run
+      tmpfs                                                5.0M     0  5.0M   0% /run/lock
+      tmpfs                                                7.8G     0  7.8G   0% /sys/fs/cgroup
+
+      stgraber at castiana:~$ lxc info c1
+      Name: c1
+      Location: none
+      Remote: unix://
+      Architecture: x86_64
+      Created: 2019/05/09 16:09 UTC
+      Status: Running
+      Type: persistent
+      Profiles: default
+      Pid: 10096
+      Ips:
+        eth0: inet    10.166.11.38    vethKM0DFY
+        eth0: inet6   2001:470:b368:4242:216:3eff:fe4b:2c3    vethKM0DFY
+        eth0: inet6   fe80::216:3eff:fe4b:2c3 vethKM0DFY
+        lo:   inet    127.0.0.1
+        lo:   inet6   ::1
+      Resources:
+        Processes: 24
+        Disk usage:
+          root: 739.77MB
+        CPU usage:
+          CPU usage (in seconds): 7
+        Memory usage:
+          Memory (current): 104.91MB
+          Memory (peak): 229.67MB
+        Network usage:
+          lo:
+            Bytes received: 1.23kB
+            Bytes sent: 1.23kB
+            Packets received: 12
+            Packets sent: 12
+          eth0:
+            Bytes received: 480.35kB
+            Bytes sent: 27.21kB
+            Packets received: 332
+            Packets sent: 277
+
+  #### Routing on container NIC devices
+
+  Setting the newly added `ipv4.routes` and `ipv6.routes` options on a `nic` device makes it possible to tie specific routes to a specific container, having them follow the container as it moves between hosts.
+
+  This is usually a better option than using the similarly named keys on the network itself.
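A minimal sketch of what this looks like on the CLI, with made-up container, parent and subnet values:

    # Hypothetical sketch: attach a nic with container-scoped routes.
    # "c1", "lxdbr0" and the subnets are placeholders.
    lxc config device add c1 eth0 nic nictype=bridged parent=lxdbr0 \
        ipv4.routes=192.0.2.0/28 ipv6.routes=2001:db8:1::/64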
+
+  #### Configurable NAT source address
+
+  Setting the newly added `ipv4.nat.address` and `ipv6.nat.address` properties on a LXD network lets you override the outgoing IP address used for a given bridge.
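A minimal sketch with placeholder addresses; `lxdbr0` stands in for whatever bridge is being reconfigured:

    # Pin the source address used when NATing traffic leaving the bridge.
    # The bridge name and addresses here are placeholders.
    lxc network set lxdbr0 ipv4.nat.address 198.51.100.10
    lxc network set lxdbr0 ipv6.nat.address 2001:db8::10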
+
+  #### LXC features exported through the API
+
+  Similar to what was done for kernel features in a previous release, the specific LXC features that LXD can use are now exported through the LXD API, letting clients check for the advanced features they expect from the target.
+
+      lxc_features:
+        mount_injection_file: "true"
+        network_gateway_device_route: "true"
+        network_ipvlan: "true"
+        network_l2proxy: "true"
+        seccomp_notify: "true"
+
+  ### Bugs fixed
+  - client: Consider volumeOnly option when migrating
+  - client: Copy volume config and description
+  - client: Don't crash on missing stdin
+  - client: Fix copy from snapshot
+  - client: Fix copying between two unix sockets
+  - doc: Adds missing packages to install guide
+  - doc: Correct host_name property
+  - doc: Update storage documentation
+  - i18n: Update translations from weblate
+  - lxc/copy: Don't strip volatile keys on refresh
+  - lxc/utils: Updates progress to stop outputting if msg is longer than window
+  - lxd/api: Rename alias* commands to imageAlias*
+  - lxd/api: Rename apiProject* to project*
+  - lxd/api: Rename certificateFingerprint* to certficate*
+  - lxd/api: Rename operation functions for consistency
+  - lxd/api: Rename serverResources to api10Resources
+  - lxd/api: Rename snapshotHandler to containerSnapshotHandler
+  - lxd/api: Replace Command with APIEndpoint
+  - lxd/api: Sort API commands list
+  - lxd/candid: Cleanup config handling
+  - lxd/certificates: Make certificate add more robust
+  - lxd/certificates: Port to APIEndpoint
+  - lxd/cluster: Avoid panic in Gateway
+  - lxd/cluster: Fix race condition during join
+  - lxd/cluster: Port to APIEndpoint
+  - lxd/cluster: Use current time for hearbeat
+  - lxd/cluster: Workaround new raft logging
+  - lxd/containers: Avoid costly storage calls during snapshot
+  - lxd/containers: Change disable_ipv6=1 to accept_ra=0 on host side interface
+  - lxd/containers: Don't fail on old libseccomp
+  - lxd/containers: Don't needlessly mount snapshots
+  - lxd/containers: Early check for running container refresh
+  - lxd/containers: Fix bad operation type
+  - lxd/containers: Fix profile snapshot settings
+  - lxd/containers: Moves network limits to network up hook
+  - lxd/containers: Only run network up hook for nics that need it
+  - lxd/containers: Optimize snapshot retrieval
+  - lxd/containers: Port to APIEndpoint
+  - lxd/containers: Remove unused arg from network limits function
+  - lxd/containers: Speed up simple snapshot list
+  - lxd/daemon: Port to APIEndpoint
+  - lxd: Don't allow remote access to internal API
+  - lxd: Fix volume migration with snapshots
+  - lxd: Have Authenticate return the protocol
+  - lxd: More reliably grab interface host name
+  - lxd: Port from HasApiExtension to LXCFeatures
+  - lxd: Rename parseAddr to proxyParseAddr
+  - lxd: Use idmap.Equals
+  - lxd/db: Fix substr handling for containers
+  - lxd/db: Parent filter for ContainerList
+  - lxd/db/profiles: Fix cross-project updates
+  - lxd/db: Properly handle unsetting keys
+  - lxd/event: Port to APIEndpoint
+  - lxd/images: Fix project handling on copy
+  - lxd/images: Fix simplestreams cache expiry
+  - lxd/images: Port to APIEndpoint
+  - lxd/images: Properly handle invalid protocols
+  - lxd/images: Replicate images to the right project
+  - lxd/internal: Port to APIEndpoint
+  - lxd/migration: Fix feature negotiation
+  - lxd/network: Filter leases by project
+  - lxd/network: Fix DNS records for projects
+  - lxd/network: Port to APIEndpoint
+  - lxd/operation: Port to APIEndpoint
+  - lxd/patches: Fix LVM VG name
+  - lxd/profiles: Optimize container updates
+  - lxd/profiles: Port to APIEndpoint
+  - lxd/projects: Port to APIEndpoint
+  - lxd/proxy: Correctly handle unix: path rewriting with empty bind=
+  - lxd/proxy: Don't wrap string literal
+  - lxd/proxy: Fix goroutine leak
+  - lxd/proxy: Handle mnts for abstract unix sockets
+  - lxd/proxy: Make helpers static
+  - lxd/proxy: Make logfile close on exec
+  - lxd/proxy: Only attach to mntns for unix sockets
+  - lxd/proxy: Retry epoll on EINTR
+  - lxd/proxy: Use standard macros on exit
+  - lxd/proxy: Validate the addresses
+  - lxd/resource: Port to APIEndpoint
+  - lxd/storage: Don't hardcode default project
+  - lxd/storage: Fix error message on differing maps
+  - lxd/storage: Handle XFS with leftover journal entries
+  - lxd/storage: Port to APIEndpoint
+  - lxd/storage/btrfs: Don't make ro snapshots when unpriv
+  - lxd/storage/ceph: Don't mix stderr with json
+  - lxd/storage/ceph: Fix snapshot of running containers
+  - lxd/storage/ceph: Fix snapshot of running xfs/btrfs
+  - lxd/storage/ceph: Fix UUID re-generation
+  - lxd/storage/ceph: Only rewrite UUID once
+  - lxd/sys: Cleanup State struct
+  - scripts/bash: Add bash completion for profile/container device get, set, unset
+  - shared: Add StringMapHasStringKey helper function
+  - shared: Fix $SNAP handling under new snappy
+  - shared: Fix Windows build
+  - shared/idmap: Add comparison function
+  - shared/netutils: Adapt to kernel changes
+  - shared/netutils: Add AbstractUnixReceiveFdData()
+  - shared/netutils: Export peer link id in getifaddrs
+  - shared/netutils: Handle SCM_CREDENTIALS when receiving fds
+  - shared/netutils: Move network cgo to shared/netutils
+  - shared/netutils: Move send/recv fd functions
+  - shared/network: Fix reporting of down interfaces
+  - shared/network: Get HostName field when possible
+  - shared/osarch: Add i586 to arch aliases
+  - tests: Extend migration tests
+  - tests: Handle built-in shiftfs
+  - tests: Updates config tests to use host_name for nic tests
+
+  ### Try it
+
+  This new LXD release is available on our [demo service](https://linuxcontainers.org/ja/lxd/try-it/).
+
+  ### Downloads
+
+  The tarballs for this release can be obtained from the [downloads page](/lxd/downloads/).

From 10d7899a1af95c7a87f48ce4fd62e2a626d616b1 Mon Sep 17 00:00:00 2001
From: KATOH Yasufumi
Date: Mon, 13 May 2019 23:27:27 +0900
Subject: [PATCH 2/2] Improve Japanese translation

Reviewed-by: Hiroaki Nakamura
Signed-off-by: KATOH Yasufumi
---
 content/lxd/news.ja/lxd-3.13.yaml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/content/lxd/news.ja/lxd-3.13.yaml b/content/lxd/news.ja/lxd-3.13.yaml
index dc87f92..2a53dcd 100644
--- a/content/lxd/news.ja/lxd-3.13.yaml
+++ b/content/lxd/news.ja/lxd-3.13.yaml
@@ -31,7 +31,7 @@ content: |-

-  Enterprise users will want to add Role Based Access Control through the external Canonical RBAC service. It lets you control per-project permissions on the LXD server and assign roles to users and groups.
+  Enterprise users will like the newly added Role Based Access Control using the external Canonical RBAC service. It lets you control per-project permissions on the LXD server and assign roles to users and groups.

-  An unexpected effect of this fix is that all CEPH copies on a cluster are now almost instantaneous, as no migration has to happen at all.
+  A side effect of this fix is that all CEPH copies on a cluster are now almost instantaneous, as no migration has to happen at all.

   #### Initial support for system call interception

-  The `project quota` feature was recently added to the Linux kernel.
+  LXD now supports the `project quota` feature added to recent Linux kernels.
+
+These days, the recommended way to install LXD is using the snap.
+
+For the latest stable release, run:
+
+    snap install lxd
+
+For the LXD 3.0 stable release, run:
+
+    snap install lxd --channel=3.0/stable
+
+For the LXD 2.0 stable release, run:
+
+    snap install lxd --channel=2.0/stable
+
+If you previously had the LXD deb package installed, you can migrate all of your existing data with:
+
+    lxd.migrate
+
+#### Ubuntu 14.04 LTS (LXD 2.0 deb)
@@ -67,7 +102,7 @@ To install the LTS branch of LXD, run:

     apt install -t trusty-backports lxd lxd-client

-### Ubuntu 16.04 LTS
+#### Ubuntu 16.04 LTS (LXD 3.0 deb)
@@ -105,6 +140,11 @@ After that, you can install LXD with:

     snap install lxd

+
+Alternatively, you can specify `--channel=3.0/stable` to install the LXD 3.0 LTS release, or `--channel=2.0/stable` to install the LXD 2.0 LTS release.
+
 ### Client for macOS
network: Fixes bug that stopped down hook from running for phys netdevs #3006

Open
wants to merge 1 commit into
base: master
from

Conversation

1 participant
@tomponline
Member

commented May 15, 2019

Signed-off-by: Thomas Parrott thomas.parrott at canonical.com

network: Fixes bug that stopped down hook from running for phys netdevs
Signed-off-by: Thomas Parrott <thomas.parrott at canonical.com>
From noreply at github.com Thu May 16 08:11:45 2019 From: noreply at github.com (Christian Brauner) Date: Thu, 16 May 2019 01:11:45 -0700 Subject: [lxc-devel] [lxc/lxc] b3259d: network: Fixes bug that stopped down hook from run... Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: b3259dc669b323d201036a56c29c16a459da67b2 https://github.com/lxc/lxc/commit/b3259dc669b323d201036a56c29c16a459da67b2 Author: Thomas Parrott Date: 2019-05-15 (Wed, 15 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- network: Fixes bug that stopped down hook from running for phys netdevs Signed-off-by: Thomas Parrott Commit: 6ae34d21696c25de0264ce60a1641011cd17f20d https://github.com/lxc/lxc/commit/6ae34d21696c25de0264ce60a1641011cd17f20d Author: Christian Brauner Date: 2019-05-16 (Thu, 16 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- Merge pull request #3006 from tomponline/tp-phys-downhook network: Fixes bug that stopped down hook from running for phys netdevs Compare: https://github.com/lxc/lxc/compare/e2f2d86a4199...6ae34d21696c From lxc-bot at linuxcontainers.org Thu May 16 11:10:38 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Thu, 16 May 2019 04:10:38 -0700 (PDT) Subject: [lxc-devel] [lxd/master] container_backup: Fixes crash when renaming non-existent backup Message-ID: <5cdd452e.1c69fb81.49f8e.7827SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 376 bytes Desc: not available URL: -------------- next part -------------- From 503235a98c85ae8080331d56f094b6f8c6e52a4a Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Thu, 16 May 2019 12:08:37 +0100 Subject: [PATCH] container_backup: Fixes crash when renaming non-existent backup Fixes #5762 Signed-off-by: Thomas Parrott --- lxd/container_backup.go | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lxd/container_backup.go b/lxd/container_backup.go index e50b4d2792..6425becae8 100644 --- a/lxd/container_backup.go +++ b/lxd/container_backup.go @@ -226,7 +226,7 @@ func containerBackupPost(d *Daemon, r *http.Request) Response { oldName := name + shared.SnapshotDelimiter + backupName backup, err := backupLoadByName(d.State(), project, oldName) if err != nil { - SmartError(err) + return SmartError(err) } newName := name + shared.SnapshotDelimiter + req.Name From lxc-bot at linuxcontainers.org Thu May 16 13:34:42 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Thu, 16 May 2019 06:34:42 -0700 (PDT) Subject: [lxc-devel] [lxc/master] attach: do not reload container Message-ID: <5cdd66f2.1c69fb81.a022d.11f2SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 475 bytes Desc: not available URL: -------------- next part -------------- From 78b44e419b009d84500361c3c9ac294cd32d2a5e Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Thu, 16 May 2019 15:29:41 +0200 Subject: [PATCH] attach: do not reload container Let lxc_attach() reuse the already initialized container. Closes https://github.com/lxc/lxd/issues/5755. 
Signed-off-by: Christian Brauner --- src/lxc/attach.c | 26 +++++++++++++++++--------- src/lxc/attach.h | 2 +- src/lxc/lxccontainer.c | 4 ++-- 3 files changed, 20 insertions(+), 12 deletions(-) diff --git a/src/lxc/attach.c b/src/lxc/attach.c index ce51352c67..c91ff751f4 100644 --- a/src/lxc/attach.c +++ b/src/lxc/attach.c @@ -1008,9 +1008,9 @@ static inline void lxc_attach_terminal_close_log(struct lxc_terminal *terminal) close_prot_errno_disarm(terminal->log_fd); } -int lxc_attach(const char *name, const char *lxcpath, - lxc_attach_exec_t exec_function, void *exec_payload, - lxc_attach_options_t *options, pid_t *attached_process) +int lxc_attach(struct lxc_container *container, lxc_attach_exec_t exec_function, + void *exec_payload, lxc_attach_options_t *options, + pid_t *attached_process) { int i, ret, status; int ipc_sockets[2]; @@ -1020,6 +1020,7 @@ int lxc_attach(const char *name, const char *lxcpath, struct lxc_proc_context_info *init_ctx; struct lxc_terminal terminal; struct lxc_conf *conf; + char *name, *lxcpath; struct attach_clone_payload payload = {0}; ret = access("/proc/self/ns", X_OK); @@ -1028,21 +1029,34 @@ int lxc_attach(const char *name, const char *lxcpath, return -1; } + if (!container) + return minus_one_set_errno(EINVAL); + + if (!lxc_container_get(container)) + return minus_one_set_errno(EINVAL); + + name = container->name; + lxcpath = container->config_path; + if (!options) options = &attach_static_default_options; init_pid = lxc_cmd_get_init_pid(name, lxcpath); if (init_pid < 0) { ERROR("Failed to get init pid"); + lxc_container_put(container); return -1; } init_ctx = lxc_proc_get_context_info(init_pid); if (!init_ctx) { ERROR("Failed to get context of init process: %ld", (long)init_pid); + lxc_container_put(container); return -1; } + init_ctx->container = container; + personality = get_personality(name, lxcpath); if (init_ctx->personality < 0) { ERROR("Failed to get personality of the container"); @@ -1051,12 +1065,6 @@ int lxc_attach(const char *name, const char *lxcpath, } init_ctx->personality = personality; - init_ctx->container = lxc_container_new(name, lxcpath); - if (!init_ctx->container) { - lxc_proc_put_context_info(init_ctx); - return -1; - } - if (!init_ctx->container->lxc_conf) { init_ctx->container->lxc_conf = lxc_conf_init(); if (!init_ctx->container->lxc_conf) { diff --git a/src/lxc/attach.h b/src/lxc/attach.h index 4bf9578ee9..c576aa9fca 100644 --- a/src/lxc/attach.h +++ b/src/lxc/attach.h @@ -41,7 +41,7 @@ struct lxc_proc_context_info { int ns_fd[LXC_NS_MAX]; }; -extern int lxc_attach(const char *name, const char *lxcpath, +extern int lxc_attach(struct lxc_container *container, lxc_attach_exec_t exec_function, void *exec_payload, lxc_attach_options_t *options, pid_t *attached_process); diff --git a/src/lxc/lxccontainer.c b/src/lxc/lxccontainer.c index cea8aa5d7b..e3262a4611 100644 --- a/src/lxc/lxccontainer.c +++ b/src/lxc/lxccontainer.c @@ -4064,7 +4064,7 @@ static int lxcapi_attach(struct lxc_container *c, lxc_attach_exec_t exec_functio current_config = c->lxc_conf; - ret = lxc_attach(c->name, c->config_path, exec_function, exec_payload, options, attached_process); + ret = lxc_attach(c, exec_function, exec_payload, options, attached_process); current_config = NULL; return ret; } @@ -4081,7 +4081,7 @@ static int do_lxcapi_attach_run_wait(struct lxc_container *c, lxc_attach_options command.program = (char*)program; command.argv = (char**)argv; - r = lxc_attach(c->name, c->config_path, lxc_attach_run_command, &command, options, &pid); + r = 
lxc_attach(c, lxc_attach_run_command, &command, options, &pid); if (r < 0) { ERROR("ups"); return r; From noreply at github.com Thu May 16 17:33:44 2019 From: noreply at github.com (=?UTF-8?B?U3TDqXBoYW5lIEdyYWJlcg==?=) Date: Thu, 16 May 2019 10:33:44 -0700 Subject: [lxc-devel] [lxc/lxc] 908fbc: attach: do not reload container Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: 908fbc1a2e50a5bff8bb2714b4a0eff0cf0a2303 https://github.com/lxc/lxc/commit/908fbc1a2e50a5bff8bb2714b4a0eff0cf0a2303 Author: Christian Brauner Date: 2019-05-16 (Thu, 16 May 2019) Changed paths: M src/lxc/attach.c M src/lxc/attach.h M src/lxc/lxccontainer.c Log Message: ----------- attach: do not reload container Let lxc_attach() reuse the already initialized container. Closes https://github.com/lxc/lxd/issues/5755. Signed-off-by: Christian Brauner Commit: 07c5b72a11ac73c579d6c4a3de5098108db66a95 https://github.com/lxc/lxc/commit/07c5b72a11ac73c579d6c4a3de5098108db66a95 Author: Stéphane Graber Date: 2019-05-16 (Thu, 16 May 2019) Changed paths: M src/lxc/attach.c M src/lxc/attach.h M src/lxc/lxccontainer.c Log Message: ----------- Merge pull request #3009 from brauner/2019-05-16/rework_attach attach: do not reload container Compare: https://github.com/lxc/lxc/compare/6ae34d21696c...07c5b72a11ac From builds at travis-ci.org Thu May 16 17:35:30 2019 From: builds at travis-ci.org (Travis CI) Date: Thu, 16 May 2019 17:35:30 +0000 Subject: [lxc-devel] Errored: lxc/lxc#6836 (master - 07c5b72) In-Reply-To: Message-ID: <5cdd9f624bd40_43fe668a5b57049752@8f891199-bb37-4de0-b74d-0649b7ebb252.mail> Build Update for lxc/lxc ------------------------------------- Build: #6836 Status: Errored Duration: 1 min and 18 secs Commit: 07c5b72 (master) Author: Stéphane Graber Message: Merge pull request #3009 from brauner/2019-05-16/rework_attach attach: do not reload container View the changeset: https://github.com/lxc/lxc/compare/6ae34d21696c...07c5b72a11ac View the full build log and details: https://travis-ci.org/lxc/lxc/builds/533438279?utm_medium=notification&utm_source=email -- You can unsubscribe from build emails from the lxc/lxc repository going to https://travis-ci.org/account/preferences/unsubscribe?repository=1693277&utm_medium=notification&utm_source=email. Or unsubscribe from *all* email updating your settings at https://travis-ci.org/account/preferences/unsubscribe?utm_medium=notification&utm_source=email. Or configure specific recipients for build notifications in your .travis.yml file. See https://docs.travis-ci.com/user/notifications. -------------- next part -------------- An HTML attachment was scrubbed... URL: From lxc-bot at linuxcontainers.org Fri May 17 05:51:43 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Thu, 16 May 2019 22:51:43 -0700 (PDT) Subject: [lxc-devel] [lxc/master] lxccontainer: cleanup attach functions Message-ID: <5cde4bef.1c69fb81.e9813.249cSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 437 bytes Desc: not available URL: -------------- next part -------------- From d64301431737bcd54c2824dc95d49f4fb30d4844 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Fri, 17 May 2019 07:50:45 +0200 Subject: [PATCH] lxccontainer: cleanup attach functions Specifically, refloat function arguments and remove useless comments. 
Signed-off-by: Christian Brauner --- src/lxc/lxccontainer.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/src/lxc/lxccontainer.c b/src/lxc/lxccontainer.c index e3262a4611..9dbc6e022f 100644 --- a/src/lxc/lxccontainer.c +++ b/src/lxc/lxccontainer.c @@ -4055,7 +4055,9 @@ static bool do_lxcapi_rename(struct lxc_container *c, const char *newname) WRAP_API_1(bool, lxcapi_rename, const char *) -static int lxcapi_attach(struct lxc_container *c, lxc_attach_exec_t exec_function, void *exec_payload, lxc_attach_options_t *options, pid_t *attached_process) +static int lxcapi_attach(struct lxc_container *c, + lxc_attach_exec_t exec_function, void *exec_payload, + lxc_attach_options_t *options, pid_t *attached_process) { int ret; @@ -4064,33 +4066,37 @@ static int lxcapi_attach(struct lxc_container *c, lxc_attach_exec_t exec_functio current_config = c->lxc_conf; - ret = lxc_attach(c, exec_function, exec_payload, options, attached_process); + ret = lxc_attach(c, exec_function, exec_payload, options, + attached_process); current_config = NULL; return ret; } -static int do_lxcapi_attach_run_wait(struct lxc_container *c, lxc_attach_options_t *options, const char *program, const char * const argv[]) +static int do_lxcapi_attach_run_wait(struct lxc_container *c, + lxc_attach_options_t *options, + const char *program, + const char *const argv[]) { lxc_attach_command_t command; pid_t pid; - int r; + int ret; if (!c) return -1; - command.program = (char*)program; - command.argv = (char**)argv; + command.program = (char *)program; + command.argv = (char **)argv; - r = lxc_attach(c, lxc_attach_run_command, &command, options, &pid); - if (r < 0) { - ERROR("ups"); - return r; - } + ret = lxc_attach(c, lxc_attach_run_command, &command, options, &pid); + if (ret < 0) + return ret; return lxc_wait_for_pid_status(pid); } -static int lxcapi_attach_run_wait(struct lxc_container *c, lxc_attach_options_t *options, const char *program, const char * const argv[]) +static int lxcapi_attach_run_wait(struct lxc_container *c, + lxc_attach_options_t *options, + const char *program, const char *const argv[]) { int ret; From lxc-bot at linuxcontainers.org Fri May 17 07:06:27 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Fri, 17 May 2019 00:06:27 -0700 (PDT) Subject: [lxc-devel] [lxd/master] doc: Clarify API security and options to restrict Message-ID: <5cde5d73.1c69fb81.699c9.cdd3SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 383 bytes Desc: not available URL: -------------- next part -------------- From 009e8af16e8702f430899a234fa02e7197306770 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Fri, 17 May 2019 09:04:44 +0200 Subject: [PATCH] doc: Clarify API security and options to restrict MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Suggested-by: Chris Moberly Signed-off-by: Stéphane Graber --- doc/index.md | 15 ++++++++++++ doc/security.md | 63 +++++++++++++++++++++++++++++++++++++------------ 2 files changed, 63 insertions(+), 15 deletions(-) diff --git a/doc/index.md b/doc/index.md index 89e53d552d..e25e86baf9 100644 --- a/doc/index.md +++ b/doc/index.md @@ -101,6 +101,21 @@ group to talk to LXD; you can create your own group if you want): sudo -E LD_LIBRARY_PATH=$LD_LIBRARY_PATH $GOPATH/bin/lxd --group sudo ``` +## Security +LXD, similar to other container managers, provides a UNIX socket for local communication. + +**WARNING**: Anyone with access to that socket can fully control LXD, which includes +the ability to attach host devices and filesystems; this should +therefore only be given to users who would be trusted with root access +to the host. + +When listening to the network, the same API is available on a TLS socket +(HTTPS); specific access on the remote API can be restricted through +Canonical RBAC. + + +More details are [available here](security.md). + ## Getting started with LXD Now that you have LXD running on your system you can read the [getting started guide](https://linuxcontainers.org/lxd/getting-started-cli/) or go through more examples and configurations in [our documentation](https://lxd.readthedocs.org). diff --git a/doc/security.md b/doc/security.md index 4070bb9f01..c4c3180843 100644 --- a/doc/security.md +++ b/doc/security.md @@ -1,16 +1,30 @@ # Security ## Introduction -Local communications over the UNIX socket happen over a cleartext HTTP -socket and access is restricted by socket ownership and mode. +LXD is a daemon running as root. +Access to that daemon is only possible over a local UNIX socket by default. +Through configuration, it's then possible to expose the same API over +the network on a TLS socket. + +**WARNING**: Local access to LXD through the UNIX socket always grants +full access to LXD. This includes the ability to attach any filesystem +paths or devices to any container as well as tweaking all security +features on containers. You should only give such access to someone who +you'd trust with root access to your system. + +The remote API uses either TLS client certificates or Candid based +authentication. Canonical RBAC support can be used combined with Candid +based authentication to limit what an API client may do on LXD. + +## TLS configuration Remote communications with the LXD daemon happen using JSON over HTTPS. The supported protocol must be TLS1.2 or better. All communications must use perfect forward secrecy and ciphers must be limited to strong elliptic curve ones (such as ECDHE-RSA or ECDHE-ECDSA). -Any generated key should be at least 4096bit RSA and when using -signatures, only SHA-2 signatures should be trusted. +Any generated key should be at least 4096-bit RSA, preferably EC384, and +when using signatures, only SHA-2 signatures should be trusted. Since we control both client and server, there is no reason to support any backward compatibility to broken protocol or ciphers. 
@@ -23,7 +37,27 @@ certificate for any client-server communication. To cause certificates to be regenerated, simply remove the old ones. On the next connection a new certificate will be generated. -## Adding a remote with a default setup +## Role Based Access Control (RBAC) +LXD supports integrating with the Canonical RBAC service. + +This uses Candid based authentication with the RBAC service maintaining +roles to user/group relationships. Those roles can be scoped down to LXD +projects. + +By default, the 4 roles give you: + + - auditor: Read-only access to the project + - user: Ability to do normal lifecycle actions (start, stop, ...), + execute commands in the containers, attach to console, manage snapshots, ... + - operator: All of the above + the ability to create, re-configure and + delete containers and images + - admin: All of the above + the ability to reconfigure the project itself + +**WARNING**: Of those roles, only `auditor` and `user` are currently +suitable for a user whom you wouldn't trust with root access to the +host. + +## Adding a remote with TLS client certificate authentication In the default setup, when the user adds a new server with `lxc remote add`, the server will be contacted over HTTPs, its certificate downloaded and the fingerprint will be shown to the user. @@ -41,7 +75,7 @@ any additional credentials. This is a workflow that's very similar to that of SSH where an initial connection to an unknown server triggers a prompt. -## Adding a remote with a PKI based setup +## Adding a remote with a TLS client in a PKI based setup In the PKI setup, a system administrator is managing a central PKI, that PKI then issues client certificates for all the lxc clients and server certificates for all the LXD daemons. @@ -73,7 +107,7 @@ pre-generated files. After this is done, restarting the server will have it run in PKI mode. -## Adding a remote with Candid +## Adding a remote with Candid authentication When LXD is configured with Candid, it will request that clients trying to authenticating with it get a Discharge token from the authentication server specified by the `candid.api.url` setting. @@ -88,9 +122,9 @@ presenting the token received from the authentication server. The LXD server verifies the token, thus authenticating the request. The token is stored as cookie and is presented by the client at each request to LXD. -## Managing trusted clients -The list of certificates trusted by a LXD server can be obtained with `lxc -config trust list`. +## Managing trusted TLS clients +The list of TLS certificates trusted by a LXD server can be obtained with +`lxc config trust list`. Clients can manually be added using `lxc config trust add `, removing the need for a shared trust password by letting an existing @@ -99,9 +133,10 @@ administrator add the new client certificate directly to the trust store. To revoke trust to a client its certificate can be removed with `lxc config trust remove FINGERPRINT`. -## Password prompt -To establish a new trust relationship, a password must be set on the -server and send by the client when adding itself. +## Password prompt with TLS authentication +To establish a new trust relationship when not already setup by the +administrator, a password must be set on the server and sent by the +client when adding itself. A remote add operation should therefore go like this: @@ -129,7 +164,6 @@ if the certificate did in fact change. If it did, then the certificate can be replaced by the new one or the remote be removed altogether and re-added. 
- ### Server trust relationship revoked In this case, the server still uses the same certificate but all API calls return a 403 with an error indicating that the client isn't @@ -138,7 +172,6 @@ trusted. This happens if another trusted client or the local server administrator removed the trust entry on the server. - ## Production setup For production setup, it's recommended that `core.trust_password` is unset after all clients have been added. This prevents brute-force attacks trying to From noreply at github.com Fri May 17 07:10:49 2019 From: noreply at github.com (=?UTF-8?B?U3TDqXBoYW5lIEdyYWJlcg==?=) Date: Fri, 17 May 2019 00:10:49 -0700 Subject: [lxc-devel] [lxc/lxc] d64301: lxccontainer: cleanup attach functions Message-ID: Branch: refs/heads/master Home: https://github.com/lxc/lxc Commit: d64301431737bcd54c2824dc95d49f4fb30d4844 https://github.com/lxc/lxc/commit/d64301431737bcd54c2824dc95d49f4fb30d4844 Author: Christian Brauner Date: 2019-05-17 (Fri, 17 May 2019) Changed paths: M src/lxc/lxccontainer.c Log Message: ----------- lxccontainer: cleanup attach functions Specifically, refloat function arguments and remove useless comments. Signed-off-by: Christian Brauner Commit: ddf4b77e11a4d08f09b7b9cd13e593f8c047edc5 https://github.com/lxc/lxc/commit/ddf4b77e11a4d08f09b7b9cd13e593f8c047edc5 Author: Stéphane Graber Date: 2019-05-17 (Fri, 17 May 2019) Changed paths: M src/lxc/lxccontainer.c Log Message: ----------- Merge pull request #3010 from brauner/2019-05-17/bugfixes lxccontainer: cleanup attach functions Compare: https://github.com/lxc/lxc/compare/07c5b72a11ac...ddf4b77e11a4 From lxc-bot at linuxcontainers.org Fri May 17 08:09:15 2019 From: lxc-bot at linuxcontainers.org (tenforward on Github) Date: Fri, 17 May 2019 01:09:15 -0700 (PDT) Subject: [lxc-devel] [lxd/master] doc: Fix typo in networks.md Message-ID: <5cde6c2b.1c69fb81.d84d3.7be0SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 355 bytes Desc: not available URL: -------------- next part -------------- From 70780f22b1be8372bbf7922b48f8af9a7d625604 Mon Sep 17 00:00:00 2001 From: KATOH Yasufumi Date: Fri, 17 May 2019 17:08:34 +0900 Subject: [PATCH] doc: Fix typo in networks.md Signed-off-by: KATOH Yasufumi --- doc/networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/networks.md b/doc/networks.md index 69fa409eb7..0e971d35b2 100644 --- a/doc/networks.md +++ b/doc/networks.md @@ -50,7 +50,7 @@ ipv6.dhcp.stateful | boolean | ipv6 dhcp | false ipv6.firewall | boolean | ipv6 address | true | Whether to generate filtering firewall rules for this network ipv6.nat | boolean | ipv6 address | false | Whether to NAT (will default to true if unset and a random ipv6.address is generated) ipv6.nat.order | string | ipv6 address | before | Whether to add the required NAT rules before or after any pre-existing rules -ipv4.nat.address | string | ipv6 address | - | The source address used for outbound traffic from the bridge +ipv6.nat.address | string | ipv6 address | - | The source address used for outbound traffic from the bridge ipv6.routes | string | ipv6 address | - | Comma separated list of additional IPv6 CIDR subnets to route to the bridge ipv6.routing | boolean | ipv6 address | true | Whether to route traffic in and out of the bridge raw.dnsmasq | string | - | - | Additional dnsmasq configuration to append to the configuration From lxc-bot at linuxcontainers.org Fri May 17 08:25:32 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Fri, 17 May 2019 01:25:32 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxc/container: Fixes minute rollover issue in scheduled snapshots Message-ID: <5cde6ffc.1c69fb81.f8ace.8f4aSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 376 bytes Desc: not available URL: -------------- next part -------------- From 71fe696be005e2e96363f46129ace145d60a7911 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Fri, 17 May 2019 09:24:21 +0100 Subject: [PATCH] lxc/container: Fixes minute rollover issue in scheduled snapshots Fixes #5761 Signed-off-by: Thomas Parrott --- lxd/container.go | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/lxd/container.go b/lxd/container.go index 7295d06917..e935dde899 100644 --- a/lxd/container.go +++ b/lxd/container.go @@ -1635,10 +1635,18 @@ func autoCreateContainerSnapshotsTask(d *Daemon) (task.Func, task.Schedule) { // Check if it's time to snapshot now := time.Now() + + // Truncate the time now back to the start of the minute, before passing to + // the cron scheduler, as it will add 1s to the scheduled time and we don't + // want the next scheduled time to roll over to the next minute and break + // the time comparison below. + now = now.Truncate(time.Minute) + + // Calculate the next scheduled time based on the snapshots.schedule + // pattern and the time now. next := sched.Next(now) // Ignore everything that is more precise than minutes. - now = now.Truncate(time.Minute) next = next.Truncate(time.Minute) if !now.Equal(next) { From lxc-bot at linuxcontainers.org Fri May 17 08:48:19 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Fri, 17 May 2019 01:48:19 -0700 (PDT) Subject: [lxc-devel] [lxd/master] doc: Removes trailing whitespace Message-ID: <5cde7553.1c69fb81.8e499.cc20SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 361 bytes Desc: not available URL: -------------- next part -------------- From 61b682c1d3472c1e5981f1273e0fb974e07ad0ca Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Fri, 17 May 2019 09:47:32 +0100 Subject: [PATCH] doc: Removes trailing whitespace Signed-off-by: Thomas Parrott --- doc/containers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/containers.md b/doc/containers.md index c9571a8846..475ff60797 100644 --- a/doc/containers.md +++ b/doc/containers.md @@ -599,7 +599,7 @@ empty (default), no snapshots will be created. `snapshots.schedule.stopped` controls whether or not stopped container are to be automatically snapshotted. It defaults to `false`. `snapshots.pattern` takes a pongo2 template string, and the pongo2 context contains the `creation_date` variable. Be aware that you -should format the date (e.g. use `{{ creation_date|date:"2006-01-02_15-04-05" }}`) +should format the date (e.g. use `{{ creation_date|date:"2006-01-02_15-04-05" }}`) in your template string to avoid forbidden characters in your snapshot name. Another way to avoid name collisions is to use the placeholder `%d`. If a snapshot with the same name (excluding the placeholder) already exists, all existing snapshot From lxc-bot at linuxcontainers.org Fri May 17 08:53:23 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Fri, 17 May 2019 01:53:23 -0700 (PDT) Subject: [lxc-devel] [lxd/master] network: SRIOV VLAN and MAC filtering support Message-ID: <5cde7683.1c69fb81.823a2.b4e2SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 343 bytes Desc: not available URL: -------------- next part -------------- From 5b5e2477b999307cdf400ba3b1812a7a7fe85184 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Fri, 17 May 2019 09:46:13 +0100 Subject: [PATCH 1/2] doc: Adds support for vlan and security.mac_filtering for sriov Signed-off-by: Thomas Parrott --- doc/containers.md | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/doc/containers.md b/doc/containers.md index c9571a8846..48137ab0ed 100644 --- a/doc/containers.md +++ b/doc/containers.md @@ -237,25 +237,25 @@ LXD supports different kind of network devices: Different network interface types have different additional properties, the current list is: -Key | Type | Default | Required | Used by | API extension | Description -:-- | :-- | :-- | :-- | :-- | :-- | :-- -nictype | string | - | yes | all | - | The device type, one of "bridged", "macvlan", "ipvlan", "p2p", "physical", or "sriov" -limits.ingress | string | - | no | bridged, p2p | - | I/O limit in bit/s for incoming traffic (various suffixes supported, see below) -limits.egress | string | - | no | bridged, p2p | - | I/O limit in bit/s for outgoing traffic (various suffixes supported, see below) -limits.max | string | - | no | bridged, p2p | - | Same as modifying both limits.ingress and limits.egress -name | string | kernel assigned | no | all | - | The name of the interface inside the container -host\_name | string | randomly assigned | no | bridged, p2p | - | The name of the interface inside the host -hwaddr | string | randomly assigned | no | bridged, macvlan, physical, sriov | - | The MAC address of the new interface -mtu | integer | parent MTU | no | all | - | The MTU of the new interface -parent | string | - | yes | bridged, macvlan, ipvlan, physical, sriov | - | The name of the host device or 
bridge -vlan | integer | - | no | macvlan, ipvlan, physical | network\_vlan, network\_vlan\_physical | The VLAN ID to attach to -ipv4.address | string | - | no | bridged, ipvlan | network | An IPv4 address to assign to the container through DHCP (bridged), for IPVLAN comma separated list of static addresses (at least 1 required) -ipv6.address | string | - | no | bridged, ipvlan | network | An IPv6 address to assign to the container through DHCP (bridged), for IPVLAN comma separated list of static addresses (at least 1 required) -ipv4.routes | string | - | no | bridged, p2p | container\_nic\_routes | Comma delimited list of IPv4 static routes to add on host to nic -ipv6.routes | string | - | no | bridged, p2p | container\_nic\_routes | Comma delimited list of IPv6 static routes to add on host to nic -security.mac\_filtering | boolean | false | no | bridged | network | Prevent the container from spoofing another's MAC address -maas.subnet.ipv4 | string | - | no | bridged, macvlan, physical, sriov | maas\_network | MAAS IPv4 subnet to register the container in -maas.subnet.ipv6 | string | - | no | bridged, macvlan, physical, sriov | maas\_network | MAAS IPv6 subnet to register the container in +Key | Type | Default | Required | Used by | API extension | Description +:-- | :-- | :-- | :-- | :-- | :-- | :-- +nictype | string | - | yes | all | - | The device type, one of "bridged", "macvlan", "ipvlan", "p2p", "physical", or "sriov" +limits.ingress | string | - | no | bridged, p2p | - | I/O limit in bit/s for incoming traffic (various suffixes supported, see below) +limits.egress | string | - | no | bridged, p2p | - | I/O limit in bit/s for outgoing traffic (various suffixes supported, see below) +limits.max | string | - | no | bridged, p2p | - | Same as modifying both limits.ingress and limits.egress +name | string | kernel assigned | no | all | - | The name of the interface inside the container +host\_name | string | randomly assigned | no | bridged, p2p | - | The name of the interface inside the host +hwaddr | string | randomly assigned | no | bridged, macvlan, physical, sriov | - | The MAC address of the new interface +mtu | integer | parent MTU | no | all | - | The MTU of the new interface +parent | string | - | yes | bridged, macvlan, ipvlan, physical, sriov | - | The name of the host device or bridge +vlan | integer | - | no | macvlan, ipvlan, physical, sriov | network\_vlan, network\_vlan\_physical, network\_vlan\_sriov | The VLAN ID to attach to +ipv4.address | string | - | no | bridged, ipvlan | network | An IPv4 address to assign to the container through DHCP (bridged), for IPVLAN comma separated list of static addresses (at least 1 required) +ipv6.address | string | - | no | bridged, ipvlan | network | An IPv6 address to assign to the container through DHCP (bridged), for IPVLAN comma separated list of static addresses (at least 1 required) +ipv4.routes | string | - | no | bridged, p2p | container\_nic\_routes | Comma delimited list of IPv4 static routes to add on host to nic +ipv6.routes | string | - | no | bridged, p2p | container\_nic\_routes | Comma delimited list of IPv6 static routes to add on host to nic +security.mac\_filtering | boolean | false | no | bridged, sriov | network | Prevent the container from spoofing another's MAC address +maas.subnet.ipv4 | string | - | no | bridged, macvlan, physical, sriov | maas\_network | MAAS IPv4 subnet to register the container in +maas.subnet.ipv6 | string | - | no | bridged, macvlan, physical, sriov | maas\_network | MAAS IPv6 subnet to 
register the container in #### bridged, macvlan or ipvlan for connection to physical network The `bridged`, `macvlan` and `ipvlan` interface types can both be used to connect From 3b3b4c306811994fc5f27a85b934e335bf78a24f Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Fri, 17 May 2019 09:47:32 +0100 Subject: [PATCH 2/2] doc: Removes trailing whitespace Signed-off-by: Thomas Parrott --- doc/containers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/containers.md b/doc/containers.md index 48137ab0ed..0fd79956a2 100644 --- a/doc/containers.md +++ b/doc/containers.md @@ -599,7 +599,7 @@ empty (default), no snapshots will be created. `snapshots.schedule.stopped` controls whether or not stopped container are to be automatically snapshotted. It defaults to `false`. `snapshots.pattern` takes a pongo2 template string, and the pongo2 context contains the `creation_date` variable. Be aware that you -should format the date (e.g. use `{{ creation_date|date:"2006-01-02_15-04-05" }}`) +should format the date (e.g. use `{{ creation_date|date:"2006-01-02_15-04-05" }}`) in your template string to avoid forbidden characters in your snapshot name. Another way to avoid name collisions is to use the placeholder `%d`. If a snapshot with the same name (excluding the placeholder) already exists, all existing snapshot From lxc-bot at linuxcontainers.org Fri May 17 09:56:14 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Fri, 17 May 2019 02:56:14 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/storage/zfs: Fix snapshot restore on project Message-ID: <5cde853e.1c69fb81.548b4.ecb4SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 370 bytes Desc: not available URL: -------------- next part -------------- From 8b93014bc74f90b5088589a45a391baa2f8f7b5b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Fri, 17 May 2019 11:55:38 +0200 Subject: [PATCH] lxd/storage/zfs: Fix snapshot restore on project MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes #5773 Signed-off-by: Stéphane Graber --- lxd/storage_zfs.go | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lxd/storage_zfs.go b/lxd/storage_zfs.go index 5667c557ae..dc849f0f80 100644 --- a/lxd/storage_zfs.go +++ b/lxd/storage_zfs.go @@ -1557,7 +1557,7 @@ func (s *storageZfs) ContainerRestore(target container, source container) error cName, snapOnlyName, _ := containerGetParentAndSnapshotName(source.Name()) snapName := fmt.Sprintf("snapshot-%s", snapOnlyName) - err = zfsPoolVolumeSnapshotRestore(s.getOnDiskPoolName(), fmt.Sprintf("containers/%s", cName), snapName) + err = zfsPoolVolumeSnapshotRestore(s.getOnDiskPoolName(), fmt.Sprintf("containers/%s", projectPrefix(source.Project(), cName)), snapName) if err != nil { return err } From lxc-bot at linuxcontainers.org Fri May 17 11:14:12 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Fri, 17 May 2019 04:14:12 -0700 (PDT) Subject: [lxc-devel] [lxd/master] doc: Re-structures container nic docs into each nic type Message-ID: <5cde9784.1c69fb81.49055.fb93SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 361 bytes Desc: not available URL: -------------- next part -------------- From bb0f6907819956516cdb96488e689ddb796b115c Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Fri, 17 May 2019 12:12:06 +0100 Subject: [PATCH] doc: Re-structures container nic docs into each nic type Signed-off-by: Thomas Parrott --- doc/containers.md | 135 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 114 insertions(+), 21 deletions(-) diff --git a/doc/containers.md b/doc/containers.md index 475ff60797..4c96a22e5c 100644 --- a/doc/containers.md +++ b/doc/containers.md @@ -235,27 +235,120 @@ LXD supports different kind of network devices: - `p2p`: Creates a virtual device pair, putting one side in the container and leaving the other side on the host. - `sriov`: Passes a virtual function of an SR-IOV enabled physical network device into the container. -Different network interface types have different additional properties, the current list is: - -Key | Type | Default | Required | Used by | API extension | Description -:-- | :-- | :-- | :-- | :-- | :-- | :-- -nictype | string | - | yes | all | - | The device type, one of "bridged", "macvlan", "ipvlan", "p2p", "physical", or "sriov" -limits.ingress | string | - | no | bridged, p2p | - | I/O limit in bit/s for incoming traffic (various suffixes supported, see below) -limits.egress | string | - | no | bridged, p2p | - | I/O limit in bit/s for outgoing traffic (various suffixes supported, see below) -limits.max | string | - | no | bridged, p2p | - | Same as modifying both limits.ingress and limits.egress -name | string | kernel assigned | no | all | - | The name of the interface inside the container -host\_name | string | randomly assigned | no | bridged, p2p | - | The name of the interface inside the host -hwaddr | string | randomly assigned | no | bridged, macvlan, physical, sriov | - | The MAC address of the new interface -mtu | integer | parent MTU | no | all | - | The MTU of the new interface -parent | string | - | yes | bridged, macvlan, ipvlan, physical, sriov | - | The name of the host device or bridge -vlan | integer | - | no | macvlan, ipvlan, physical | network\_vlan, network\_vlan\_physical | The VLAN ID to attach to -ipv4.address | string | - | no | bridged, ipvlan | network | An IPv4 address to assign to the container through DHCP (bridged), for IPVLAN comma separated list of static addresses (at least 1 required) -ipv6.address | string | - | no | bridged, ipvlan | network | An IPv6 address to assign to the container through DHCP (bridged), for IPVLAN comma separated list of static addresses (at least 1 required) -ipv4.routes | string | - | no | bridged, p2p | container\_nic\_routes | Comma delimited list of IPv4 static routes to add on host to nic -ipv6.routes | string | - | no | bridged, p2p | container\_nic\_routes | Comma delimited list of IPv6 static routes to add on host to nic -security.mac\_filtering | boolean | false | no | bridged | network | Prevent the container from spoofing another's MAC address -maas.subnet.ipv4 | string | - | no | bridged, macvlan, physical, sriov | maas\_network | MAAS IPv4 subnet to register the container in -maas.subnet.ipv6 | string | - | no | bridged, macvlan, physical, sriov | maas\_network | MAAS IPv6 subnet to register the container in +Different network interface types have different additional properties. + +#### nictype: physical + +Straight physical device passthrough from the host. 
The targeted device will vanish from the host and appear in the container. + +Device configuration properties: + +Key | Type | Default | Required | API extension | Description +:-- | :-- | :-- | :-- | :-- | :-- +nictype | string | - | yes | - | The device type: "physical" +parent | string | - | yes | - | The name of the host device +name | string | kernel assigned | no | - | The name of the interface inside the container +mtu | integer | parent MTU | no | - | The MTU of the new interface +hwaddr | string | randomly assigned | no | - | The MAC address of the new interface +vlan | integer | - | no | network\_vlan\_physical | The VLAN ID to attach to +maas.subnet.ipv4 | string | - | no | maas\_network | MAAS IPv4 subnet to register the container in +maas.subnet.ipv6 | string | - | no | maas\_network | MAAS IPv6 subnet to register the container in + +#### nictype: bridged + +Uses an existing bridge on the host and creates a virtual device pair to connect the host bridge to the container. + +Device configuration properties: + +Key | Type | Default | Required | API extension | Description +:-- | :-- | :-- | :-- | :-- | :-- +nictype | string | - | yes | - | The device type: "bridged" +parent | string | - | yes | - | The name of the host device +name | string | kernel assigned | no | - | The name of the interface inside the container +mtu | integer | parent MTU | no | - | The MTU of the new interface +hwaddr | string | randomly assigned | no | - | The MAC address of the new interface +host\_name | string | randomly assigned | no | - | The name of the interface inside the host +limits.ingress | string | - | no | - | I/O limit in bit/s for incoming traffic (various suffixes supported, see below) +limits.egress | string | - | no | - | I/O limit in bit/s for outgoing traffic (various suffixes supported, see below) +limits.max | string | - | no | - | Same as modifying both limits.ingress and limits.egress +ipv4.address | string | - | no | network | An IPv4 address to assign to the container through DHCP +ipv6.address | string | - | no | network | An IPv6 address to assign to the container through DHCP +ipv4.routes | string | - | no | container\_nic\_routes | Comma delimited list of IPv4 static routes to add on host to nic +ipv6.routes | string | - | no | container\_nic\_routes | Comma delimited list of IPv6 static routes to add on host to nic +security.mac\_filtering | boolean | false | no | network | Prevent the container from spoofing another's MAC address +maas.subnet.ipv4 | string | - | no | maas\_network | MAAS IPv4 subnet to register the container in +maas.subnet.ipv6 | string | - | no | maas\_network | MAAS IPv6 subnet to register the container in + +#### nictype: macvlan + +Sets up a new network device based on an existing one but using a different MAC address. 
+ +Device configuration properties: + +Key | Type | Default | Required | API extension | Description +:-- | :-- | :-- | :-- | :-- | :-- +nictype | string | - | yes | - | The device type: "macvlan" +parent | string | - | yes | - | The name of the host device +name | string | kernel assigned | no | - | The name of the interface inside the container +mtu | integer | parent MTU | no | - | The MTU of the new interface +hwaddr | string | randomly assigned | no | - | The MAC address of the new interface +host\_name | string | randomly assigned | no | - | The name of the interface inside the host +vlan | integer | - | no | network\_vlan | The VLAN ID to attach to +maas.subnet.ipv4 | string | - | no | maas\_network | MAAS IPv4 subnet to register the container in +maas.subnet.ipv6 | string | - | no | maas\_network | MAAS IPv6 subnet to register the container in + +#### nictype: ipvlan + +Sets up a new network device based on an existing one using the same MAC address but a different IP. + +Device configuration properties: + +Key | Type | Default | Required | API extension | Description +:-- | :-- | :-- | :-- | :-- | :-- +nictype | string | - | yes | container_nic_ipvlan | The device type: "ipvlan" +parent | string | - | yes | - | The name of the host device +name | string | kernel assigned | no | - | The name of the interface inside the container +mtu | integer | parent MTU | no | - | The MTU of the new interface +hwaddr | string | randomly assigned | no | - | The MAC address of the new interface +host\_name | string | randomly assigned | no | - | The name of the interface inside the host +ipv4.address | string | - | no | network | Comma delimited list of IPv4 static addresses to add to container +ipv6.address | string | - | no | network | Comma delimited list of IPv6 static addresses to add to container +vlan | integer | - | no | network\_vlan | The VLAN ID to attach to + +#### nictype: p2p + +Creates a virtual device pair, putting one side in the container and leaving the other side on the host. + +Device configuration properties: + +Key | Type | Default | Required | API extension | Description +:-- | :-- | :-- | :-- | :-- | :-- +nictype | string | - | yes | - | The device type: "p2p" +name | string | kernel assigned | no | - | The name of the interface inside the container +mtu | integer | parent MTU | no | - | The MTU of the new interface +hwaddr | string | randomly assigned | no | - | The MAC address of the new interface +host\_name | string | randomly assigned | no | - | The name of the interface inside the host +limits.ingress | string | - | no | - | I/O limit in bit/s for incoming traffic (various suffixes supported, see below) +limits.egress | string | - | no | - | I/O limit in bit/s for outgoing traffic (various suffixes supported, see below) +limits.max | string | - | no | - | Same as modifying both limits.ingress and limits.egress +ipv4.routes | string | - | no | container\_nic\_routes | Comma delimited list of IPv4 static routes to add on host to nic +ipv6.routes | string | - | no | container\_nic\_routes | Comma delimited list of IPv6 static routes to add on host to nic + +#### nictype: sriov + +Passes a virtual function of an SR-IOV enabled physical network device into the container. 
+ +Device configuration properties: + +Key | Type | Default | Required | API extension | Description +:-- | :-- | :-- | :-- | :-- | :-- +nictype | string | - | yes | - | The device type: "sriov" +parent | string | - | yes | - | The name of the host device +name | string | kernel assigned | no | - | The name of the interface inside the container +mtu | integer | parent MTU | no | - | The MTU of the new interface +hwaddr | string | randomly assigned | no | - | The MAC address of the new interface +maas.subnet.ipv4 | string | - | no | maas\_network | MAAS IPv4 subnet to register the container in +maas.subnet.ipv6 | string | - | no | maas\_network | MAAS IPv6 subnet to register the container in #### bridged, macvlan or ipvlan for connection to physical network The `bridged`, `macvlan` and `ipvlan` interface types can all be used to connect From lxc-bot at linuxcontainers.org Fri May 17 15:07:03 2019 From: lxc-bot at linuxcontainers.org (ajkavanagh on Github) Date: Fri, 17 May 2019 08:07:03 -0700 (PDT) Subject: [lxc-devel] [pylxd/master] Fix execute command missing output with pauses in output text Message-ID: <5cdece17.1c69fb81.86aa3.393aSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 728 bytes Desc: not available URL: -------------- next part -------------- From 8010a87cae33bd56c17986f98c3a75ddab3f04dd Mon Sep 17 00:00:00 2001 From: Alex Kavanagh Date: Fri, 17 May 2019 16:00:00 +0100 Subject: [PATCH] Fix execute command missing output with pauses in output text For the container execute command, if the command executed has pauses in the output text, the websocket will send binary messages. These were interpreted as the stream being completed, and pylxd closed the websocket, thus losing the rest of the output. Delving into the code indicated some very strange behaviour across the threads, so as part of solving the problem, this has been cleaned up as well. Closes: #362 --- pylxd/models/container.py | 38 +++++++++++++++++++++++++++++++------- 1 file changed, 31 insertions(+), 7 deletions(-) diff --git a/pylxd/models/container.py b/pylxd/models/container.py index cbd8206f..4c4f91e3 100644 --- a/pylxd/models/container.py +++ b/pylxd/models/container.py @@ -353,7 +353,7 @@ def unfreeze(self, timeout=30, force=True, wait=False): def execute( self, commands, environment={}, encoding=None, decode=True, stdin_payload=None, stdin_encoding="utf-8", - stdout_handler=None, stderr_handler=None + stdout_handler=None, stderr_handler=None, ): """Execute a command on the container. 
@@ -430,8 +430,17 @@ def execute( break time.sleep(.5) # pragma: no cover - while len(manager.websockets.values()) > 0: - time.sleep(.1) # pragma: no cover + try: + stdin.close() + except BrokenPipeError: + pass + + stdout.finish_soon() + stderr.finish_soon() + manager.close_all() + + while not stdout.finished or not stderr.finished: + time.sleep(.1) # pragma: no cover manager.stop() manager.join() @@ -618,6 +627,9 @@ def __init__(self, manager, *args, **kwargs): self.encoding = kwargs.pop('encoding', None) self.handler = kwargs.pop('handler', None) self.message_encoding = None + self.finish_off = False + self.finished = False + self.last_message_empty = False super(_CommandWebsocketClient, self).__init__(*args, **kwargs) def handshake_ok(self): @@ -626,15 +638,27 @@ def received_message(self, message): if message.data is None or len(message.data) == 0: - self.manager.remove(self) - return + self.last_message_empty = True + if self.finish_off: + self.finished = True + return + else: + self.last_message_empty = False if message.encoding and self.message_encoding is None: self.message_encoding = message.encoding if self.handler: self.handler(self._maybe_decode(message.data)) self.buffer.append(message.data) - if isinstance(message, BinaryMessage): - self.manager.remove(self) + if self.finish_off and isinstance(message, BinaryMessage): + self.finished = True + + def closed(self, code, reason=None): + self.finished = True + + def finish_soon(self): + self.finish_off = True + if self.last_message_empty: + self.finished = True def _maybe_decode(self, buffer): if self.decode and buffer is not None: From lxc-bot at linuxcontainers.org Fri May 17 17:30:17 2019 From: lxc-bot at linuxcontainers.org (ajkavanagh on Github) Date: Fri, 17 May 2019 10:30:17 -0700 (PDT) Subject: [lxc-devel] [pylxd/master] Fix change of behaviour on execute introduced in #363 Message-ID: <5cdeefa9.1c69fb81.50791.c183SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 546 bytes Desc: not available URL: -------------- next part -------------- From f99f6c63454fb2deaaeeab065658978379c248d7 Mon Sep 17 00:00:00 2001 From: Alex Kavanagh Date: Fri, 17 May 2019 18:27:49 +0100 Subject: [PATCH] Fix change of behaviour on execute introduced in #363 Essentially, the behavior changed for when a caller to the container's execute method wished to handle the packets received back on the websocket. This change fixes the reversion. Signed-off-by: Alex Kavanagh --- pylxd/models/container.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pylxd/models/container.py b/pylxd/models/container.py index 4c4f91e3..d13c26a5 100644 --- a/pylxd/models/container.py +++ b/pylxd/models/container.py @@ -641,7 +641,7 @@ def received_message(self, message): self.last_message_empty = True if self.finish_off: self.finished = True - return + return else: self.last_message_empty = False if message.encoding and self.message_encoding is None: From noreply at github.com Sat May 18 09:54:07 2019 From: noreply at github.com (Christian Brauner) Date: Sat, 18 May 2019 02:54:07 -0700 Subject: [lxc-devel] [lxc/lxc] 8470bf: Travis: Adds -Wall and -Werror gcc flags to automa... 
Message-ID: Branch: refs/heads/stable-3.0 Home: https://github.com/lxc/lxc Commit: 8470bf1ce48951dadff63c2cba480bae2628973f https://github.com/lxc/lxc/commit/8470bf1ce48951dadff63c2cba480bae2628973f Author: tomponline Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M .travis.yml Log Message: ----------- Travis: Adds -Wall and -Werror gcc flags to automatic build. Signed-off-by: tomponline Commit: 2d2df5af015c52c2f3a11b05fa85b5dafda5a944 https://github.com/lxc/lxc/commit/2d2df5af015c52c2f3a11b05fa85b5dafda5a944 Author: tomponline Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M .travis.yml Log Message: ----------- travis: Attempt to fix src/lxc/cmd/lxc_init.c:251: undefined reference to `pthread_sigmask Signed-off-by: tomponline Commit: c5aab2fcb9bf7adadf1bf7bea0b64f4617b30ada https://github.com/lxc/lxc/commit/c5aab2fcb9bf7adadf1bf7bea0b64f4617b30ada Author: tomponline Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/storage/lvm.c M src/lxc/utils.c M src/lxc/utils.h Log Message: ----------- lvm: Updates lvcreate to wipe signatures if supported, fallbacks to old command if not. Signed-off-by: tomponline Commit: 3c5b6e30d850be2aa52afd78b1a63bb2e34b00f3 https://github.com/lxc/lxc/commit/3c5b6e30d850be2aa52afd78b1a63bb2e34b00f3 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- network: fix network device removal Closes #2849. Signed-off-by: Christian Brauner Commit: e6a19decfae1f7da51e314debba24ef2e5806110 https://github.com/lxc/lxc/commit/e6a19decfae1f7da51e314debba24ef2e5806110 Author: Felix Abecassis Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/start.c Log Message: ----------- Fix user namespace pdeathsig handling Signed-off-by: Felix Abecassis Commit: 217a336c16f99b31b9a19fe00ae1342ea6074366 https://github.com/lxc/lxc/commit/217a336c16f99b31b9a19fe00ae1342ea6074366 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/cmd/lxc_user_nic.c Log Message: ----------- lxc-user-nic: small tweaks Signed-off-by: Christian Brauner Cc: Akihiro Suda Commit: 45bfff5bb00121afa3624ccec7a2c43c353b2c73 https://github.com/lxc/lxc/commit/45bfff5bb00121afa3624ccec7a2c43c353b2c73 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M doc/lxc-user-nic.sgml.in Log Message: ----------- doc: update lxc-user-nic manpage Closes #1823. Signed-off-by: Christian Brauner Cc: Akihiro Suda Commit: 7c0312523439f4017f1020d3af23380103e969ae https://github.com/lxc/lxc/commit/7c0312523439f4017f1020d3af23380103e969ae Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/cmd/lxc_user_nic.c Log Message: ----------- lxc-user-nic: validate request Signed-off-by: Christian Brauner Cc: Akihiro Suda Commit: c9ca5d6b220bb7054aadf1d364c0910062610a8a https://github.com/lxc/lxc/commit/c9ca5d6b220bb7054aadf1d364c0910062610a8a Author: KATOH Yasufumi Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M doc/ja/lxc-user-nic.sgml.in Log Message: ----------- doc: update Japanese lxc-user-nic manpage Update for commit db74bbd Signed-off-by: KATOH Yasufumi Commit: a689c4afcc5a410cd0e793b398a57057102aac34 https://github.com/lxc/lxc/commit/a689c4afcc5a410cd0e793b398a57057102aac34 Author: yosukesan Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M doc/api/Makefile.am Log Message: ----------- fix: #2927 api doc generation fails under out of source build. 
Signed-off-by: yosukesan Commit: e93cd8c5b6d4266cd35c7520f5f886518500e612 https://github.com/lxc/lxc/commit/e93cd8c5b6d4266cd35c7520f5f886518500e612 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/storage/btrfs.c Log Message: ----------- storage: prevent unitialized variable warning We can simply fix this issue by switching to our cleanup macros instead of manually freeing the memory. Closes #2912. Signed-off-by: Christian Brauner Commit: 404b9449789d2ab00358699d1ad381e857959080 https://github.com/lxc/lxc/commit/404b9449789d2ab00358699d1ad381e857959080 Author: pgauret Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/storage/zfs.c Log Message: ----------- storage: update zfs Change zfs arguments. This also works with older zfs versions, tested with zfs 0.7.9-3 on Ubuntu 18.10. Closes #2916. Signed-off-by: Paul Gauret [christian.brauner at ubuntu.com: adapt commit message and add Signed-off-by for Paul] Signed-off-by: Christian Brauner Commit: 6203554bdc8f2b9f8d1a027b4ffb73cb21eb1aa4 https://github.com/lxc/lxc/commit/6203554bdc8f2b9f8d1a027b4ffb73cb21eb1aa4 Author: Felix Abecassis Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/conf.c Log Message: ----------- conf: do lxc.mount.entry mounts right after lxc.mount.fstab These configuration options use the same syntax and therefore it seems more intuitive to have the same behavior for both of them, which is not the case today since mount hooks and autodev mounts are called between the two. See: https://github.com/lxc/lxc/issues/2932 Signed-off-by: Felix Abecassis Commit: 72dd37ab8c1863ed9acabc737d7870827d028359 https://github.com/lxc/lxc/commit/72dd37ab8c1863ed9acabc737d7870827d028359 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/include/netns_ifaddrs.c M src/lxc/macro.h Log Message: ----------- netns_getifaddrs: adapt to kernel changes s/NETLINK_DUMP_STRICT_CHK/NETLINK_GET_STRICT_CHK/g Signed-off-by: Christian Brauner Commit: b089ff62cddb7e43dc6385c89e8da39f70df8726 https://github.com/lxc/lxc/commit/b089ff62cddb7e43dc6385c89e8da39f70df8726 Author: Tycho Andersen Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/tools/lxc_start.c Log Message: ----------- lxc-start: remove bad doc We don't in fact exit(1) if this is not specified, and it wouldn't make sense to, since most people probably don't specify this. Signed-off-by: Tycho Andersen Commit: f3d279cc990794d515b75dd119272e742bf294ba https://github.com/lxc/lxc/commit/f3d279cc990794d515b75dd119272e742bf294ba Author: pgauret Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/storage/zfs.c Log Message: ----------- Fix 'zfs get' command order Another case of calling 'zfs get' which requires reordering arguments to work with latest zfs. Signed-off-by: Paul Gauret Commit: 857147910fb867008870671e5ad032dcd665905c https://github.com/lxc/lxc/commit/857147910fb867008870671e5ad032dcd665905c Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/af_unix.c M src/lxc/attach.c M src/lxc/commands.c M src/lxc/commands.h M src/lxc/lxccontainer.c M src/lxc/macro.h M src/lxc/start.c Log Message: ----------- commands: partially backport seccomp notify This backports seccomp notify into various parts of the codebase as a pure nop to make maintenance easier. 
Signed-off-by: Christian Brauner Commit: 548b3229d8d3bc18ddc11291aaafa32e0f479f64 https://github.com/lxc/lxc/commit/548b3229d8d3bc18ddc11291aaafa32e0f479f64 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/af_unix.c M src/lxc/af_unix.h Log Message: ----------- af_unix: backport helper functions This backports various helpers associated with seccomp notify to make maintenance easier. Signed-off-by: Christian Brauner Commit: 1e9963ac1f56f30c7674b8f1a85e97644693d44b https://github.com/lxc/lxc/commit/1e9963ac1f56f30c7674b8f1a85e97644693d44b Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/start.c Log Message: ----------- start: silence clang Signed-off-by: Christian Brauner Commit: 8a9a1a02dd0cebd5ecdda884a5f11dfc74ab910c https://github.com/lxc/lxc/commit/8a9a1a02dd0cebd5ecdda884a5f11dfc74ab910c Author: tomponline Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- network: Fixes a little typo in an error message Signed-off-by: tomponline Commit: 2b73a79090a719caabf8820102668fc6df9851dd https://github.com/lxc/lxc/commit/2b73a79090a719caabf8820102668fc6df9851dd Author: tomponline Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- network: Adds upscript handling for vlan network type Signed-off-by: tomponline Commit: 0fef58cfa97d9c800678d6b898ffdd0cce661f0b https://github.com/lxc/lxc/commit/0fef58cfa97d9c800678d6b898ffdd0cce661f0b Author: tomponline Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- network: Fixes vlan hook script Signed-off-by: tomponline Commit: 1350fc845f1a78bf0a2ba6d22826a7fc220a0113 https://github.com/lxc/lxc/commit/1350fc845f1a78bf0a2ba6d22826a7fc220a0113 Author: tomponline Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M .gitignore Log Message: ----------- tests: Updates .gitignore to ignore test build artefacts Signed-off-by: tomponline Commit: a533ec463c52313255e73985f62eb0aea20c6396 https://github.com/lxc/lxc/commit/a533ec463c52313255e73985f62eb0aea20c6396 Author: tomponline Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/confile_utils.c Log Message: ----------- network: Fixes bug in macvlan mode selection Signed-off-by: tomponline Commit: 7b0aa99bdf60a7b4b682444ff0166a323e085258 https://github.com/lxc/lxc/commit/7b0aa99bdf60a7b4b682444ff0166a323e085258 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/af_unix.c Log Message: ----------- seccomp: notifier fixes Signed-off-by: Christian Brauner Commit: 0dfb9453e3067ca9093e198ef799e0c69c03e6a1 https://github.com/lxc/lxc/commit/0dfb9453e3067ca9093e198ef799e0c69c03e6a1 Author: Serge Hallyn Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/confile_utils.c Log Message: ----------- namespaces: allow a pathname to a nsfd for namespace to share Signed-off-by: Serge Hallyn Commit: 45760f6200dfbf0b4d8835946a23464c5d8ab396 https://github.com/lxc/lxc/commit/45760f6200dfbf0b4d8835946a23464c5d8ab396 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/af_unix.c M src/lxc/network.c M src/lxc/nl.c Log Message: ----------- tree-wide: make socket SOCK_CLOEXEC Signed-off-by: Christian Brauner Commit: 84721447b92a8a77c26d00cd1880d24e36723195 https://github.com/lxc/lxc/commit/84721447b92a8a77c26d00cd1880d24e36723195 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/compiler.h M src/lxc/raw_syscalls.c 
Log Message: ----------- compiler: add __returns_twice attribute The returns_twice attribute tells the compiler that a function may return more than one time. The compiler will ensure that all registers are dead before calling such a function and will emit a warning about the variables that may be clobbered after the second return from the function. Examples of such functions are setjmp and vfork. The longjmp-like counterpart of such function, if any, might need to be marked with the noreturn attribute. Signed-off-by: Christian Brauner Commit: 4f464a77f4f1a55a675593db7cad7e6274a39547 https://github.com/lxc/lxc/commit/4f464a77f4f1a55a675593db7cad7e6274a39547 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/raw_syscalls.c M src/lxc/raw_syscalls.h M src/lxc/start.c M src/lxc/start.h Log Message: ----------- raw_syscalls: add initial support for pidfd_send_signal() Well, I added this syscall so we better use it. :) Signed-off-by: Christian Brauner Commit: c9ecca0781d836cfca3b4c9f430dddf8908817c8 https://github.com/lxc/lxc/commit/c9ecca0781d836cfca3b4c9f430dddf8908817c8 Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M templates/lxc-busybox.in Log Message: ----------- Devices created in rootfs instead of rootfs/dev Added /dev in the mknod commands. Signed-off-by: Rachid Koucha Commit: 47576a3f633096734459712858c2708c4a5a26b7 https://github.com/lxc/lxc/commit/47576a3f633096734459712858c2708c4a5a26b7 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/macro.h M src/lxc/utils.c Log Message: ----------- utils: improve switch_to_ns() Signed-off-by: Christian Brauner Commit: ceda5ac37679a7da75f22506e507b0428de1b31d https://github.com/lxc/lxc/commit/ceda5ac37679a7da75f22506e507b0428de1b31d Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/raw_syscalls.c Log Message: ----------- raw_syscalls: simplify assembly Signed-off-by: Christian Brauner Co-developed-by: David Howells Signed-off-by: David Howells Commit: df5644f3a7fc6d46118abeb6e6c83c09770edd28 https://github.com/lxc/lxc/commit/df5644f3a7fc6d46118abeb6e6c83c09770edd28 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/attach.c M src/lxc/conf.c M src/lxc/raw_syscalls.c M src/lxc/raw_syscalls.h M src/lxc/start.c M src/lxc/utils.c M src/tests/lxc_raw_clone.c Log Message: ----------- clone: add infrastructure for CLONE_PIDFD https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eac7078a0fff1e72cf2b641721e3f55ec7e5e21e Signed-off-by: Christian Brauner Commit: ded425a688daa55ad1a87877b0d5065dead3e4bd https://github.com/lxc/lxc/commit/ded425a688daa55ad1a87877b0d5065dead3e4bd Author: Thomas Parrott Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- network: Adds mtu support for phys and macvlan types Signed-off-by: Thomas Parrott Commit: 463334b7fb3c98d359d4724063e3de9807de9925 https://github.com/lxc/lxc/commit/463334b7fb3c98d359d4724063e3de9807de9925 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/conf.c M src/lxc/namespace.c M src/lxc/namespace.h M src/lxc/start.c M src/lxc/storage/nbd.c M src/lxc/tools/lxc_unshare.c Log Message: ----------- namespace: support CLONE_PIDFD with lxc_clone() Signed-off-by: Christian Brauner Commit: 3ef7f2c0a2fac971e34aa13a3b310c1f4790e2dd 
https://github.com/lxc/lxc/commit/3ef7f2c0a2fac971e34aa13a3b310c1f4790e2dd Author: Thomas Parrott Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/network.c M src/lxc/network.h Log Message: ----------- network: Restores phys device MTU on container shutdown The phys devices will now have their original MTUs recorded at start and restored at shutdown. This is to protect the original phys device from having any container level MTU customisation being applied to the device once it is restored to the host. Signed-off-by: Thomas Parrott Commit: e77c83f65e32a115b9bb7446ada8e53a53f7ca55 https://github.com/lxc/lxc/commit/e77c83f65e32a115b9bb7446ada8e53a53f7ca55 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/start.c M src/lxc/start.h Log Message: ----------- start: use CLONE_PIDFD Use CLONE_PIDFD when possible. Note the clone() syscall ignores unknown flags which is usually a design mistake. However, for us this bug is a feature since we can just pass the flag along and see whether the kernel has given us a pidfd. Signed-off-by: Christian Brauner Commit: dcf5c826501910285af07c34eda05375d25d51a1 https://github.com/lxc/lxc/commit/dcf5c826501910285af07c34eda05375d25d51a1 Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M templates/lxc-busybox.in Log Message: ----------- Redirect error messages to stderr Some error messages were not redirected to stderr. Moreover, do "exit 0" instead of "exit 1" when "help" option is passed. Signed-off-by: Rachid Koucha Commit: 4e6bfc48f51761564948676248c2bfe9c9c505fc https://github.com/lxc/lxc/commit/4e6bfc48f51761564948676248c2bfe9c9c505fc Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M CODING_STYLE.md Log Message: ----------- coding style: update Signed-off-by: Christian Brauner Commit: 46dde5277ca0b4bd22f664eea87713ccb8163e87 https://github.com/lxc/lxc/commit/46dde5277ca0b4bd22f664eea87713ccb8163e87 Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M templates/lxc-busybox.in Log Message: ----------- New --bbpath option and unecessary --rootfs checks . Add the "--bbpath" option to pass an alternate busybox pathname instead of the one found from ${PATH}. . Take this opportunity to add some formatting in the usage display . As a try is done to pick rootfs from the config file and set it to ${path}/rootfs, it is unnecessary to make it mandatory Signed-off-by: Rachid Koucha Commit: cd2ca8a1ddaf919cd00a9288f3014b171a8f6449 https://github.com/lxc/lxc/commit/cd2ca8a1ddaf919cd00a9288f3014b171a8f6449 Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/lxccontainer.c Log Message: ----------- lxccontainer: do not display if missing privileges lxc-ls without root privileges on privileged containers should not display information. In lxc_container_new(), ongoing_create()'s result is not checked for all possible returned values. Hence, an unprivileged user can send command messages to the container's monitor. 
For example: $ lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr - 0 - - - false $ sudo lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr RUNNING 0 - 10.0.3.51 - false After this change: $ lxc-ls -P /.../tests -f <-------- No more display without root privileges $ sudo lxc-ls -P /.../tests -f NAME STATE AUTOSTART GROUPS IPV4 IPV6 UNPRIVILEGED ctr RUNNING 0 - 10.0.3.37 - false $ Signed-off-by: Rachid Koucha Signed-off-by: Christian Brauner Commit: 09f55bc4be0491c9e1f3a1c14a900d39bf70f2b7 https://github.com/lxc/lxc/commit/09f55bc4be0491c9e1f3a1c14a900d39bf70f2b7 Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M templates/lxc-busybox.in Log Message: ----------- Option --busybox-path instead of --bbpath As suggested during the review. Signed-off-by: Rachid Koucha Commit: d3accb17510346ec29edfbef6e75a6cf1c3a07c9 https://github.com/lxc/lxc/commit/d3accb17510346ec29edfbef6e75a6cf1c3a07c9 Author: Radostin Stoyanov Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/criu.c Log Message: ----------- criu: Use -v4 instead of -vvvvvv CRIU has only 4 levels of verbosity (errors, warnings, info, debug). Thus, using `-v4` is more appropriate. https://criu.org/Logging Signed-off-by: Radostin Stoyanov Commit: b526996b6f4f83a8cbb7d12abeff113c278b3697 https://github.com/lxc/lxc/commit/b526996b6f4f83a8cbb7d12abeff113c278b3697 Author: Rikard Falkeborn Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/criu.c Log Message: ----------- criu: Remove unnecessary return after _exit() Since _exit() will terminate, the return statement is dead code. Also, returning -1 from a function with bool as return type is confusing. Detected with cppcheck. Signed-off-by: Rikard Falkeborn Commit: c5e6088f4c3403a2edbd844dac216975bc6f11cf https://github.com/lxc/lxc/commit/c5e6088f4c3403a2edbd844dac216975bc6f11cf Author: Rikard Falkeborn Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/storage/lvm.c Log Message: ----------- lvm: Fix return value if lvm_create_clone fails Returning -1 in a function with return type bool is the same as returning true. Change to return false to indicate error properly. Detected with cppcheck. Signed-off-by: Rikard Falkeborn Commit: 3cd861392a01e17f2fb4694c587876d6cc7f01c0 https://github.com/lxc/lxc/commit/3cd861392a01e17f2fb4694c587876d6cc7f01c0 Author: Rikard Falkeborn Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/storage/zfs.c Log Message: ----------- zfs: Fix return value on zfs_snapshot error Returning -1 in a function with return type bool is the same as returning true. Change to return false to indicate error properly. Detected with cppcheck. Signed-off-by: Rikard Falkeborn Commit: 22c8f39b9d34fffff100f16a8d457759fbc03842 https://github.com/lxc/lxc/commit/22c8f39b9d34fffff100f16a8d457759fbc03842 Author: Rikard Falkeborn Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/initutils.c Log Message: ----------- initutils: Fix memleak on realloc failure Signed-off-by: Rikard Falkeborn Commit: 7d1a06e52e57457b9a89eaa59524d345c6648bfd https://github.com/lxc/lxc/commit/7d1a06e52e57457b9a89eaa59524d345c6648bfd Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M configure.ac Log Message: ----------- Config: check for %m availability GLIBC supports %m to avoid calling strerror(). Using it saves some code space. 
==> This check will define HAVE_M_FORMAT to be used wherever possible (e.g. log.h) Signed-off-by: Rachid Koucha Commit: 5d27c86ad1f59bacb685007330f39e48cd5845a6 https://github.com/lxc/lxc/commit/5d27c86ad1f59bacb685007330f39e48cd5845a6 Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/log.h Log Message: ----------- Use %m instead of strerror() when available Use %m under HAVE_M_FORMAT instead of strerror() Signed-off-by: Rachid Koucha Commit: 0b8deb656f352354e830b77e288a33b042a14cc7 https://github.com/lxc/lxc/commit/0b8deb656f352354e830b77e288a33b042a14cc7 Author: Rachid Koucha <47061324+Rachid-Koucha at users.noreply.github.com> Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/log.h Log Message: ----------- Error prone semicolon Suppressed error prone semicolon in SYSTRACE() macro. Signed-off-by: Rachid Koucha Commit: eabeaa394f886e1847f4495099f0f2d6731e552f https://github.com/lxc/lxc/commit/eabeaa394f886e1847f4495099f0f2d6731e552f Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M configure.ac Log Message: ----------- configure: handle checks when cross-compiling Signed-off-by: Christian Brauner Commit: c0c0d9ec538eeebb2385d56acd992862d59aeb12 https://github.com/lxc/lxc/commit/c0c0d9ec538eeebb2385d56acd992862d59aeb12 Author: Thomas Parrott Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/macro.h M src/lxc/network.c Log Message: ----------- network: move phys netdevs back to monitor's net ns rather than pid 1's Updates lxc_restore_phys_nics_to_netns() to move phys netdevs back to the monitor's network namespace rather than the previously hardcoded PID 1 net ns. This is to fix instances where LXC is started inside a net ns different from PID 1's and, on container shutdown, physical devices were moved back to a different net ns than the one the container was started from. Signed-off-by: Thomas Parrott Commit: d880b03482b01e35b7f5af11121f1b3b2f2ad258 https://github.com/lxc/lxc/commit/d880b03482b01e35b7f5af11121f1b3b2f2ad258 Author: Thomas Parrott Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/network.c Log Message: ----------- network: Fixes bug that stopped down hook from running for phys netdevs Signed-off-by: Thomas Parrott Commit: b748fa8f2cd211e3713593ebf2b5e541106ba50a https://github.com/lxc/lxc/commit/b748fa8f2cd211e3713593ebf2b5e541106ba50a Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/attach.c M src/lxc/attach.h M src/lxc/lxccontainer.c Log Message: ----------- attach: do not reload container Let lxc_attach() reuse the already initialized container. Closes https://github.com/lxc/lxd/issues/5755. Signed-off-by: Christian Brauner Commit: 89f59fa2567314eedb66af5d8ca7ad2e0ad468b5 https://github.com/lxc/lxc/commit/89f59fa2567314eedb66af5d8ca7ad2e0ad468b5 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/lxccontainer.c Log Message: ----------- lxccontainer: cleanup attach functions Specifically, refloat function arguments and remove useless comments. 
Signed-off-by: Christian Brauner Compare: https://github.com/lxc/lxc/compare/9e5d932ccd88...89f59fa25673 From builds at travis-ci.org Sat May 18 09:56:12 2019 From: builds at travis-ci.org (Travis CI) Date: Sat, 18 May 2019 09:56:12 +0000 Subject: [lxc-devel] Broken: lxc/lxc#6839 (stable-3.0 - 89f59fa) In-Reply-To: Message-ID: <5cdfd6bc281e7_43fcb5245650c335045@60327e26-0bcb-4514-8e39-a3bc6748e74d.mail> Build Update for lxc/lxc ------------------------------------- Build: #6839 Status: Broken Duration: 1 min and 34 secs Commit: 89f59fa (stable-3.0) Author: Christian Brauner Message: lxccontainer: cleanup attach functions Specifically, refloat function arguments and remove useless comments. Signed-off-by: Christian Brauner View the changeset: https://github.com/lxc/lxc/compare/9e5d932ccd88...89f59fa25673 View the full build log and details: https://travis-ci.org/lxc/lxc/builds/534132994?utm_medium=notification&utm_source=email -------------- next part -------------- An HTML attachment was scrubbed... URL: From noreply at github.com Sat May 18 10:06:28 2019 From: noreply at github.com (Christian Brauner) Date: Sat, 18 May 2019 03:06:28 -0700 Subject: [lxc-devel] [lxc/lxc] f9df32: lxccontainer: remove unused function Message-ID: Branch: refs/heads/stable-3.0 Home: https://github.com/lxc/lxc Commit: f9df3281fceff6b0c9a5f629f7ef60be489cdae4 https://github.com/lxc/lxc/commit/f9df3281fceff6b0c9a5f629f7ef60be489cdae4 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/lxccontainer.c Log Message: ----------- lxccontainer: remove unused function Signed-off-by: Christian Brauner From builds at travis-ci.org Sat May 18 10:08:39 2019 From: builds at travis-ci.org (Travis CI) Date: Sat, 18 May 2019 10:08:39 +0000 Subject: [lxc-devel] Still Failing: lxc/lxc#6840 (stable-3.0 - f9df328) In-Reply-To: Message-ID: <5cdfd9a77894b_43f9037d58188281049@389b53ea-69e0-4c29-88e4-7da3bffaa774.mail> Build Update for lxc/lxc ------------------------------------- Build: #6840 Status: Still Failing Duration: 1 min and 41 secs Commit: f9df328 (stable-3.0) Author: Christian Brauner Message: lxccontainer: remove unused function Signed-off-by: Christian Brauner View the changeset: https://github.com/lxc/lxc/compare/89f59fa25673...f9df3281fcef View the full build log and details: https://travis-ci.org/lxc/lxc/builds/534135059?utm_medium=notification&utm_source=email -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From noreply at github.com Sat May 18 10:23:41 2019 From: noreply at github.com (Christian Brauner) Date: Sat, 18 May 2019 03:23:41 -0700 Subject: [lxc-devel] [lxc/lxc] 1cbdf1: start: remove unused label Message-ID: Branch: refs/heads/stable-3.0 Home: https://github.com/lxc/lxc Commit: 1cbdf1ead9eec11b1cd11c94ff90256ec12d6a61 https://github.com/lxc/lxc/commit/1cbdf1ead9eec11b1cd11c94ff90256ec12d6a61 Author: Christian Brauner Date: 2019-05-18 (Sat, 18 May 2019) Changed paths: M src/lxc/start.c Log Message: ----------- start: remove unused label Signed-off-by: Christian Brauner From builds at travis-ci.org Sat May 18 10:26:37 2019 From: builds at travis-ci.org (Travis CI) Date: Sat, 18 May 2019 10:26:37 +0000 Subject: [lxc-devel] Fixed: lxc/lxc#6841 (stable-3.0 - 1cbdf1e) In-Reply-To: Message-ID: <5cdfdddd544df_43fc360c92170212991@7b6c5ca5-4aca-4bb7-b493-cc52e6ab768e.mail> Build Update for lxc/lxc ------------------------------------- Build: #6841 Status: Fixed Duration: 2 mins and 28 secs Commit: 1cbdf1e (stable-3.0) Author: Christian Brauner Message: start: remove unused label Signed-off-by: Christian Brauner View the changeset: https://github.com/lxc/lxc/compare/f9df3281fcef...1cbdf1ead9ee View the full build log and details: https://travis-ci.org/lxc/lxc/builds/534138112?utm_medium=notification&utm_source=email -------------- next part -------------- An HTML attachment was scrubbed... URL: From lxc-bot at linuxcontainers.org Sat May 18 12:19:14 2019 From: lxc-bot at linuxcontainers.org (nutterthanos on Github) Date: Sat, 18 May 2019 05:19:14 -0700 (PDT) Subject: [lxc-devel] [linuxcontainers.org/master] Update getting-started.md Message-ID: <5cdff842.1c69fb81.978d2.12cbSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 526 bytes Desc: not available URL: -------------- next part -------------- From f242af9dd615fccbbfa9274a40d4107cfbe1f882 Mon Sep 17 00:00:00 2001 From: nutterthanos <50470661+nutterthanos at users.noreply.github.com> Date: Sat, 18 May 2019 21:44:23 +0930 Subject: [PATCH] Update getting-started.md Ubuntu 14.04LTS is eol for all but people with Ubuntu Advantage. --- content/lxc/getting-started.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/content/lxc/getting-started.md b/content/lxc/getting-started.md index b2e27a5..2f359af 100644 --- a/content/lxc/getting-started.md +++ b/content/lxc/getting-started.md @@ -34,7 +34,7 @@ For your first LXC experience, we recommend you use a recent supported release, such as a recent bugfix release of LXC 1.0. -If using Ubuntu, we recommend you use Ubuntu 14.04 LTS as your container host. +If using Ubuntu, we recommend you use Ubuntu 16.04 LTS as your container host. LXC bugfix releases are available directly in the distribution package repository shortly after release and those offer a clean (unpatched) upstream experience. @@ -43,7 +43,7 @@ with everything that's needed for safe, unprivileged LXC containers. 
On such an Ubuntu system, installing LXC is as simple as: - sudo apt-get install lxc + sudo apt-get install lxc or sudo snap install lxc Your system will then have all the LXC commands available, all its templates as well as the python3 binding should you want to script LXC. @@ -103,7 +103,7 @@ And now, create your first container with: lxc-create -t download -n my-container The download template will show you a list of distributions, versions and architectures to choose from. -A good example would be "ubuntu", "trusty" (14.04 LTS) and "i386". +A good example would be "ubuntu", "xenial" (16.04 LTS) and "i386". A few seconds later your container will be created and you can start it with: From lxc-bot at linuxcontainers.org Sun May 19 14:43:52 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Sun, 19 May 2019 07:43:52 -0700 (PDT) Subject: [lxc-devel] [lxd/master] container: Rename's OnStop hook function to OnPostStop() Message-ID: <5ce16ba8.1c69fb81.899a0.7221SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 574 bytes Desc: not available URL: -------------- next part -------------- From 0ca23869e4243ff9d4aa542e476c6ebfa718c3e1 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Sun, 19 May 2019 15:37:49 +0100 Subject: [PATCH] container: Rename's OnStop hook function to OnPostStop() This better reflects the actual LXC hook type being used (lxc.hook.post-stop) and allows the OnStop() function to be added back in the future with different functionality to be run by LXC's lxc.hook.stop hook. Signed-off-by: Thomas Parrott --- lxd/api_internal.go | 12 ++++++------ lxd/container.go | 2 +- lxd/container_lxc.go | 8 +++++--- 3 files changed, 12 insertions(+), 10 deletions(-) diff --git a/lxd/api_internal.go b/lxd/api_internal.go index 508801f243..40f06acbf9 100644 --- a/lxd/api_internal.go +++ b/lxd/api_internal.go @@ -34,7 +34,7 @@ var apiInternal = []APIEndpoint{ internalShutdownCmd, internalContainerOnStartCmd, internalContainerOnNetworkUpCmd, - internalContainerOnStopCmd, + internalContainerOnPostStopCmd, internalContainersCmd, internalSQLCmd, internalClusterAcceptCmd, @@ -63,10 +63,10 @@ var internalContainerOnStartCmd = APIEndpoint{ Get: APIEndpointAction{Handler: internalContainerOnStart}, } -var internalContainerOnStopCmd = APIEndpoint{ - Name: "containers/{id}/onstop", +var internalContainerOnPostStopCmd = APIEndpoint{ + Name: "containers/{id}/onpoststop", - Get: APIEndpointAction{Handler: internalContainerOnStop}, + Get: APIEndpointAction{Handler: internalContainerOnPostStop}, } var internalContainerOnNetworkUpCmd = APIEndpoint{ @@ -136,7 +136,7 @@ func internalContainerOnStart(d *Daemon, r *http.Request) Response { return EmptySyncResponse } -func internalContainerOnStop(d *Daemon, r *http.Request) Response { +func internalContainerOnPostStop(d *Daemon, r *http.Request) Response { id, err := strconv.Atoi(mux.Vars(r)["id"]) if err != nil { return SmartError(err) @@ -152,7 +152,7 @@ func internalContainerOnStop(d *Daemon, r *http.Request) Response { return SmartError(err) } - err = c.OnStop(target) + err = c.OnPostStop(target) if err != nil { logger.Error("The stop hook failed", log.Ctx{"container": c.Name(), "err": err}) return SmartError(err) diff --git a/lxd/container.go b/lxd/container.go index e935dde899..ccfae6a5fa 100644 --- a/lxd/container.go +++ b/lxd/container.go @@ -681,7 +681,7 @@ type container interface { // Hooks OnStart() error - OnStop(target string) error + OnPostStop(target string) 
error OnNetworkUp(deviceName string, hostVeth string) error // Properties diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index f46fc24887..525c2de3f5 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -1223,7 +1223,7 @@ func (c *containerLXC) initLXC(config bool) error { } } - err = lxcSetConfigItem(cc, "lxc.hook.post-stop", fmt.Sprintf("%s callhook %s %d stop", c.state.OS.ExecPath, shared.VarPath(""), c.id)) + err = lxcSetConfigItem(cc, "lxc.hook.post-stop", fmt.Sprintf("%s callhook %s %d poststop", c.state.OS.ExecPath, shared.VarPath(""), c.id)) if err != nil { return err } @@ -3050,10 +3050,12 @@ func (c *containerLXC) Shutdown(timeout time.Duration) error { return nil } -func (c *containerLXC) OnStop(target string) error { +// OnPostStop is triggered by LXC's post-stop once a container is shutdown and after the container's +// namespaces have been closed. +func (c *containerLXC) OnPostStop(target string) error { // Validate target if !shared.StringInSlice(target, []string{"stop", "reboot"}) { - logger.Error("Container sent invalid target to OnStop", log.Ctx{"container": c.Name(), "target": target}) + logger.Error("Container sent invalid target to OnPostStop", log.Ctx{"container": c.Name(), "target": target}) return fmt.Errorf("Invalid stop target: %s", target) } From lxc-bot at linuxcontainers.org Mon May 20 01:24:07 2019 From: lxc-bot at linuxcontainers.org (caglar10ur on Github) Date: Sun, 19 May 2019 18:24:07 -0700 (PDT) Subject: [lxc-devel] [go-lxc/v2] Fix minor typos Message-ID: <5ce201b7.1c69fb81.adfd4.3553SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 351 bytes Desc: not available URL: -------------- next part -------------- From aadb55d0aed2fcfeac50cf661289ea69ca61097c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=2E=C3=87a=C4=9Flar=20Onur?= Date: Sun, 19 May 2019 18:23:31 -0700 Subject: [PATCH] Fix minor typos MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: S.Çağlar Onur --- container.go | 8 ++++---- lxc_test.go | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/container.go b/container.go index 22e9c71..5cada46 100644 --- a/container.go +++ b/container.go @@ -125,7 +125,7 @@ func (c *Container) Name() string { return c.name() } -// String returns the string represantation of container. +// String returns the string representation of container. func (c *Container) String() string { c.mu.RLock() defer c.mu.RUnlock() @@ -1818,7 +1818,7 @@ func (c *Container) Migrate(cmd uint, opts MigrateOptions) error { return nil } -// AttachInterface attaches specifed netdev to the container. +// AttachInterface attaches specified netdev to the container. func (c *Container) AttachInterface(source, destination string) error { c.mu.Lock() defer c.mu.Unlock() @@ -1839,7 +1839,7 @@ func (c *Container) AttachInterface(source, destination string) error { return nil } -// DetachInterface detaches specifed netdev from the container. +// DetachInterface detaches specified netdev from the container. func (c *Container) DetachInterface(source string) error { c.mu.Lock() defer c.mu.Unlock() @@ -1857,7 +1857,7 @@ func (c *Container) DetachInterface(source string) error { return nil } -// DetachInterfaceRename detaches specifed netdev from the container and renames it. +// DetachInterfaceRename detaches specified netdev from the container and renames it. 
func (c *Container) DetachInterfaceRename(source, target string) error { c.mu.Lock() defer c.mu.Unlock() diff --git a/lxc_test.go b/lxc_test.go index 77d12a4..4494f5c 100644 --- a/lxc_test.go +++ b/lxc_test.go @@ -534,7 +534,7 @@ func TestControllable(t *testing.T) { defer c.Release() if !c.Controllable() { - t.Errorf("Controling the container failed...") + t.Errorf("Controlling the container failed...") } } From lxc-bot at linuxcontainers.org Mon May 20 09:31:55 2019 From: lxc-bot at linuxcontainers.org (lxc-jp on Github) Date: Mon, 20 May 2019 02:31:55 -0700 (PDT) Subject: [lxc-devel] [linuxcontainers.org/master] Add a warning about LXD security to Japanese document Message-ID: <5ce2740b.1c69fb81.e74ed.ea57SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 370 bytes Desc: not available URL: -------------- next part -------------- From bfa2eb7bd6dfd6f4add6909f89bbb40b4dbe7618 Mon Sep 17 00:00:00 2001 From: KATOH Yasufumi Date: Mon, 20 May 2019 18:30:18 +0900 Subject: [PATCH] Add a warning about LXD security to Japanese document Signed-off-by: KATOH Yasufumi --- content/lxd/getting-started-cli.ja.md | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/content/lxd/getting-started-cli.ja.md b/content/lxd/getting-started-cli.ja.md index 959ca70..52a8bb2 100644 --- a/content/lxd/getting-started-cli.ja.md +++ b/content/lxd/getting-started-cli.ja.md @@ -181,7 +181,7 @@ This is all done with: --> この設定は以下のように実行して行います: - sudo lxd init + lxd init ## アクセスコントロール グループメンバーシップはログイン時にのみ追加されるので、追加後にあなたのユーザセッションを閉じて再度開くか、LXD と通信したいシェル上で "newgrp lxd" コマンドを実行する必要があります + +**警告**: LXD ソケットにアクセスできる人であれば誰でも LXD を完全にコントロールできます。 +これには、ホストのデバイスやファイルシステムにアタッチする権限も含まれます。 +したがって、ホストへの root アクセスで信頼できるユーザにのみ与えられるべきです。 +さらに LXD のセキュリティについて学びたい場合は[こちら](https://lxd-ja.readthedocs.io/ja/latest/security/)をご覧ください。 + # コンテナの作成と使用 F2 + // + // This way the cluster is fully connected. + c.logger.Debug("[DEBUG] raft-test: elect: done") + term := &Term{ + control: c, + id: id, + leadership: leadership, + } + c.term = term + + return term + } + c.t.Fatalf("raft-test: server %s: did not acquire stable leadership", id) + + return nil +} + +// Barrier is used to wait for the cluster to settle to a stable state, where +// all in progress Apply() commands are committed across all FSMs associated +// with servers that are not disconnected and all in progress snapshots and +// restores have been performed. +// +// Usually you don't want to concurrently keep invoking Apply() on the cluster +// raft instances while Barrier() is running. +func (c *Control) Barrier() { + // Wait for snapshots to complete. + if c.snapshotFuture != nil { + if err := c.snapshotFuture.Error(); err != nil { + c.t.Fatalf("raft-test: snapshot failed: %v", err) + } + } + + // Wait for inflight commands to be applied to the leader's FSM. + if c.term.id != "" { + // Set a relatively high timeout. + // + // TODO: let users specify the maximum amount of time a single + // Apply() to their FSM should take, and calculate this value + // accordingly. + timeout := Duration(time.Second) + + if err := c.servers[c.term.id].Barrier(timeout).Error(); err != nil { + c.t.Fatalf("raft-test: leader barrier: %v", err) + } + + // Wait for follower FSMs to catch up. + n := c.Commands(c.term.id) + events := make([]*event.Event, 0) + for id := range c.servers { + if id == c.term.id { + continue + } + // Skip disconnected followers. 
+ if !c.network.PeerConnected(c.term.id, id) { + continue + } + event := c.watcher.WhenApplied(id, n) + events = append(events, event) + } + for _, event := range events { + <-event.Watch() + event.Ack() + } + } +} + +// Depose the current leader. +// +// When calling this method a leader must have been previously elected with +// Elect(). +// +// It must not be called if the current term has scheduled a depose action with +// Action.Depose(). +func (c *Control) Depose() { + event := event.New() + go c.deposeUponEvent(event, c.term.id, c.term.leadership) + event.Fire() + event.Block() +} + +// Commands returns the total number of command logs applied by the FSM of the +// server with the given ID. +func (c *Control) Commands(id raft.ServerID) uint64 { + return c.watcher.Commands(id) +} + +// Snapshots returns the total number of snapshots performed by the FSM of the +// server with the given ID. +func (c *Control) Snapshots(id raft.ServerID) uint64 { + return c.watcher.Snapshots(id) +} + +// Restores returns the total number of restores performed by the FSM of the +// server with the given ID. +func (c *Control) Restores(id raft.ServerID) uint64 { + return c.watcher.Restores(id) +} + +// Shutdown all raft nodes and fail the test if any of them errors out while +// doing so. +func (c *Control) shutdownServers() { + // Find the leader if there is one, and shut it down first. This should + // prevent it from getting stuck on shutdown while trying to send RPCs + // to the followers. + // + // TODO: this is arguably a workaround for a bug in the transport + // wrapper. + ids := make([]raft.ServerID, 0) + for id, r := range c.servers { + if r.State() == raft.Leader { + c.shutdownServer(id) + ids = append(ids, id) + } + } + + // Shutdown the rest. + for id := range c.servers { + hasShutdown := false + for i := range ids { + if ids[i] == id { + hasShutdown = true + break + } + } + if !hasShutdown { + c.shutdownServer(id) + ids = append(ids, id) + } + } +} + +// Shutdown a single server. +func (c *Control) shutdownServer(id raft.ServerID) { + r := c.servers[id] + future := r.Shutdown() + + // Expect the shutdown to happen within two seconds by default. + timeout := Duration(2 * time.Second) + + // Watch for errors. + ch := make(chan error, 1) + go func(future raft.Future) { + ch <- future.Error() + }(future) + + var err error + select { + case err = <-ch: + c.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: close: server %s: shutdown done", id)) + case <-time.After(timeout): + err = fmt.Errorf("timeout (%s)", timeout) + } + if err == nil { + return + } + + c.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: close: server %s: shutdown failed: %s", id, err)) + + buf := make([]byte, 1<<16) + n := runtime.Stack(buf, true) + + c.t.Errorf("\n\t%s", buf[:n]) + c.t.Fatalf("raft-test: close: error: server %s: shutdown error: %v", id, err) +} + +// Wait for the given server to acquire leadership. Returns the leadership on +// success, nil otherwise (i.e. if the timeout expires). +func (c *Control) waitLeadershipAcquired(id raft.ServerID) *election.Leadership { + timeout := maximumElectionTimeout(c.confs) * maxElectionRounds + future := c.election.Expect(id, timeout) + + c.watcher.Electing(id) + + // Reset any leader-related state on the transport of the given server + // and connect it to all other servers, letting it send them RPC + // messages but not vice versa. E.g. for three nodes: + // + // L ---> F1 + // L ---> F2 + // + // This way we are sure we are the only server that can possibly acquire + // leadership. 
+ c.network.Electing(id) + + // First wait for the given node to become leader. + c.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: elect: server %s: wait to become leader within %s", id, timeout)) + + leadership, err := future.Done() + if err != nil { + c.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: elect: server %s: did not become leader", id)) + } + return leadership + +} + +// Wait until the leadership just acquired by server with the given id is +// acknowledged by all other servers and they all permanently transition to the +// follower state. +func (c *Control) waitLeadershipPropagated(id raft.ServerID, leadership *election.Leadership) bool { + // The leadership propagation needs to happen within the leader lease + // timeout, otherwise the newly elected leader will step down. + timeout := maximumLeaderLeaseTimeout(c.confs) + c.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: elect: server %s: wait for other servers to become followers within %s", id, timeout)) + + // Get the current configuration, so we wait only for servers that are + // actually currently part of the cluster (some of them might have been + // excluded with the Servers option). + r := c.servers[id] + future := r.GetConfiguration() + if err := future.Error(); err != nil { + c.t.Fatalf("raft-test: control: server %s: failed to get configuration: %v", id, err) + } + servers := future.Configuration().Servers + + timer := time.After(timeout) + address := c.network.Address(id) + for _, server := range servers { + other := server.ID + if other == id { + continue + } + r := c.servers[server.ID] + for { + // Check that we didn't lose leadership in the meantime. + select { + case <-leadership.Lost(): + c.network.Deposing(id) + c.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: elect: server %s: lost leadership", id)) + return false + case <-timer: + c.t.Fatalf("raft-test: elect: server %s: followers did not settle", id) + default: + } + + // Check that this server is in follower mode, that it + // has set the elected server as leader and that we were + // able to append at least one log entry to it (when a + // server becomes leader, it always sends a LogNoop). + if r.State() == raft.Follower && r.Leader() == address && c.network.HasAppendedLogsFromTo(id, other) { + c.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: elect: server %s: became follower", other)) + break + } + time.Sleep(time.Millisecond) + } + } + + return true +} + +// Return an event that gets fired when the n'th log command gets enqueued by +// the given leader server. +func (c *Control) whenCommandEnqueued(id raft.ServerID, n uint64) *event.Event { + return c.network.ScheduleEnqueueFailure(id, n) +} + +// Return an event that gets fired when the n'th log command gets appended by +// server with the given ID (which is supposed to be the leader) to all other +// servers. +func (c *Control) whenCommandAppended(id raft.ServerID, n uint64) *event.Event { + return c.network.ScheduleAppendFailure(id, n) +} + +// Return an event that gets fired when the n'th log command gets committed on +// server with the given ID (which is supposed to be the leader). +func (c *Control) whenCommandCommitted(id raft.ServerID, n uint64) *event.Event { + return c.watcher.WhenApplied(id, n) +} + +// Depose the server with the given ID when the given event fires. +func (c *Control) deposeUponEvent(event *event.Event, id raft.ServerID, leadership *election.Leadership) { + // Sanity checks. 
+	r := c.servers[id]
+	if r.State() != raft.Leader {
+		panic(fmt.Errorf("raft-test: server %s: is not leader", id))
+	}
+
+	<-event.Watch()
+
+	c.network.Deposing(id)
+
+	timeout := maximumLeaderLeaseTimeout(c.confs)
+
+	c.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: node %s: state: wait leadership lost (timeout=%s)", id, timeout))
+
+	select {
+	case <-leadership.Lost():
+	case <-time.After(timeout):
+		c.t.Errorf("raft-test: server %s: error: timeout: leadership not lost", id)
+		c.errored = true
+	}
+	event.Ack()
+
+	if !c.errored {
+		c.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: leadership lost", id))
+	}
+
+	c.deposing <- struct{}{}
+	c.deposing = nil
+	c.term = nil
+}
+
+// Take a snapshot on the server with the given ID when the given event fires.
+func (c *Control) snapshotUponEvent(event *event.Event, id raft.ServerID) {
+	<-event.Watch()
+
+	c.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: control: take snapshot", id))
+
+	r := c.servers[id]
+	c.snapshotFuture = r.Snapshot()
+
+	event.Ack()
+}
+
+// Compute the maximum time a leader election should take, according to the
+// given nodes' configs.
+func maximumElectionTimeout(confs map[raft.ServerID]*raft.Config) time.Duration {
+	timeout := time.Duration(0)
+
+	for _, conf := range confs {
+		if conf.ElectionTimeout > timeout {
+			timeout = conf.ElectionTimeout
+		}
+	}
+
+	return timeout * timeoutRandomizationFactor
+}
+
+// Return the maximum leader lease timeout among the given nodes' configs.
+func maximumLeaderLeaseTimeout(confs map[raft.ServerID]*raft.Config) time.Duration {
+	timeout := time.Duration(0)
+
+	for _, conf := range confs {
+		if conf.LeaderLeaseTimeout > timeout {
+			timeout = conf.LeaderLeaseTimeout
+		}
+	}
+
+	// Multiply the timeout by the randomization factor to account for
+	// randomization.
+	return timeout * timeoutRandomizationFactor
+}
+
+const (
+	// Assume that a leader is elected within 25 rounds. Should be safe enough.
+	maxElectionRounds = 25
+
+	// Hashicorp's raft implementation randomizes timeouts between 1x and
+	// 2x. Multiplying by 4 makes sure the timeout expires.
+	timeoutRandomizationFactor = 4
+)
+
+// WaitLeader blocks until the given raft instance sets a leader (which
+// could possibly be the instance itself).
+//
+// It fails the test if this doesn't happen within the specified timeout.
+func WaitLeader(t testing.TB, raft *raft.Raft, timeout time.Duration) {
+	ctx, cancel := context.WithTimeout(context.Background(), timeout)
+	defer cancel()
+
+	waitLeader(ctx, t, raft)
+}
+
+func waitLeader(ctx context.Context, t testing.TB, raft *raft.Raft) {
+	t.Helper()
+
+	check := func() bool {
+		return raft.Leader() != ""
+	}
+	wait(ctx, t, check, 25*time.Millisecond, "no leader was set")
+}
+
+// Poll the given function at the given interval, until it returns true, or
+// the given context expires.
+func wait(ctx context.Context, t testing.TB, f func() bool, interval time.Duration, message string) {
+	t.Helper()
+
+	start := time.Now()
+	for {
+		select {
+		case <-ctx.Done():
+			if err := ctx.Err(); err == context.Canceled {
+				return
+			}
+			t.Fatalf("%s within %s", message, time.Since(start))
+		default:
+		}
+		if f() {
+			return
+		}
+		time.Sleep(interval)
+	}
+}
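The wait() helper above is a generic poll-until-true loop that the timing-sensitive assertions in this package are built on. A minimal self-contained sketch of the same pattern outside the testing harness (illustrative names only, not part of the patch):

    package main

    import (
    	"context"
    	"errors"
    	"fmt"
    	"time"
    )

    // pollUntil polls check at the given interval until it returns true or
    // the context expires, mirroring the wait() helper above.
    func pollUntil(ctx context.Context, check func() bool, interval time.Duration) error {
    	for {
    		if check() {
    			return nil
    		}
    		select {
    		case <-ctx.Done():
    			return errors.New("condition not met in time")
    		case <-time.After(interval):
    		}
    	}
    }

    func main() {
    	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    	defer cancel()

    	start := time.Now()
    	ok := func() bool { return time.Since(start) > 100*time.Millisecond }
    	fmt.Println(pollUntil(ctx, ok, 10*time.Millisecond)) // <nil>
    }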
diff --git a/vendor/github.com/CanonicalLtd/raft-test/duration.go b/vendor/github.com/CanonicalLtd/raft-test/duration.go
new file mode 100644
index 0000000000..a6142aa44f
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/duration.go
@@ -0,0 +1,45 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package rafttest
+
+import (
+	"fmt"
+	"math"
+	"os"
+	"strconv"
+	"time"
+)
+
+// Duration is a convenience to scale the given duration according to the
+// GO_RAFT_TEST_LATENCY environment variable.
+func Duration(duration time.Duration) time.Duration {
+	factor := 1.0
+	if env := os.Getenv("GO_RAFT_TEST_LATENCY"); env != "" {
+		var err error
+		factor, err = strconv.ParseFloat(env, 64)
+		if err != nil {
+			panic(fmt.Sprintf("invalid value '%s' for GO_RAFT_TEST_LATENCY", env))
+		}
+	}
+	return scaleDuration(duration, factor)
+}
+
+func scaleDuration(duration time.Duration, factor float64) time.Duration {
+	if factor == 1.0 {
+		return duration
+	}
+
+	return time.Duration(math.Ceil(float64(duration) * factor))
+}
diff --git a/vendor/github.com/CanonicalLtd/raft-test/fsm.go b/vendor/github.com/CanonicalLtd/raft-test/fsm.go
new file mode 100644
index 0000000000..cd3bf5df33
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/fsm.go
@@ -0,0 +1,60 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package rafttest
+
+import (
+	"io"
+
+	"github.com/hashicorp/raft"
+)
+
+// FSM creates a dummy FSM.
+func FSM() raft.FSM {
+	return &fsm{}
+}
+
+// FSMs creates the given number of dummy FSMs.
+func FSMs(n int) []raft.FSM {
+	fsms := make([]raft.FSM, n)
+	for i := range fsms {
+		fsms[i] = FSM()
+	}
+	return fsms
+}
+
+// fsm is a dummy raft finite state machine that does nothing and
+// always no-ops.
+type fsm struct{}
+
+// Apply always returns a nil error without doing anything.
+func (f *fsm) Apply(*raft.Log) interface{} { return nil }
+
+// Snapshot always returns a dummy snapshot and no error without doing
+// anything.
+func (f *fsm) Snapshot() (raft.FSMSnapshot, error) { return &fsmSnapshot{}, nil }
+
+// Restore always returns a nil error without reading anything from
+// the reader.
+func (f *fsm) Restore(io.ReadCloser) error { return nil }
+
+// fsmSnapshot is a dummy implementation of an fsm snapshot.
+type fsmSnapshot struct{}
+
+// Persist always returns a nil error without writing anything
+// to the sink.
+func (s *fsmSnapshot) Persist(sink raft.SnapshotSink) error { return nil }
+
+// Release is a no-op.
+func (s *fsmSnapshot) Release() {}
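Duration() and the dummy FSMs above are the package's public helpers. A hedged usage sketch (illustrative, not part of the patch; the import path and package name are taken from the diff headers above):

    package main

    import (
    	"fmt"
    	"os"
    	"time"

    	rafttest "github.com/CanonicalLtd/raft-test"
    )

    func main() {
    	// Scale every test timeout by 2.5x, e.g. for a slow CI machine.
    	os.Setenv("GO_RAFT_TEST_LATENCY", "2.5")
    	fmt.Println(rafttest.Duration(2 * time.Second)) // 5s

    	// One no-op FSM per raft server under test.
    	fsms := rafttest.FSMs(3)
    	fmt.Println(len(fsms)) // 3
    }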
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/election/future.go b/vendor/github.com/CanonicalLtd/raft-test/internal/election/future.go
new file mode 100644
index 0000000000..5adcf16f8f
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/election/future.go
@@ -0,0 +1,61 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package election
+
+import (
+	"fmt"
+	"time"
+
+	"github.com/hashicorp/raft"
+)
+
+// Future represents a request to acquire leadership that will eventually
+// succeed or fail.
+type Future struct {
+	// ID of the raft server that should acquire leadership.
+	id raft.ServerID
+
+	// If leadership is not acquired within this timeout, the future fails.
+	timeout time.Duration
+
+	// Notification about leadership being acquired.
+	acquiredCh chan struct{}
+
+	// Notification about leadership being lost.
+	lostCh chan struct{}
+}
+
+// Creates a new leadership future for the given server.
+func newFuture(id raft.ServerID, timeout time.Duration) *Future {
+	future := &Future{
+		id:         id,
+		timeout:    timeout,
+		acquiredCh: make(chan struct{}),
+		lostCh:     make(chan struct{}),
+	}
+	return future
+}
+
+// Done returns a Leadership object if leadership was acquired within the
+// timeout, or an error otherwise.
+func (f *Future) Done() (*Leadership, error) {
+	select {
+	case <-f.acquiredCh:
+		leadership := newLeadership(f.id, f.lostCh)
+		return leadership, nil
+	case <-time.After(f.timeout):
+		return nil, fmt.Errorf("server %s: leadership not acquired within %s", f.id, f.timeout)
+	}
+}
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/election/leadership.go b/vendor/github.com/CanonicalLtd/raft-test/internal/election/leadership.go
new file mode 100644
index 0000000000..b54885de92
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/election/leadership.go
@@ -0,0 +1,43 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package election
+
+import (
+	"github.com/hashicorp/raft"
+)
+
+// Leadership represents the leadership acquired by a server that was elected
+// as leader. It exposes methods to be notified about its loss, i.e. the server
+// stepping down as leader.
+type Leadership struct {
+	// ID of the raft server that acquired the leadership.
+	id raft.ServerID
+
+	// Notification about leadership being lost.
+	lostCh chan struct{}
+}
+
+// Create a new leadership object.
+func newLeadership(id raft.ServerID, lostCh chan struct{}) *Leadership {
+	return &Leadership{
+		id:     id,
+		lostCh: lostCh,
+	}
+}
+
+// Lost returns a channel that gets closed when leadership is lost.
+func (l *Leadership) Lost() chan struct{} {
+	return l.lostCh
+}
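Future.Done() is a plain select between the acquired channel and a timer. The same idiom reduced to a standalone runnable sketch (illustrative names only; the real types live in the internal election package above):

    package main

    import (
    	"errors"
    	"fmt"
    	"time"
    )

    // done mimics Future.Done() above: succeed once acquiredCh is closed,
    // or fail when the timeout expires.
    func done(acquiredCh <-chan struct{}, timeout time.Duration) error {
    	select {
    	case <-acquiredCh:
    		return nil
    	case <-time.After(timeout):
    		return errors.New("leadership not acquired within timeout")
    	}
    }

    func main() {
    	ch := make(chan struct{})
    	go func() {
    		time.Sleep(10 * time.Millisecond)
    		close(ch) // leadership acquired
    	}()
    	fmt.Println(done(ch, time.Second)) // <nil>
    }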
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/election/notifier.go b/vendor/github.com/CanonicalLtd/raft-test/internal/election/notifier.go
new file mode 100644
index 0000000000..7ec4213f30
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/election/notifier.go
@@ -0,0 +1,149 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package election
+
+import (
+	"fmt"
+	"time"
+
+	"github.com/hashicorp/go-hclog"
+	"github.com/hashicorp/raft"
+)
+
+// Notify about leadership changes in a single raft server.
+type notifier struct {
+	// For debugging raft-test itself or its consumers.
+	logger hclog.Logger
+
+	// ID of the raft server we're observing.
+	id raft.ServerID
+
+	// Reference to the Config.NotifyCh object set in this server's Config.
+	notifyCh chan bool
+
+	// Channel used to tell the notification loop to expect the server to
+	// acquire leadership. The leadership future sent to this channel will
+	// be used to notify that leadership was acquired.
+	futureCh chan *Future
+
+	// Channel used to tell the notification loop to ignore any
+	// notification received from the notifyCh.
+	ignoreCh chan struct{}
+
+	// Stop observing leadership changes when this channel gets closed.
+	shutdownCh chan struct{}
+}
+
+// Create a new notifier.
+func newNotifier(logger hclog.Logger, id raft.ServerID, notifyCh chan bool) *notifier {
+	observer := &notifier{
+		logger:     logger,
+		id:         id,
+		notifyCh:   notifyCh,
+		futureCh:   make(chan *Future),
+		ignoreCh:   make(chan struct{}),
+		shutdownCh: make(chan struct{}),
+	}
+	go observer.start()
+	return observer
+}
+
+// Ignore any notifications received on the notifyCh.
+func (n *notifier) Ignore() {
+	close(n.ignoreCh)
+}
+
+// Close stops observing leadership changes.
+func (n *notifier) Close() {
+	n.shutdownCh <- struct{}{}
+	<-n.shutdownCh
+}
+
+// Acquired returns a Future that yields a Leadership object when the server
+// acquires leadership, or an error if the timeout expires.
+//
+// It must be called before this server has any chance to become leader
+// (e.g. it's disconnected from the other servers).
+//
+// Once called, it must not be called again until leadership is lost.
+func (n *notifier) Acquired(timeout time.Duration) *Future {
+	future := newFuture(n.id, timeout)
+	n.futureCh <- future
+	return future
+}
+
+// Start observing leadership changes using the notify channel of our server
+// and feed notifications to our consumers.
+//
+// The loop will be terminated once the shutdownCh is closed.
+func (n *notifier) start() {
+	// Record the last leadership change observation. For asserting that a
+	// leadership lost notification always follows a leadership acquired
+	// one.
+	var last bool
+
+	// Record the last request for leadership change for this server, if
+	// any.
+	var future *Future
+	for {
+		select {
+		case f := <-n.futureCh:
+			if future != nil {
+				panic(fmt.Sprintf("server %s: duplicate leadership request", n.id))
+			}
+			future = f
+		case acquired := <-n.notifyCh:
+			ignore := false
+			select {
+			case <-n.ignoreCh:
+				// Just drop the notification on the floor.
+				ignore = true
+			default:
+			}
+			if ignore {
+				break
+			}
+			if future == nil {
+				panic(fmt.Sprintf("server %s: unexpected leadership change", n.id))
+			}
+			verb := ""
+			var ch chan struct{}
+			if acquired {
+				verb = "acquired"
+				ch = future.acquiredCh
+			} else {
+				verb = "lost"
+				ch = future.lostCh
+				future = nil
+			}
+			if acquired == last {
+				panic(fmt.Sprintf("server %s %s leadership twice in a row", n.id, verb))
+			}
+			last = acquired
+			n.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: leadership: %s", n.id, verb))
+			select {
+			case <-ch:
+				panic(fmt.Sprintf("server %s: duplicate leadership %s notification", n.id, verb))
+			default:
+				close(ch)
+			}
+		case <-n.shutdownCh:
+			n.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: leadership: stop watching", n.id))
+			close(n.shutdownCh)
+			return
+		}
+	}
+}
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/election/tracker.go b/vendor/github.com/CanonicalLtd/raft-test/internal/election/tracker.go
new file mode 100644
index 0000000000..e8090380f1
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/election/tracker.go
@@ -0,0 +1,112 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package election
+
+import (
+	"fmt"
+	"sync"
+	"time"
+
+	"github.com/hashicorp/go-hclog"
+	"github.com/hashicorp/raft"
+)
+
+// Tracker consumes the raft.Config.NotifyCh set on each server of a cluster,
+// tracking when elections occur.
+type Tracker struct {
+	// For debugging raft-test itself or its consumers.
+	logger hclog.Logger
+
+	// Watchers for individual servers.
+	//
+	// Note that this map is not protected by a mutex, since it should be
+	// written once when the cluster is created, and never written again.
+	observers map[raft.ServerID]*notifier
+
+	// Flag indicating if Expect() has been called on this Tracker. It's
+	// used as a sanity check that Track() is not called after the first
+	// call to Expect().
+	observing bool
+
+	// Current leadership future, if any. It's used as a sanity check to
+	// prevent further leadership requests.
+	future *Future
+
+	// Serialize access to internal state.
+	mu sync.Mutex
+}
+
+// NewTracker creates a new Tracker for watching leadership
+// changes in a raft cluster.
+func NewTracker(logger hclog.Logger) *Tracker {
+	return &Tracker{
+		logger:    logger,
+		observers: make(map[raft.ServerID]*notifier),
+	}
+}
+
+// Ignore stops propagating leadership change notifications, which will be
+// simply dropped on the floor. Should be called before the final Close().
+func (t *Tracker) Ignore() {
+	for _, observer := range t.observers {
+		observer.Ignore()
+	}
+}
+
+// Close stops watching for leadership changes in the cluster.
+func (t *Tracker) Close() {
+	for _, observer := range t.observers {
+		observer.Close()
+	}
+}
+
+// Track leadership changes on the server with the given ID using the given
+// Config.NotifyCh.
+func (t *Tracker) Track(id raft.ServerID, notifyCh chan bool) {
+	if t.observing {
+		panic("can't track new server while observing")
+	}
+	if _, ok := t.observers[id]; ok {
+		panic(fmt.Sprintf("an observer for server %s is already registered", id))
+	}
+	t.observers[id] = newNotifier(t.logger, id, notifyCh)
+}
+
+// Expect returns an election Future object whose Done() method will return
+// a Leadership object when the server with the given ID acquires leadership,
+// or an error if the given timeout expires.
+//
+// It must be called before this server has any chance to become leader
+// (e.g. it's disconnected from the other servers).
+//
+// Once called, it must not be called again until leadership is lost.
+func (t *Tracker) Expect(id raft.ServerID, timeout time.Duration) *Future {
+	t.mu.Lock()
+	defer t.mu.Unlock()
+	t.observing = true
+
+	if t.future != nil {
+		select {
+		case <-t.future.lostCh:
+			// Leadership was acquired, but has been lost, so let's proceed.
+			t.future = nil
+		default:
+			panic(fmt.Sprintf("server %s has already requested leadership", t.future.id))
+		}
+	}
+
+	t.future = t.observers[id].Acquired(timeout)
+	return t.future
+}
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/event/event.go b/vendor/github.com/CanonicalLtd/raft-test/internal/event/event.go
new file mode 100644
index 0000000000..9f73987ab4
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/event/event.go
@@ -0,0 +1,54 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package event
+
+// An Event that occurs when a certain log command is either enqueued, appended
+// or committed. Events may be fired in the transport layer (i.e. in the
+// eventTransport wrappers) or in the state machine layer (i.e. in the
+// fsmWrapper).
+type Event struct {
+	fireCh chan struct{}
+	ackCh  chan struct{}
+}
+
+// New creates a new event.
+func New() *Event {
+	return &Event{
+		fireCh: make(chan struct{}),
+		ackCh:  make(chan struct{}),
+	}
+}
+
+// Watch the event. Return a channel that gets closed when the event gets
+// fired.
+func (e *Event) Watch() <-chan struct{} {
+	return e.fireCh
+}
+
+// Fire the event. Any watcher of the event will be woken up.
+func (e *Event) Fire() {
+	close(e.fireCh)
+}
+
+// Block until the watcher of the event has acknowledged that the event has
+// been handled.
+func (e *Event) Block() {
+	<-e.ackCh
+}
+
+// Ack acknowledges that the event has been handled.
+func (e *Event) Ack() {
+	close(e.ackCh)
+}
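Event implements a two-phase rendezvous: the watcher blocks on Watch(), the firing side calls Fire() and then Block() until the watcher calls Ack(). The same handshake with bare channels, as a runnable sketch (illustrative only, not part of the patch):

    package main

    import "fmt"

    func main() {
    	fire := make(chan struct{}) // corresponds to Event.fireCh
    	ack := make(chan struct{})  // corresponds to Event.ackCh

    	// Watcher: wait for the event, handle it, then acknowledge.
    	go func() {
    		<-fire // Event.Watch()
    		fmt.Println("event handled")
    		close(ack) // Event.Ack()
    	}()

    	close(fire) // Event.Fire()
    	<-ack       // Event.Block()
    }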
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/fsms/watcher.go b/vendor/github.com/CanonicalLtd/raft-test/internal/fsms/watcher.go
new file mode 100644
index 0000000000..f10aa1bffd
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/fsms/watcher.go
@@ -0,0 +1,77 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package fsms
+
+import (
+	"github.com/CanonicalLtd/raft-test/internal/event"
+	"github.com/hashicorp/go-hclog"
+	"github.com/hashicorp/raft"
+)
+
+// Watcher watches all FSMs of a cluster, firing events at certain moments.
+type Watcher struct {
+	logger hclog.Logger
+
+	// FSM wrappers.
+	fsms map[raft.ServerID]*fsmWrapper
+}
+
+// New creates a new watcher for the underlying FSMs.
+func New(logger hclog.Logger) *Watcher {
+	return &Watcher{
+		logger: logger,
+		fsms:   make(map[raft.ServerID]*fsmWrapper),
+	}
+}
+
+// Add an FSM to the watcher. Returns an FSM that wraps the given FSM with
+// instrumentation for firing events.
+func (w *Watcher) Add(id raft.ServerID, fsm raft.FSM) raft.FSM {
+	w.fsms[id] = newFSMWrapper(w.logger, id, fsm)
+	return w.fsms[id]
+}
+
+// WhenApplied returns an event that will fire when the n'th command log for
+// the term is applied on the FSM associated with the server with the given
+// ID. It's assumed that such server is currently the leader.
+func (w *Watcher) WhenApplied(id raft.ServerID, n uint64) *event.Event {
+	return w.fsms[id].whenApplied(n)
+}
+
+// Commands returns the total number of command logs applied by the FSM of
+// the server with the given ID.
+func (w *Watcher) Commands(id raft.ServerID) uint64 {
+	return w.fsms[id].Commands()
+}
+
+// Snapshots returns the total number of snapshots performed by the FSM of the
+// server with the given ID.
+func (w *Watcher) Snapshots(id raft.ServerID) uint64 {
+	return w.fsms[id].Snapshots()
+}
+
+// Restores returns the total number of restores performed by the FSM of the
+// server with the given ID.
+func (w *Watcher) Restores(id raft.ServerID) uint64 {
+	return w.fsms[id].Restores()
+}
+
+// Electing must be called whenever the given server is about to transition to
+// the leader state, and before any new command log is applied.
+//
+// It resets the internal state of the FSM, such as the commands counter.
+func (w *Watcher) Electing(id raft.ServerID) {
+	w.fsms[id].electing()
+}
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/fsms/wrapper.go b/vendor/github.com/CanonicalLtd/raft-test/internal/fsms/wrapper.go
new file mode 100644
index 0000000000..ea86a49550
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/fsms/wrapper.go
@@ -0,0 +1,188 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package fsms
+
+import (
+	"encoding/binary"
+	"fmt"
+	"io"
+	"sync"
+
+	"github.com/CanonicalLtd/raft-test/internal/event"
+	"github.com/hashicorp/go-hclog"
+	"github.com/hashicorp/raft"
+	"github.com/pkg/errors"
+)
+
+// Wraps a raft.FSM, adding control over logs, snapshots and restores.
+type fsmWrapper struct {
+	logger hclog.Logger
+
+	// ID of the raft server associated with this FSM.
+	id raft.ServerID
+
+	// Wrapped FSM.
+	fsm raft.FSM
+
+	// Total number of commands applied by this FSM.
+	commands uint64
+
+	// Total number of snapshots performed on this FSM.
+	snapshots uint64
+
+	// Total number of restores performed on this FSM.
+	restores uint64
+
+	// Events that should be fired when a certain command log is applied.
+	events map[uint64][]*event.Event
+
+	mu sync.RWMutex
+}
+
+func newFSMWrapper(logger hclog.Logger, id raft.ServerID, fsm raft.FSM) *fsmWrapper {
+	return &fsmWrapper{
+		logger: logger,
+		id:     id,
+		fsm:    fsm,
+		events: make(map[uint64][]*event.Event),
+	}
+}
+
+func (f *fsmWrapper) Apply(log *raft.Log) interface{} {
+	result := f.fsm.Apply(log)
+
+	f.mu.Lock()
+	f.commands++
+	f.mu.Unlock()
+
+	f.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: fsm %s: applied %d", f.id, f.commands))
+	if events, ok := f.events[f.commands]; ok {
+		for _, event := range events {
+			event.Fire()
+			event.Block()
+		}
+	}
+
+	return result
+}
+
+// Snapshot delegates to the wrapped FSM, wrapping the returned snapshot so
+// that the current command count gets persisted along with it.
+func (f *fsmWrapper) Snapshot() (raft.FSMSnapshot, error) {
+	snapshot, err := f.fsm.Snapshot()
+
+	if snapshot != nil {
+		f.mu.Lock()
+		f.snapshots++
+		snapshot = &fsmSnapshotWrapper{
+			commands: f.commands,
+			snapshot: snapshot,
+		}
+		f.mu.Unlock()
+	}
+
+	return snapshot, err
+}
+
+// Restore reads back the command count and then delegates to the wrapped
+// FSM's Restore method.
+func (f *fsmWrapper) Restore(reader io.ReadCloser) error {
+	if err := binary.Read(reader, binary.LittleEndian, &f.commands); err != nil {
+		return errors.Wrap(err, "failed to restore commands count")
+	}
+	if err := f.fsm.Restore(reader); err != nil {
+		return errors.Wrap(err, "failed to perform restore on user's FSM")
+	}
+
+	if events, ok := f.events[f.commands]; ok {
+		for _, event := range events {
+			event.Fire()
+			event.Block()
+		}
+	}
+
+	f.mu.Lock()
+	f.restores++
+	f.mu.Unlock()
+
+	return nil
+}
+
+// This method must be called whenever the server associated with this FSM is
+// about to transition to the leader state, and before any new command log is
+// applied.
+//
+// It resets the internal state of the fsm, such as the list of applied command
+// logs and the scheduled events.
+func (f *fsmWrapper) electing() {
+	f.mu.Lock()
+	defer f.mu.Unlock()
+	for n := range f.events {
+		delete(f.events, n)
+	}
+}
+
+// Return an event that will fire when the n'th command log for the term is
+// applied on this FSM. It's assumed that this FSM is associated with the
+// current leader.
+func (f *fsmWrapper) whenApplied(n uint64) *event.Event {
+	e := event.New()
+	f.mu.RLock()
+	defer f.mu.RUnlock()
+	if f.commands >= n {
+		// Fire immediately.
+		go e.Fire()
+	} else {
+		_, ok := f.events[n]
+		if !ok {
+			f.events[n] = make([]*event.Event, 0)
+		}
+		f.events[n] = append(f.events[n], e)
+	}
+	return e
+}
+
+// Return the total number of command logs applied by this FSM.
+func (f *fsmWrapper) Commands() uint64 {
+	return f.commands
+}
+
+// Return the total number of snapshots performed by this FSM.
+func (f *fsmWrapper) Snapshots() uint64 {
+	return f.snapshots
+}
+
+// Return the total number of restores performed by this FSM.
+func (f *fsmWrapper) Restores() uint64 {
+	return f.restores
+}
+
+type fsmSnapshotWrapper struct {
+	commands uint64
+	snapshot raft.FSMSnapshot
+}
+
+func (s *fsmSnapshotWrapper) Persist(sink raft.SnapshotSink) error {
+	// Augment the snapshot with the current command count.
+	if err := binary.Write(sink, binary.LittleEndian, s.commands); err != nil {
+		return errors.Wrap(err, "failed to augment snapshot with commands count")
+	}
+	if err := s.snapshot.Persist(sink); err != nil {
+		return errors.Wrap(err, "failed to perform snapshot on user's FSM")
+	}
+	return nil
+}
+
+func (s *fsmSnapshotWrapper) Release() {}
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/logging/logger.go b/vendor/github.com/CanonicalLtd/raft-test/internal/logging/logger.go
new file mode 100644
index 0000000000..07ec09f63c
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/logging/logger.go
@@ -0,0 +1,50 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package logging
+
+import (
+	"testing"
+
+	"github.com/hashicorp/go-hclog"
+	"github.com/hashicorp/logutils"
+)
+
+// New returns a standard hclog.Logger that will write entries at or above the
+// specified level to the testing log.
+func New(t testing.TB, level logutils.LogLevel) hclog.Logger {
+	filter := &logutils.LevelFilter{
+		Levels:   []logutils.LogLevel{"DEBUG", "WARN", "ERROR", "INFO"},
+		MinLevel: level,
+		Writer:   &testingWriter{t},
+	}
+
+	return hclog.New(&hclog.LoggerOptions{
+		Name:   "raft-test",
+		Output: filter,
+	})
+}
+
+// Implement io.Writer and forward what it receives to a
+// testing logger.
+type testingWriter struct {
+	t testing.TB
+}
+
+// Write a single log entry. It's assumed that p is always a \n-terminated
+// UTF-8 string.
+func (w *testingWriter) Write(p []byte) (n int, err error) {
+	w.t.Logf("%s", p)
+	return len(p), nil
+}
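The command count travels with the snapshot: fsmSnapshotWrapper.Persist above prepends it as a little-endian uint64, and fsmWrapper.Restore reads it back before handing the rest of the stream to the user's FSM. The framing in isolation, as a runnable sketch (illustrative only, not part of the patch):

    package main

    import (
    	"bytes"
    	"encoding/binary"
    	"fmt"
    )

    func main() {
    	var buf bytes.Buffer

    	// Persist side: prepend the command count to the user payload.
    	commands := uint64(42)
    	binary.Write(&buf, binary.LittleEndian, commands)
    	buf.WriteString("user snapshot data")

    	// Restore side: read the count back; the rest is the user payload.
    	var restored uint64
    	binary.Read(&buf, binary.LittleEndian, &restored)
    	fmt.Println(restored, buf.String()) // 42 user snapshot data
    }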
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/network/logs.go b/vendor/github.com/CanonicalLtd/raft-test/internal/network/logs.go
new file mode 100644
index 0000000000..df88af9e31
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/network/logs.go
@@ -0,0 +1,76 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package network
+
+import (
+	"fmt"
+	"strings"
+
+	"github.com/hashicorp/raft"
+)
+
+// Return a string representation of the given log entries.
+func stringifyLogs(logs []*raft.Log) string {
+	n := len(logs)
+	description := fmt.Sprintf("%d ", n)
+	if n == 1 {
+		description += "entry"
+	} else {
+		description += "entries"
+	}
+
+	if n > 0 {
+		entries := make([]string, n)
+		for i, log := range logs {
+			name := "Other"
+			switch log.Type {
+			case raft.LogCommand:
+				name = "Command"
+			case raft.LogNoop:
+				name = "Noop"
+			}
+			entries[i] = fmt.Sprintf("%s:term=%d,index=%d", name, log.Term, log.Index)
+		}
+		description += fmt.Sprintf(" [%s]", strings.Join(entries, " "))
+	}
+
+	return description
+}
+
+// This function takes a set of log entries that have been successfully
+// appended to a peer and filters out any log entry with an older term relative
+// to the others.
+//
+// The returned entries are guaranteed to have the same term, and that term is
+// the highest among the ones in this batch.
+func filterLogsWithOlderTerms(logs []*raft.Log) []*raft.Log {
+	// Find the highest term.
+	var highestTerm uint64
+	for _, log := range logs {
+		if log.Term > highestTerm {
+			highestTerm = log.Term
+		}
+	}
+
+	// Discard any log with an older term than the highest one.
+	filteredLogs := make([]*raft.Log, 0)
+	for _, log := range logs {
+		if log.Term == highestTerm {
+			filteredLogs = append(filteredLogs, log)
+		}
+	}
+
+	return filteredLogs
+}
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/network/network.go b/vendor/github.com/CanonicalLtd/raft-test/internal/network/network.go
new file mode 100644
index 0000000000..b41cc6e0a2
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/network/network.go
@@ -0,0 +1,147 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package network
+
+import (
+	"fmt"
+
+	"github.com/CanonicalLtd/raft-test/internal/event"
+	"github.com/hashicorp/go-hclog"
+	"github.com/hashicorp/raft"
+)
+
+// Network provides control over all transports of a cluster, injecting
+// disconnections and failures.
+type Network struct {
+	logger hclog.Logger
+
+	// Transport wrappers.
+	transports map[raft.ServerID]*eventTransport
+}
+
+// New creates a new network for controlling the underlying transports.
+func New(logger hclog.Logger) *Network {
+	return &Network{
+		logger:     logger,
+		transports: make(map[raft.ServerID]*eventTransport),
+	}
+}
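filterLogsWithOlderTerms above keeps only the entries carrying the batch's highest term. A runnable sketch of that behaviour (an illustrative copy, since the original lives in an internal package):

    package main

    import (
    	"fmt"

    	"github.com/hashicorp/raft"
    )

    // filterHighestTerm reproduces filterLogsWithOlderTerms above.
    func filterHighestTerm(logs []*raft.Log) []*raft.Log {
    	var highest uint64
    	for _, l := range logs {
    		if l.Term > highest {
    			highest = l.Term
    		}
    	}
    	out := make([]*raft.Log, 0, len(logs))
    	for _, l := range logs {
    		if l.Term == highest {
    			out = append(out, l)
    		}
    	}
    	return out
    }

    func main() {
    	logs := []*raft.Log{{Term: 1, Index: 3}, {Term: 2, Index: 4}, {Term: 2, Index: 5}}
    	fmt.Println(len(filterHighestTerm(logs))) // 2: the term-1 entry is dropped
    }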
+// Add a new transport to the network. Returns a transport that wraps the
+// given transport with instrumentation to inject disconnections and failures.
+func (n *Network) Add(id raft.ServerID, trans raft.Transport) raft.Transport {
+	transport := newEventTransport(n.logger, id, trans)
+
+	for _, other := range n.transports {
+		transport.AddPeer(other)
+		other.AddPeer(transport)
+	}
+
+	n.transports[id] = transport
+	return transport
+}
+
+// Electing resets any leader-related state in the transport associated with
+// the given server ID (such as the track of logs appended by the peers), and
+// it connects the transport to all its peers, enabling it to send them RPCs.
+// It must be called whenever the server associated with this transport is
+// about to transition to the leader state, and before any append entries RPC
+// is made.
+func (n *Network) Electing(id raft.ServerID) {
+	n.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: establish outbound connection to all other nodes", id))
+
+	// Sanity check that the network is fully disconnected at this time.
+	for id, transport := range n.transports {
+		if transport.Connected() {
+			panic(fmt.Sprintf("expected a fully disconnected network, but server %s is connected", id))
+		}
+	}
+
+	transport := n.transports[id]
+	transport.Electing()
+}
+
+// Deposing disables connectivity from the transport of the server with the
+// given ID to all its peers, allowing only append entries RPCs for peers that
+// are lagging behind in terms of applied logs to be performed.
+func (n *Network) Deposing(id raft.ServerID) {
+	n.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: dropping outbound connection to all other nodes", id))
+	n.transports[id].Deposing()
+}
+
+// ConnectAllServers establishes full cluster connectivity after an
+// election. The given ID is the one of the leader, which is already connected.
+func (n *Network) ConnectAllServers(id raft.ServerID) {
+	// Connect every other server's transport to its peers.
+	for other, transport := range n.transports {
+		if other == id {
+			continue
+		}
+		transport.peers.Connect()
+	}
+}
+
+// Disconnect disables connectivity from the transport of the leader
+// server with the given ID to the peer with the given ID.
+func (n *Network) Disconnect(id, follower raft.ServerID) {
+	n.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: disconnecting follower %s", id, follower))
+	n.transports[id].Disconnect(follower)
+}
+
+// Reconnect re-enables connectivity from the transport of the leader
+// server with the given ID to the peer with the given ID.
+func (n *Network) Reconnect(id, follower raft.ServerID) {
+	n.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: reconnecting follower %s", id, follower))
+	n.transports[id].Reconnect(follower)
+}
+
+// PeerConnected returns whether the peer with the given server ID is connected
+// with the transport of the server with the given ID.
+func (n *Network) PeerConnected(id, peer raft.ServerID) bool {
+	return n.transports[id].PeerConnected(peer)
+}
+
+// Address returns the address of the server with the given id.
+func (n *Network) Address(id raft.ServerID) raft.ServerAddress {
+	return n.transports[id].LocalAddr()
+}
+
+// HasAppendedLogsFromTo returns true if at least one log entry has been
+// appended by the server with id1 to the server with id2.
+//
+// It is assumed that id1 is a leader that has just been elected and has been
+// trying to append a noop log to all its followers.
+func (n *Network) HasAppendedLogsFromTo(id1, id2 raft.ServerID) bool {
+	transport := n.transports[id1]
+	return transport.HasAppendedLogsTo(id2)
+}
+
+// ScheduleEnqueueFailure will make all followers of the given server fail when
+// the leader tries to append the n'th log command. Return an event that will
+// fire when all of them have failed and will block them all until
+// acknowledged.
+func (n *Network) ScheduleEnqueueFailure(id raft.ServerID, command uint64) *event.Event {
+	transport := n.transports[id]
+	return transport.ScheduleEnqueueFailure(command)
+}
+
+// ScheduleAppendFailure will make all followers of the given leader server
+// append the n'th log command sent by the leader, but they will fail to
+// acknowledge the leader about it. Return an event that will fire when all of
+// them have failed and will block them all until acknowledged.
+func (n *Network) ScheduleAppendFailure(id raft.ServerID, command uint64) *event.Event {
+	transport := n.transports[id]
+	return transport.ScheduleAppendFailure(command)
+}
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/network/peers.go b/vendor/github.com/CanonicalLtd/raft-test/internal/network/peers.go
new file mode 100644
index 0000000000..a386c2982a
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/network/peers.go
@@ -0,0 +1,307 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package network
+
+import (
+	"fmt"
+	"sync"
+
+	"github.com/hashicorp/raft"
+)
+
+// Small wrapper around a map of raft.ServerID->peer, offering concurrency
+// safety. This bit of information is not on eventTransport directly, since it
+// needs to be shared between eventTransport and eventPipeline.
+type peers struct {
+	peers map[raft.ServerID]*peer
+	mu    sync.RWMutex
+}
+
+// Create a new empty peers map.
+func newPeers() *peers {
+	return &peers{
+		peers: make(map[raft.ServerID]*peer),
+	}
+}
+
+// Add a new peer for the given source and target server IDs.
+func (p *peers) Add(source, target raft.ServerID) {
+	p.peers[target] = newPeer(source, target)
+}
+
+// Get the peer with the given ID.
+func (p *peers) Get(id raft.ServerID) *peer {
+	// Since peer entries are inserted at initialization time by the
+	// Cluster() function, and currently they never change afterwards,
+	// there's no need to protect this method with the mutex.
+	return p.peers[id]
+}
+
+// Return all the peers.
+func (p *peers) All() map[raft.ServerID]*peer {
+	// Since peer entries are inserted at initialization time by the
+	// Cluster() function, and currently they never change afterwards,
+	// there's no need to protect this method with the mutex.
+	return p.peers
+}
+
+// Enable connectivity to all the peers in this map.
+func (p *peers) Connect() {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+
+	for _, peer := range p.peers {
+		peer.Connect()
+	}
+}
+
+// Returns true if all peers are connected, false otherwise.
+//
+// It panics if some nodes are connected and others are not.
+func (p *peers) Connected() bool {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+
+	connected := false
+	for id, peer := range p.peers {
+		if !connected {
+			connected = peer.Connected()
+		} else if !peer.Connected() {
+			panic(fmt.Sprintf("server %s is not connected while some others are", id))
+		}
+	}
+	return connected
+}
+
+// Disable connectivity to all the peers in this map. However allow for peers
+// that are lagging behind in terms of received entries to still receive
+// AppendEntries RPCs.
+func (p *peers) SoftDisconnect() {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+
+	for _, peer := range p.peers {
+		peer.SoftDisconnect()
+	}
+}
+
+// Whether the given target peer is both disconnected from its source
+// transport, and not syncing logs with other peers (i.e. either they are
+// at the same index of the peer with the highest index of appended logs, or
+// the peer has been hard-disconnected).
+func (p *peers) DisconnectedAndNotSyncing(id raft.ServerID) bool {
+	p.mu.RLock()
+	defer p.mu.RUnlock()
+
+	for _, peer := range p.peers {
+		peer.mu.RLock()
+		defer peer.mu.RUnlock()
+	}
+
+	this := p.peers[id]
+	if this.connected {
+		return false
+	}
+
+	if !this.allowSyncing {
+		return true
+	}
+
+	count := this.LogsCount()
+
+	for _, other := range p.peers {
+		if other.target == this.target {
+			continue
+		}
+		if count < other.LogsCount() {
+			return false
+		}
+	}
+
+	return true
+}
+
+// Hold information about a single peer server that an eventTransport is
+// sending RPCs to.
+type peer struct {
+	// Server ID of the server sending RPCs to the peer.
+	source raft.ServerID
+
+	// Server ID of the peer server.
+	target raft.ServerID
+
+	// Whether connectivity is up. The transport can send RPCs to the peer
+	// server only if this value is true.
+	connected bool
+
+	// Whether to allow appending entries to this peer even if the
+	// connected field is false. Used for bringing the logs appended by a
+	// peer in sync with the others.
+	allowSyncing bool
+
+	// Logs successfully appended to this peer since the server of the
+	// transport we're associated with has acquired leadership. This keeps
+	// only logs tagged with the same term the leader was elected at.
+	logs []*raft.Log
+
+	// Serialize access to internal state.
+	mu sync.RWMutex
+}
+
+// Create a new peer for the given server.
+func newPeer(source, target raft.ServerID) *peer {
+	return &peer{
+		source: source,
+		target: target,
+		logs:   make([]*raft.Log, 0),
+	}
+}
+
+// Enable connectivity between the source transport and the target peer.
+func (p *peer) Connect() {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+	if p.connected {
+		panic(fmt.Sprintf("server %s is already connected with server %s", p.source, p.target))
+	}
+	p.connected = true
+	p.allowSyncing = false
+}
+
+// Disable connectivity between the source transport and the target
+// peer.
+func (p *peer) Disconnect() {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+	if !p.connected {
+		panic(fmt.Sprintf("server %s is already disconnected from server %s", p.source, p.target))
+	}
+	p.connected = false
+	p.allowSyncing = false
+}
+
+// Re-enable connectivity between the source transport and the target
+// peer.
+func (p *peer) Reconnect() {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+	if p.connected {
+		panic(fmt.Sprintf("server %s is already connected with server %s", p.source, p.target))
+	}
+	p.connected = true
+	p.allowSyncing = false
+}
+
+// Disable connectivity between the source transport and the target
+// peer.
+// However allow for peers that are lagging behind in terms of received
+// entries to still receive AppendEntries RPCs.
+func (p *peer) SoftDisconnect() {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+	if !p.connected {
+		panic(fmt.Sprintf("server %s is already disconnected from server %s", p.source, p.target))
+	}
+	p.connected = false
+	p.allowSyncing = true
+}
+
+// Return whether this source transport is connected to the target peer.
+func (p *peer) Connected() bool {
+	p.mu.RLock()
+	defer p.mu.RUnlock()
+	return p.connected
+}
+
+// Reset all recorded logs. Should be called when a new leader is elected.
+func (p *peer) ResetLogs() {
+	p.logs = p.logs[:0]
+}
+
+// This method updates the logs that the peer successfully appended. It must be
+// called whenever the transport is confident that logs have been
+// appended. There are two cases:
+//
+// - Transport.AppendEntries(): this is synchronous, so UpdateLogs() can be
+//   invoked as soon as the AppendEntries() call returns.
+//
+// - AppendPipeline.AppendEntries(): this is asynchronous, so UpdateLogs()
+//   should be invoked only when the AppendFuture returned by AppendEntries()
+//   completes.
+//
+// In practice, the current implementation of eventTransport and eventPipeline
+// is a bit sloppy about the above rules, since we can make some assumptions
+// about the flow of entries. See comments in eventTransport and eventPipeline
+// for more details.
+func (p *peer) UpdateLogs(logs []*raft.Log) {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+
+	if len(logs) == 0 {
+		return // Nothing to do.
+	}
+
+	// Discard any log with an older term (relative to the others).
+	newLogs := filterLogsWithOlderTerms(logs)
+
+	// If no logs have been received yet, just append everything.
+	if len(p.logs) == 0 {
+		p.logs = newLogs
+		return
+	}
+
+	// Check if we have stored entries for older terms, and if so, discard
+	// them.
+	//
+	// We only need to check the first entry, because we always store
+	// entries that all have the same term.
+	if p.logs[0].Term < newLogs[0].Term {
+		p.logs = p.logs[:0]
+	}
+
+	// Append new logs that aren't duplicates.
+	for _, newLog := range newLogs {
+		duplicate := false
+		for _, log := range p.logs {
+			if newLog.Index == log.Index {
+				duplicate = true
+				break
+			}
+		}
+		if duplicate {
+			continue
+		}
+		p.logs = append(p.logs, newLog)
+	}
+}
+
+// Return the number of all logs appended so far to this peer.
+func (p *peer) LogsCount() int {
+	p.mu.RLock()
+	defer p.mu.RUnlock()
+
+	return len(p.logs)
+}
+
+// Return the number of command logs appended so far to this peer.
+func (p *peer) CommandLogsCount() uint64 {
+	p.mu.RLock()
+	defer p.mu.RUnlock()
+
+	n := uint64(0)
+	for _, log := range p.logs {
+		if log.Type == raft.LogCommand {
+			n++
+		}
+	}
+	return n
+}
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/network/pipeline.go b/vendor/github.com/CanonicalLtd/raft-test/internal/network/pipeline.go
new file mode 100644
index 0000000000..29d77e56f8
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/network/pipeline.go
@@ -0,0 +1,166 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package network
+
+import (
+	"fmt"
+	"time"
+
+	"github.com/hashicorp/go-hclog"
+	"github.com/hashicorp/raft"
+)
+
+// Wrap a regular raft.AppendPipeline, adding support for triggering events at
+// specific times.
+type eventPipeline struct {
+	logger hclog.Logger
+
+	// Server ID sending RPCs.
+	source raft.ServerID
+
+	// Server ID this pipeline is sending RPCs to.
+	target raft.ServerID
+
+	// Regular pipeline that we are wrapping.
+	pipeline raft.AppendPipeline
+
+	// All other peers connected to our transport. Used for syncing logs
+	// after a disconnection.
+	peers *peers
+
+	// Fault that should happen in this transport during a term.
+	schedule *schedule
+
+	// If non-zero, the pipeline will artificially return an error to its
+	// consumer when firing the response of a request whose entries contain
+	// a log with this index. This happens after the peer has actually
+	// appended the request's entries, and it effectively simulates a
+	// follower disconnecting before it can acknowledge the leader about a
+	// successful request.
+	failure uint64
+
+	// To stop the pipeline.
+	shutdownCh chan struct{}
+}
+
+// AppendEntries is used to add another request to the pipeline.
+// The send may block which is an effective form of back-pressure.
+func (p *eventPipeline) AppendEntries(
+	args *raft.AppendEntriesRequest, resp *raft.AppendEntriesResponse) (raft.AppendFuture, error) {
+
+	p.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: pipeline: append to %s: %s", p.source, p.target, stringifyLogs(args.Entries)))
+
+	peer := p.peers.Get(p.target)
+	faulty := false
+	if p.schedule != nil {
+		n := peer.CommandLogsCount()
+		args, faulty = p.schedule.FilterRequest(n, args)
+		if faulty && p.schedule.IsEnqueueFault() {
+			p.logger.Debug(fmt.Sprintf(
+				"[DEBUG] raft-test: server %s: pipeline: append to: %s: enqueue fault: command %d", p.source, p.target, p.schedule.Command()))
+		}
+	}
+
+	if p.peers.DisconnectedAndNotSyncing(p.target) {
+		p.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: pipeline: append to %s: not connected", p.source, p.target))
+		return nil, fmt.Errorf("cannot reach server %s", p.target)
+	}
+
+	if faulty && p.schedule.IsAppendFault() {
+		p.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: pipeline: append to %s: append fault: command %d", p.source, p.target, p.schedule.n))
+		p.failure = args.Entries[0].Index
+	}
+
+	future, err := p.pipeline.AppendEntries(args, resp)
+	if err != nil {
+		return nil, err
+	}
+	peer.UpdateLogs(args.Entries)
+
+	if faulty && p.schedule.IsEnqueueFault() {
+		p.schedule.OccurredOn(p.target)
+		p.schedule.event.Block()
+		return nil, fmt.Errorf("cannot reach server %s", p.target)
+	}
+
+	return &appendFutureWrapper{
+		id:     p.target,
+		future: future,
+	}, nil
+}
+
+// Consumer returns a channel that can be used to consume
+// response futures when they are ready.
+func (p *eventPipeline) Consumer() <-chan raft.AppendFuture {
+	ch := make(chan raft.AppendFuture)
+
+	go func() {
+		for {
+			select {
+			case future := <-p.pipeline.Consumer():
+				entries := future.Request().Entries
+				fail := false
+				if len(entries) > 0 && entries[0].Index == p.failure {
+					fail = true
+				}
+				if fail {
+					p.schedule.OccurredOn(p.target)
+					p.schedule.event.Block()
+					future = &appendFutureWrapper{id: p.target, future: future, failing: true}
+				}
+				ch <- future
+			case <-p.shutdownCh:
+				return
+			}
+		}
+	}()
+	return ch
+}
+
+// Close closes the pipeline and cancels all inflight RPCs.
+func (p *eventPipeline) Close() error {
+	err := p.pipeline.Close()
+	close(p.shutdownCh)
+	return err
+}
+
+type appendFutureWrapper struct {
+	id      raft.ServerID
+	future  raft.AppendFuture
+	failing bool
+}
+
+func (f *appendFutureWrapper) Error() error {
+	if f.failing {
+		return fmt.Errorf("cannot reach server %s", f.id)
+	}
+	return f.future.Error()
+}
+
+func (f *appendFutureWrapper) Start() time.Time {
+	return f.future.Start()
+}
+
+func (f *appendFutureWrapper) Request() *raft.AppendEntriesRequest {
+	return f.future.Request()
+}
+
+func (f *appendFutureWrapper) Response() *raft.AppendEntriesResponse {
+	response := f.future.Response()
+	if f.failing {
+		response.Success = false
+	}
+	return response
+}
diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/network/schedule.go b/vendor/github.com/CanonicalLtd/raft-test/internal/network/schedule.go
new file mode 100644
index 0000000000..3b5c0293c5
--- /dev/null
+++ b/vendor/github.com/CanonicalLtd/raft-test/internal/network/schedule.go
@@ -0,0 +1,178 @@
+// Copyright 2017 Canonical Ltd.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package network
+
+import (
+	"sync"
+
+	"github.com/CanonicalLtd/raft-test/internal/event"
+	"github.com/hashicorp/raft"
+)
+
+// Schedule contains details about when a certain event should occur.
+type schedule struct {
+	// List of peers that the event should occur on.
+	peers []raft.ServerID
+
+	// The event should fire when the transport tries to append the n'th
+	// command log in this term.
+	n uint64
+
+	// Event object that should be fired when all peers have been trying to
+	// append the given command.
+	event *event.Event
+
+	// Track peers where the event already occurred.
+	occurred []bool
+
+	// If true, the event should occur after the command log has been
+	// appended to all followers.
+	append bool
+
+	// Serialize access to internal state.
+	mu sync.RWMutex
+}
+
+// Return a zero value fault that will never occur.
+func newSchedule() *schedule {
+	return &schedule{}
+}
+
+// Add a server to the list of peers where the event should occur.
+func (s *schedule) AddPeer(id raft.ServerID) {
+	s.peers = append(s.peers, id)
+	s.occurred = append(s.occurred, false)
+}
+
+// Reset this fault to not occur.
+func (s *schedule) NoEvent() {
+	s.n = 0
+	s.event = nil
+	for i := range s.occurred {
+		s.occurred[i] = false
+	}
+	s.append = false
+}
+
+// Configure this schedule to fire the given event when the append entries RPC
+// to apply the n'th command log has failed on all given peers.
+func (s *schedule) EnqueueFailure(n uint64, event *event.Event) {
+	s.n = n
+	s.event = event
+	for i := range s.occurred {
+		s.occurred[i] = false
+	}
+}
+
+// Configure this schedule to fire the given event after the n'th command log
+// has been appended by all peers, but the leader has failed to be notified
+// about it.
+func (s *schedule) AppendFailure(n uint64, event *event.Event) {
+	s.n = n
+	s.event = event
+	for i := range s.occurred {
+		s.occurred[i] = false
+	}
+	s.append = true
+}
+
+// FilterRequest scans the entries in the given append request, to see whether
+// they contain the command log that this fault is supposed to trigger upon.
+//
+// The n parameter is the number of command logs successfully appended so far
+// in the current term.
+//
+// It returns a request object and a boolean value.
+//
+// If the fault should not be triggered by this request, the returned request
+// object is the same as the given one and the boolean value is false.
+//
+// If the fault should be triggered by this request, the boolean value will
+// be true and for the returned request object there are two cases:
+//
+// 1) If this is an enqueue fault, the returned request object will have its
+//    Entries truncated to exclude the failing command log entry and every
+//    entry beyond that. This way all logs preceding the failing command log
+//    will still be appended to the peer and the associated apply futures will
+//    succeed, although the failing command log won't be applied and its apply
+//    future will fail with ErrLeadershipLost.
+//
+// 2) If this is an append fault, the returned request object will be the same
+//    as the given one. This way all logs will be appended to the peer,
+//    although the transport pretends that the append entries RPC has failed,
+//    simulating a disconnection when delivering the RPC reply.
+func (s *schedule) FilterRequest(n uint64, args *raft.AppendEntriesRequest) (*raft.AppendEntriesRequest, bool) {
+	if s.n == 0 {
+		return args, false
+	}
+
+	for i, log := range args.Entries {
+		// Only consider command log entries.
+		if log.Type != raft.LogCommand {
+			continue
+		}
+		n++
+		if n != s.n {
+			continue
+		}
+
+		// We found a match.
+		if !s.append {
+			truncatedArgs := *args
+			truncatedArgs.Entries = args.Entries[:i]
+			args = &truncatedArgs
+		}
+		return args, true
+	}
+	return args, false
+}
+
+// Return the command log sequence number that should trigger this fault.
+//
+// For example, if the fault was set to fail at the n'th command log appended
+// during the term, then n is returned.
+func (s *schedule) Command() uint64 {
+	return s.n
+}
+
+// Return true if this is an enqueue fault.
+func (s *schedule) IsEnqueueFault() bool {
+	return !s.append
+}
+
+// Return true if this is an append fault.
+func (s *schedule) IsAppendFault() bool {
+	return s.append
+}
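The enqueue-fault branch of FilterRequest truncates the batch just before the failing command, counting only LogCommand entries. A runnable sketch of that counting-and-truncation logic (an illustrative reimplementation, since schedule is internal):

    package main

    import (
    	"fmt"

    	"github.com/hashicorp/raft"
    )

    // truncateAt mirrors the enqueue-fault branch of FilterRequest above:
    // walk the entries, counting command logs starting from n, and cut the
    // batch at the one matching the scheduled command number s.
    func truncateAt(entries []*raft.Log, n, s uint64) []*raft.Log {
    	for i, log := range entries {
    		if log.Type != raft.LogCommand {
    			continue
    		}
    		n++
    		if n == s {
    			return entries[:i]
    		}
    	}
    	return entries
    }

    func main() {
    	entries := []*raft.Log{
    		{Type: raft.LogNoop, Index: 10},
    		{Type: raft.LogCommand, Index: 11},
    		{Type: raft.LogCommand, Index: 12},
    	}
    	// One command already appended this term (n=1); fail the 2nd one.
    	out := truncateAt(entries, 1, 2)
    	fmt.Println(len(out)) // 1: only the noop survives
    }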
+func (s *schedule) OccurredOn(id raft.ServerID) { + s.mu.Lock() + defer s.mu.Unlock() + for i, other := range s.peers { + if other == id { + s.occurred[i] = true + } + } + + for _, flag := range s.occurred { + if !flag { + return + } + } + s.event.Fire() +} diff --git a/vendor/github.com/CanonicalLtd/raft-test/internal/network/transport.go b/vendor/github.com/CanonicalLtd/raft-test/internal/network/transport.go new file mode 100644 index 0000000000..599b83d06e --- /dev/null +++ b/vendor/github.com/CanonicalLtd/raft-test/internal/network/transport.go @@ -0,0 +1,268 @@ +// Copyright 2017 Canonical Ltd. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package network + +import ( + "fmt" + "io" + + "github.com/CanonicalLtd/raft-test/internal/event" + "github.com/hashicorp/go-hclog" + "github.com/hashicorp/raft" +) + +// Wrap a regular raft.Transport, adding support for triggering events at +// specific times. +type eventTransport struct { + logger hclog.Logger + + // ID of the raft server associated with this transport. + id raft.ServerID + + // The regular raft.Transport being wrapped. + trans raft.Transport + + // Track the peers we are sending RPCs to. + peers *peers + + // Schedule for an event that should happen in this transport during a + // term. + schedule *schedule +} + +// Create a new transport wrapper. +func newEventTransport(logger hclog.Logger, id raft.ServerID, trans raft.Transport) *eventTransport { + return &eventTransport{ + logger: logger, + id: id, + trans: trans, + peers: newPeers(), + schedule: newSchedule(), + } +} + +// Consumer returns a channel that can be used to +// consume and respond to RPC requests. +func (t *eventTransport) Consumer() <-chan raft.RPC { + return t.trans.Consumer() +} + +// LocalAddr is used to return our local address to distinguish from our peers. +func (t *eventTransport) LocalAddr() raft.ServerAddress { + return t.trans.LocalAddr() +} + +// AppendEntriesPipeline returns an interface that can be used to pipeline +// AppendEntries requests. +func (t *eventTransport) AppendEntriesPipeline( + id raft.ServerID, target raft.ServerAddress) (raft.AppendPipeline, error) { + + if t.peers.DisconnectedAndNotSyncing(id) { + t.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: transport: append to %s: not connected", t.id, id)) + return nil, fmt.Errorf("cannot reach server %s", id) + } + if !t.peers.Get(id).Connected() { + t.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: transport: append to %s: syncing logs", t.id, id)) + } + + pipeline, err := t.trans.AppendEntriesPipeline(id, target) + if err != nil { + return nil, err + } + + pipeline = &eventPipeline{ + logger: t.logger, + source: t.id, + target: id, + pipeline: pipeline, + peers: t.peers, + schedule: t.schedule, + shutdownCh: make(chan struct{}), + } + + return pipeline, nil +} + +// AppendEntries sends the appropriate RPC to the target node.
+func (t *eventTransport) AppendEntries( + id raft.ServerID, target raft.ServerAddress, args *raft.AppendEntriesRequest, + resp *raft.AppendEntriesResponse) error { + + peer := t.peers.Get(id) + t.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: transport: append to %s: %s", t.id, id, stringifyLogs(args.Entries))) + + // If a fault is set, check if this batch of entries contains a command + // log matching the one configured in the fault. + faulty := false + if t.schedule != nil { + n := peer.CommandLogsCount() + args, faulty = t.schedule.FilterRequest(n, args) + if faulty && t.schedule.IsEnqueueFault() { + t.logger.Debug(fmt.Sprintf( + "[DEBUG] raft-test: server %s: transport: append to %s: enqueue fault: command %d", t.id, id, t.schedule.Command())) + } + } + + if t.peers.DisconnectedAndNotSyncing(id) { + t.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: transport: append to %s: not connected", t.id, id)) + return fmt.Errorf("cannot reach server %s", id) + } + if !t.peers.Get(id).Connected() { + t.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: transport: append to %s: syncing logs", t.id, id)) + } + + if err := t.trans.AppendEntries(id, target, args, resp); err != nil { + return err + } + + // Check for a newer term, stop running + if resp.Term > args.Term { + t.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: server %s: transport: append to %s: newer term", t.id, id)) + } + + peer.UpdateLogs(args.Entries) + + if faulty && t.schedule.IsEnqueueFault() { + t.schedule.OccurredOn(id) + t.schedule.event.Block() + return fmt.Errorf("cannot reach server %s", id) + } + + return nil +} + +// RequestVote sends the appropriate RPC to the target node. +func (t *eventTransport) RequestVote( + id raft.ServerID, target raft.ServerAddress, args *raft.RequestVoteRequest, + resp *raft.RequestVoteResponse) error { + + if !t.peers.Get(id).Connected() { + return fmt.Errorf("connectivity to server %s is down", id) + } + + return t.trans.RequestVote(id, target, args, resp) +} + +// InstallSnapshot is used to push a snapshot down to a follower. The data is read from +// the ReadCloser and streamed to the client. +func (t *eventTransport) InstallSnapshot( + id raft.ServerID, target raft.ServerAddress, args *raft.InstallSnapshotRequest, + resp *raft.InstallSnapshotResponse, data io.Reader) error { + + if !t.peers.Get(id).Connected() { + return fmt.Errorf("connectivity to server %s is down", id) + } + return t.trans.InstallSnapshot(id, target, args, resp, data) +} + +// EncodePeer is used to serialize a peer's address. +func (t *eventTransport) EncodePeer(id raft.ServerID, addr raft.ServerAddress) []byte { + return t.trans.EncodePeer(id, addr) +} + +// DecodePeer is used to deserialize a peer's address. +func (t *eventTransport) DecodePeer(data []byte) raft.ServerAddress { + return t.trans.DecodePeer(data) +} + +// SetHeartbeatHandler is used to set up a heartbeat handler +// as a fast-pass. This is to avoid head-of-line blocking from +// disk IO. If a Transport does not support this, it can simply +// ignore the call, and push the heartbeat onto the Consumer channel. +func (t *eventTransport) SetHeartbeatHandler(cb func(rpc raft.RPC)) { + t.trans.SetHeartbeatHandler(cb) +} + +func (t *eventTransport) Close() error { + if closer, ok := t.trans.(raft.WithClose); ok { + return closer.Close() + } + return nil +} + +// AddPeer adds a new transport as a peer of this transport.
Once the other +// transport has become a peer, this transport will be able to send RPCs to it, +// if the peer object's 'connected' flag is on. +func (t *eventTransport) AddPeer(transport *eventTransport) { + t.peers.Add(t.id, transport.id) + t.schedule.AddPeer(transport.id) +} + +// Electing resets any leader-related state in this transport (such as the +// tracking of logs appended by the peers), and it connects the transport to all +// its peers, enabling it to send them RPCs. It must be called whenever the +// server associated with this transport is about to transition to the leader +// state, and before any append entries RPC is made. +func (t *eventTransport) Electing() { + t.schedule.NoEvent() + for _, peer := range t.peers.All() { + peer.ResetLogs() + } + t.peers.Connect() +} + +// Deposing disables connectivity from this transport to all its peers, +// allowing only append entries RPCs to peers that are lagging behind in +// terms of applied logs. +func (t *eventTransport) Deposing() { + t.peers.SoftDisconnect() +} + +// Disable connectivity from this transport to the given peer. +func (t *eventTransport) Disconnect(id raft.ServerID) { + t.peers.Get(id).Disconnect() +} + +// Re-enable connectivity from this transport to the given peer. +func (t *eventTransport) Reconnect(id raft.ServerID) { + t.peers.Get(id).Reconnect() +} + +// Returns true if all peers are connected, false otherwise. +// +// It panics if some nodes are connected and others are not. +func (t *eventTransport) Connected() bool { + return t.peers.Connected() +} + +// Returns true if the given peer is connected. +func (t *eventTransport) PeerConnected(id raft.ServerID) bool { + return t.peers.Get(id).Connected() +} + +// Returns true if this transport has appended logs to the given peer during +// the term. +func (t *eventTransport) HasAppendedLogsTo(id raft.ServerID) bool { + peer := t.peers.Get(id) + return peer.LogsCount() > 0 +} + +// Schedule the n'th command log to fail to be appended to the +// followers. Return an event that will fire when all followers have reached +// this failure. +func (t *eventTransport) ScheduleEnqueueFailure(n uint64) *event.Event { + event := event.New() + t.schedule.EnqueueFailure(n, event) + return event +} + +// Schedule the n'th command log to be appended to the followers but fail +// to be acknowledged. Return an event that will fire when all followers +// have reached this failure. +func (t *eventTransport) ScheduleAppendFailure(n uint64) *event.Event { + event := event.New() + t.schedule.AppendFailure(n, event) + return event +} diff --git a/vendor/github.com/CanonicalLtd/raft-test/options.go b/vendor/github.com/CanonicalLtd/raft-test/options.go new file mode 100644 index 0000000000..77e7feef98 --- /dev/null +++ b/vendor/github.com/CanonicalLtd/raft-test/options.go @@ -0,0 +1,107 @@ +// Copyright 2017 Canonical Ltd. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License.
+ +package rafttest + +import ( + "io/ioutil" + "time" + + "github.com/hashicorp/go-hclog" + "github.com/hashicorp/raft" +) + +// Config sets a hook for tweaking the raft configuration of individual nodes. +func Config(f func(int, *raft.Config)) Option { + return func(nodes []*dependencies) { + for i, node := range nodes { + f(i, node.Conf) + } + } +} + +// LogStore can be used to create custom log stores. +// +// The given function takes a node index as argument and returns the LogStore +// that the node should use. +func LogStore(factory func(int) raft.LogStore) Option { + return func(nodes []*dependencies) { + for i, node := range nodes { + node.Logs = factory(i) + } + } +} + +// Transport can be used to create custom transports. +// +// The given function takes a node index as argument and returns the Transport +// that the node should use. +// +// If the transports returned by the factory do not implement +// LoopbackTransport, the Disconnect API won't work. +func Transport(factory func(int) raft.Transport) Option { + return func(nodes []*dependencies) { + for i, node := range nodes { + node.Trans = factory(i) + } + } +} + +// Latency is a convenience around Config that scales the values of the various +// raft timeouts that would be set by default by Cluster. +// +// This option is orthogonal to the GO_RAFT_TEST_LATENCY environment +// variable. If this option is used and GO_RAFT_TEST_LATENCY is set, they will +// compound. E.g. passing a factor of 2.0 to this option and setting +// GO_RAFT_TEST_LATENCY to 3.0 will have the net effect that default timeouts +// are scaled by a factor of 6.0. +func Latency(factor float64) Option { + return Config(func(i int, config *raft.Config) { + timeouts := []*time.Duration{ + &config.HeartbeatTimeout, + &config.ElectionTimeout, + &config.LeaderLeaseTimeout, + &config.CommitTimeout, + } + for _, timeout := range timeouts { + *timeout = scaleDuration(*timeout, factor) + } + }) +} + +// DiscardLogger is a convenience around Config that sets the output stream of +// raft's logger to ioutil.Discard. +func DiscardLogger() Option { + return Config(func(i int, config *raft.Config) { + config.Logger = hclog.New(&hclog.LoggerOptions{ + Name: "raft-test", + Output: ioutil.Discard}) + }) +} + +// Servers can be used to indicate which nodes should be initially part of the +// created cluster. +// +// If this option is not used, the default is to have all nodes be part of the +// cluster. +func Servers(indexes ...int) Option { + return func(nodes []*dependencies) { + for _, node := range nodes { + node.Voter = false + } + for _, index := range indexes { + nodes[index].Voter = true + } + } +} diff --git a/vendor/github.com/CanonicalLtd/raft-test/server.go b/vendor/github.com/CanonicalLtd/raft-test/server.go new file mode 100644 index 0000000000..035774cdb0 --- /dev/null +++ b/vendor/github.com/CanonicalLtd/raft-test/server.go @@ -0,0 +1,36 @@ +// Copyright 2017 Canonical Ltd. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. 
+ +package rafttest + +import ( + "testing" + + "github.com/hashicorp/raft" +) + +// Server is a convenience for creating a cluster with a single raft.Raft server +// that will immediately be elected as leader. +// +// The default network address of a test node is "0". +// +// Dependencies can be replaced or mutated using the various options. +func Server(t *testing.T, fsm raft.FSM, options ...Option) (*raft.Raft, func()) { + fsms := []raft.FSM{fsm} + + rafts, control := Cluster(t, fsms, options...) + control.Elect("0") + + return rafts["0"], control.Close +} diff --git a/vendor/github.com/CanonicalLtd/raft-test/term.go b/vendor/github.com/CanonicalLtd/raft-test/term.go new file mode 100644 index 0000000000..c60957e87f --- /dev/null +++ b/vendor/github.com/CanonicalLtd/raft-test/term.go @@ -0,0 +1,219 @@ +// Copyright 2017 Canonical Ltd. +// +// Licensed under the Apache License, Version 2.0 (the "License"); +// you may not use this file except in compliance with the License. +// You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package rafttest + +import ( + "fmt" + + "github.com/CanonicalLtd/raft-test/internal/election" + "github.com/CanonicalLtd/raft-test/internal/event" + "github.com/hashicorp/raft" +) + +// A Term holds information about an event that should happen while a certain +// node is the leader. +type Term struct { + control *Control + id raft.ServerID + leadership *election.Leadership + events []*Event + + // Server ID of a follower that has been disconnected. + disconnected raft.ServerID +} + +// When can be used to schedule a certain action when a certain expected +// event occurs in the cluster during this Term. +func (t *Term) When() *Event { + // TODO: check that we're not using Connect() + t.control.t.Helper() + + event := &Event{ + term: t, + } + + t.events = append(t.events, event) + return event +} + +// Disconnect a follower, which will stop receiving RPCs. +func (t *Term) Disconnect(id raft.ServerID) { + t.control.t.Helper() + + if t.disconnected != "" { + t.control.t.Fatalf("raft-test: term: disconnecting more than one server is not supported") + } + + if id == t.id { + t.control.t.Fatalf("raft-test: term: disconnect error: server %s is the leader", t.id) + } + + t.control.logger.Debug(fmt.Sprintf("[DEBUG] raft-test: term: disconnect %s", id)) + + t.disconnected = id + t.control.network.Disconnect(t.id, id) +} + +// Reconnect a previously disconnected follower. +func (t *Term) Reconnect(id raft.ServerID) { + t.control.t.Helper() + + if id != t.disconnected { + t.control.t.Fatalf("raft-test: term: reconnect error: server %s was not disconnected", id) + } + + // Reconnecting a server might result in a new election round, so we + // have to be prepared for that. + t.control.network.Reconnect(t.id, id) + if t.control.waitLeadershipPropagated(t.id, t.leadership) { + // Leadership was not lost and all followers are back + // on track. + return + } + + // Leadership was lost, so we must undergo a new election. + // + // FIXME: this prevents When() hooks from functioning properly. It's not a + // big deal at the moment, since Disconnect() is mainly used for + // snapshots, but it should be sorted.
+ term := t.control.Elect(t.id) + t.leadership = term.leadership +} + +// Snapshot performs a snapshot on the given server. +func (t *Term) Snapshot(id raft.ServerID) { + t.control.t.Helper() + + r := t.control.servers[id] + if err := r.Snapshot().Error(); err != nil { + t.control.t.Fatalf("raft-test: term: snapshot error on server %s: %v", id, err) + } +} + +// Event that is expected to happen during a Term. +type Event struct { + term *Term + isScheduled bool +} + +// Command schedules the event to occur when the Raft.Apply() method is called +// on the leader raft instance in order to apply the n'th command log during +// the current term. +func (e *Event) Command(n uint64) *Dispatch { + e.term.control.t.Helper() + + if e.isScheduled { + e.term.control.t.Fatal("raft-test: error: term event already scheduled") + } + e.isScheduled = true + + return &Dispatch{ + term: e.term, + n: n, + } +} + +// Dispatch defines at which phase of the dispatch process a command log event +// should fire. +type Dispatch struct { + term *Term + n uint64 + event *event.Event +} + +// Enqueued configures the command log event to occur when the command log is +// enqueued, but not yet appended by the followers. +func (d *Dispatch) Enqueued() *Action { + d.term.control.t.Helper() + + if d.event != nil { + d.term.control.t.Fatal("raft-test: error: dispatch event already defined") + } + d.event = d.term.control.whenCommandEnqueued(d.term.id, d.n) + + return &Action{ + term: d.term, + event: d.event, + } +} + +// Appended configures the command log event to occur when the command log is +// appended by all followers, but not yet committed by the leader. +func (d *Dispatch) Appended() *Action { + d.term.control.t.Helper() + + if d.event != nil { + d.term.control.t.Fatal("raft-test: error: dispatch event already defined") + } + + d.event = d.term.control.whenCommandAppended(d.term.id, d.n) + + return &Action{ + term: d.term, + event: d.event, + } +} + +// Committed configures the command log event to occur when the command log is +// committed. +func (d *Dispatch) Committed() *Action { + d.term.control.t.Helper() + + if d.event != nil { + d.term.control.t.Fatal("raft-test: error: dispatch event already defined") + } + + d.event = d.term.control.whenCommandCommitted(d.term.id, d.n) + + return &Action{ + term: d.term, + event: d.event, + } +} + +// Action defines what should happen when the event defined in the term occurs. +type Action struct { + term *Term + event *event.Event +} + +// Depose makes the action depose the current leader. +func (a *Action) Depose() { + a.term.control.t.Helper() + //a.control.t.Logf( + //"raft-test: event: schedule depose server %s when command %d gets %s", a.id, a.n, a.phase) + + a.term.control.deposing = make(chan struct{}) + + go func() { + //c.t.Logf("raft-test: node %d: fsm: wait log command %d", i, n) + a.term.control.deposeUponEvent(a.event, a.term.id, a.term.leadership) + }() +} + +// Snapshot makes the action trigger a snapshot on the leader. +// +// The typical use is to take the snapshot after a certain command log gets +// committed (see Dispatch.Committed()).
+func (a *Action) Snapshot() { + a.term.control.t.Helper() + // a.control.t.Logf( + // "raft-test: event: schedule snapshot server %s when command %d gets %s", a.id, a.n, a.phase) + + go func() { + //c.t.Logf("raft-test: node %d: fsm: wait log command %d", i, n) + a.term.control.snapshotUponEvent(a.event, a.term.id) + }() +} diff --git a/vendor/github.com/hashicorp/raft-boltdb/LICENSE b/vendor/github.com/hashicorp/raft-boltdb/LICENSE new file mode 100644 index 0000000000..f0e5c79e18 --- /dev/null +++ b/vendor/github.com/hashicorp/raft-boltdb/LICENSE @@ -0,0 +1,362 @@ +Mozilla Public License, version 2.0 + +1. Definitions + +1.1. "Contributor" + + means each individual or legal entity that creates, contributes to the + creation of, or owns Covered Software. + +1.2. "Contributor Version" + + means the combination of the Contributions of others (if any) used by a + Contributor and that particular Contributor's Contribution. + +1.3. "Contribution" + + means Covered Software of a particular Contributor. + +1.4. "Covered Software" + + means Source Code Form to which the initial Contributor has attached the + notice in Exhibit A, the Executable Form of such Source Code Form, and + Modifications of such Source Code Form, in each case including portions + thereof. + +1.5. "Incompatible With Secondary Licenses" + means + + a. that the initial Contributor has attached the notice described in + Exhibit B to the Covered Software; or + + b. that the Covered Software was made available under the terms of + version 1.1 or earlier of the License, but not also under the terms of + a Secondary License. + +1.6. "Executable Form" + + means any form of the work other than Source Code Form. + +1.7. "Larger Work" + + means a work that combines Covered Software with other material, in a + separate file or files, that is not Covered Software. + +1.8. "License" + + means this document. + +1.9. "Licensable" + + means having the right to grant, to the maximum extent possible, whether + at the time of the initial grant or subsequently, any and all of the + rights conveyed by this License. + +1.10. "Modifications" + + means any of the following: + + a. any file in Source Code Form that results from an addition to, + deletion from, or modification of the contents of Covered Software; or + + b. any new file in Source Code Form that contains any Covered Software. + +1.11. "Patent Claims" of a Contributor + + means any patent claim(s), including without limitation, method, + process, and apparatus claims, in any patent Licensable by such + Contributor that would be infringed, but for the grant of the License, + by the making, using, selling, offering for sale, having made, import, + or transfer of either its Contributions or its Contributor Version. + +1.12. "Secondary License" + + means either the GNU General Public License, Version 2.0, the GNU Lesser + General Public License, Version 2.1, the GNU Affero General Public + License, Version 3.0, or any later versions of those licenses. + +1.13. "Source Code Form" + + means the form of the work preferred for making modifications. + +1.14. "You" (or "Your") + + means an individual or a legal entity exercising rights under this + License. For legal entities, "You" includes any entity that controls, is + controlled by, or is under common control with You. 
For purposes of this + definition, "control" means (a) the power, direct or indirect, to cause + the direction or management of such entity, whether by contract or + otherwise, or (b) ownership of more than fifty percent (50%) of the + outstanding shares or beneficial ownership of such entity. + + +2. License Grants and Conditions + +2.1. Grants + + Each Contributor hereby grants You a world-wide, royalty-free, + non-exclusive license: + + a. under intellectual property rights (other than patent or trademark) + Licensable by such Contributor to use, reproduce, make available, + modify, display, perform, distribute, and otherwise exploit its + Contributions, either on an unmodified basis, with Modifications, or + as part of a Larger Work; and + + b. under Patent Claims of such Contributor to make, use, sell, offer for + sale, have made, import, and otherwise transfer either its + Contributions or its Contributor Version. + +2.2. Effective Date + + The licenses granted in Section 2.1 with respect to any Contribution + become effective for each Contribution on the date the Contributor first + distributes such Contribution. + +2.3. Limitations on Grant Scope + + The licenses granted in this Section 2 are the only rights granted under + this License. No additional rights or licenses will be implied from the + distribution or licensing of Covered Software under this License. + Notwithstanding Section 2.1(b) above, no patent license is granted by a + Contributor: + + a. for any code that a Contributor has removed from Covered Software; or + + b. for infringements caused by: (i) Your and any other third party's + modifications of Covered Software, or (ii) the combination of its + Contributions with other software (except as part of its Contributor + Version); or + + c. under Patent Claims infringed by Covered Software in the absence of + its Contributions. + + This License does not grant any rights in the trademarks, service marks, + or logos of any Contributor (except as may be necessary to comply with + the notice requirements in Section 3.4). + +2.4. Subsequent Licenses + + No Contributor makes additional grants as a result of Your choice to + distribute the Covered Software under a subsequent version of this + License (see Section 10.2) or under the terms of a Secondary License (if + permitted under the terms of Section 3.3). + +2.5. Representation + + Each Contributor represents that the Contributor believes its + Contributions are its original creation(s) or it has sufficient rights to + grant the rights to its Contributions conveyed by this License. + +2.6. Fair Use + + This License is not intended to limit any rights You have under + applicable copyright doctrines of fair use, fair dealing, or other + equivalents. + +2.7. Conditions + + Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted in + Section 2.1. + + +3. Responsibilities + +3.1. Distribution of Source Form + + All distribution of Covered Software in Source Code Form, including any + Modifications that You create or to which You contribute, must be under + the terms of this License. You must inform recipients that the Source + Code Form of the Covered Software is governed by the terms of this + License, and how they can obtain a copy of this License. You may not + attempt to alter or restrict the recipients' rights in the Source Code + Form. + +3.2. Distribution of Executable Form + + If You distribute Covered Software in Executable Form then: + + a. 
such Covered Software must also be made available in Source Code Form, + as described in Section 3.1, and You must inform recipients of the + Executable Form how they can obtain a copy of such Source Code Form by + reasonable means in a timely manner, at a charge no more than the cost + of distribution to the recipient; and + + b. You may distribute such Executable Form under the terms of this + License, or sublicense it under different terms, provided that the + license for the Executable Form does not attempt to limit or alter the + recipients' rights in the Source Code Form under this License. + +3.3. Distribution of a Larger Work + + You may create and distribute a Larger Work under terms of Your choice, + provided that You also comply with the requirements of this License for + the Covered Software. If the Larger Work is a combination of Covered + Software with a work governed by one or more Secondary Licenses, and the + Covered Software is not Incompatible With Secondary Licenses, this + License permits You to additionally distribute such Covered Software + under the terms of such Secondary License(s), so that the recipient of + the Larger Work may, at their option, further distribute the Covered + Software under the terms of either this License or such Secondary + License(s). + +3.4. Notices + + You may not remove or alter the substance of any license notices + (including copyright notices, patent notices, disclaimers of warranty, or + limitations of liability) contained within the Source Code Form of the + Covered Software, except that You may alter any license notices to the + extent required to remedy known factual inaccuracies. + +3.5. Application of Additional Terms + + You may choose to offer, and to charge a fee for, warranty, support, + indemnity or liability obligations to one or more recipients of Covered + Software. However, You may do so only on Your own behalf, and not on + behalf of any Contributor. You must make it absolutely clear that any + such warranty, support, indemnity, or liability obligation is offered by + You alone, and You hereby agree to indemnify every Contributor for any + liability incurred by such Contributor as a result of warranty, support, + indemnity or liability terms You offer. You may include additional + disclaimers of warranty and limitations of liability specific to any + jurisdiction. + +4. Inability to Comply Due to Statute or Regulation + + If it is impossible for You to comply with any of the terms of this License + with respect to some or all of the Covered Software due to statute, + judicial order, or regulation then You must: (a) comply with the terms of + this License to the maximum extent possible; and (b) describe the + limitations and the code they affect. Such description must be placed in a + text file included with all distributions of the Covered Software under + this License. Except to the extent prohibited by statute or regulation, + such description must be sufficiently detailed for a recipient of ordinary + skill to be able to understand it. + +5. Termination + +5.1. The rights granted under this License will terminate automatically if You + fail to comply with any of its terms. 
However, if You become compliant, + then the rights granted under this License from a particular Contributor + are reinstated (a) provisionally, unless and until such Contributor + explicitly and finally terminates Your grants, and (b) on an ongoing + basis, if such Contributor fails to notify You of the non-compliance by + some reasonable means prior to 60 days after You have come back into + compliance. Moreover, Your grants from a particular Contributor are + reinstated on an ongoing basis if such Contributor notifies You of the + non-compliance by some reasonable means, this is the first time You have + received notice of non-compliance with this License from such + Contributor, and You become compliant prior to 30 days after Your receipt + of the notice. + +5.2. If You initiate litigation against any entity by asserting a patent + infringement claim (excluding declaratory judgment actions, + counter-claims, and cross-claims) alleging that a Contributor Version + directly or indirectly infringes any patent, then the rights granted to + You by any and all Contributors for the Covered Software under Section + 2.1 of this License shall terminate. + +5.3. In the event of termination under Sections 5.1 or 5.2 above, all end user + license agreements (excluding distributors and resellers) which have been + validly granted by You or Your distributors under this License prior to + termination shall survive termination. + +6. Disclaimer of Warranty + + Covered Software is provided under this License on an "as is" basis, + without warranty of any kind, either expressed, implied, or statutory, + including, without limitation, warranties that the Covered Software is free + of defects, merchantable, fit for a particular purpose or non-infringing. + The entire risk as to the quality and performance of the Covered Software + is with You. Should any Covered Software prove defective in any respect, + You (not any Contributor) assume the cost of any necessary servicing, + repair, or correction. This disclaimer of warranty constitutes an essential + part of this License. No use of any Covered Software is authorized under + this License except under this disclaimer. + +7. Limitation of Liability + + Under no circumstances and under no legal theory, whether tort (including + negligence), contract, or otherwise, shall any Contributor, or anyone who + distributes Covered Software as permitted above, be liable to You for any + direct, indirect, special, incidental, or consequential damages of any + character including, without limitation, damages for lost profits, loss of + goodwill, work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses, even if such party shall have been + informed of the possibility of such damages. This limitation of liability + shall not apply to liability for death or personal injury resulting from + such party's negligence to the extent applicable law prohibits such + limitation. Some jurisdictions do not allow the exclusion or limitation of + incidental or consequential damages, so this exclusion and limitation may + not apply to You. + +8. Litigation + + Any litigation relating to this License may be brought only in the courts + of a jurisdiction where the defendant maintains its principal place of + business and such litigation shall be governed by laws of that + jurisdiction, without reference to its conflict-of-law provisions. Nothing + in this Section shall prevent a party's ability to bring cross-claims or + counter-claims. + +9. 
Miscellaneous + + This License represents the complete agreement concerning the subject + matter hereof. If any provision of this License is held to be + unenforceable, such provision shall be reformed only to the extent + necessary to make it enforceable. Any law or regulation which provides that + the language of a contract shall be construed against the drafter shall not + be used to construe this License against a Contributor. + + +10. Versions of the License + +10.1. New Versions + + Mozilla Foundation is the license steward. Except as provided in Section + 10.3, no one other than the license steward has the right to modify or + publish new versions of this License. Each version will be given a + distinguishing version number. + +10.2. Effect of New Versions + + You may distribute the Covered Software under the terms of the version + of the License under which You originally received the Covered Software, + or under the terms of any subsequent version published by the license + steward. + +10.3. Modified Versions + + If you create software not governed by this License, and you want to + create a new license for such software, you may create and use a + modified version of this License if you rename the license and remove + any references to the name of the license steward (except to note that + such modified license differs from this License). + +10.4. Distributing Source Code Form that is Incompatible With Secondary + Licenses If You choose to distribute Source Code Form that is + Incompatible With Secondary Licenses under the terms of this version of + the License, the notice described in Exhibit B of this License must be + attached. + +Exhibit A - Source Code Form License Notice + + This Source Code Form is subject to the + terms of the Mozilla Public License, v. + 2.0. If a copy of the MPL was not + distributed with this file, You can + obtain one at + http://mozilla.org/MPL/2.0/. + +If it is not possible or desirable to put the notice in a particular file, +then You may include the notice in a location (such as a LICENSE file in a +relevant directory) where a recipient would be likely to look for such a +notice. + +You may add additional accurate notices of copyright ownership. + +Exhibit B - "Incompatible With Secondary Licenses" Notice + + This Source Code Form is "Incompatible + With Secondary Licenses", as defined by + the Mozilla Public License, v. 2.0. \ No newline at end of file diff --git a/vendor/github.com/hashicorp/raft-boltdb/Makefile b/vendor/github.com/hashicorp/raft-boltdb/Makefile new file mode 100644 index 0000000000..bc5c6cc011 --- /dev/null +++ b/vendor/github.com/hashicorp/raft-boltdb/Makefile @@ -0,0 +1,11 @@ +DEPS = $(go list -f '{{range .TestImports}}{{.}} {{end}}' ./...) + +.PHONY: test deps + +test: + go test -timeout=30s ./... + +deps: + go get -d -v ./... + echo $(DEPS) | xargs -n1 go get -d + diff --git a/vendor/github.com/hashicorp/raft-boltdb/README.md b/vendor/github.com/hashicorp/raft-boltdb/README.md new file mode 100644 index 0000000000..5d7180ab9e --- /dev/null +++ b/vendor/github.com/hashicorp/raft-boltdb/README.md @@ -0,0 +1,11 @@ +raft-boltdb +=========== + +This repository provides the `raftboltdb` package. The package exports the +`BoltStore` which is an implementation of both a `LogStore` and `StableStore`. + +It is meant to be used as a backend for the `raft` [package +here](https://github.com/hashicorp/raft). + +This implementation uses [BoltDB](https://github.com/boltdb/bolt). 
BoltDB is +a simple key/value store implemented in pure Go, and inspired by LMDB. diff --git a/vendor/github.com/hashicorp/raft-boltdb/bolt_store.go b/vendor/github.com/hashicorp/raft-boltdb/bolt_store.go new file mode 100644 index 0000000000..a1f9f0ba61 --- /dev/null +++ b/vendor/github.com/hashicorp/raft-boltdb/bolt_store.go @@ -0,0 +1,268 @@ +package raftboltdb + +import ( + "errors" + + "github.com/boltdb/bolt" + "github.com/hashicorp/raft" +) + +const ( + // Permissions to use on the db file. This is only used if the + // database file does not exist and needs to be created. + dbFileMode = 0600 +) + +var ( + // Bucket names we perform transactions in + dbLogs = []byte("logs") + dbConf = []byte("conf") + + // An error indicating a given key does not exist + ErrKeyNotFound = errors.New("not found") +) + +// BoltStore provides access to BoltDB for Raft to store and retrieve +// log entries. It also provides key/value storage, and can be used as +// a LogStore and StableStore. +type BoltStore struct { + // conn is the underlying handle to the db. + conn *bolt.DB + + // The path to the Bolt database file + path string +} + +// Options contains all the configuration used to open the BoltDB +type Options struct { + // Path is the file path to the BoltDB to use + Path string + + // BoltOptions contains any specific BoltDB options you might + // want to specify [e.g. open timeout] + BoltOptions *bolt.Options + + // NoSync causes the database to skip fsync calls after each + // write to the log. This is unsafe, so it should be used + // with caution. + NoSync bool +} + +// readOnly returns true if the contained bolt options say to open +// the DB in readOnly mode [this can be useful to tools that want +// to examine the log] +func (o *Options) readOnly() bool { + return o != nil && o.BoltOptions != nil && o.BoltOptions.ReadOnly +} + +// NewBoltStore takes a file path and returns a connected Raft backend. +func NewBoltStore(path string) (*BoltStore, error) { + return New(Options{Path: path}) +} + +// New uses the supplied options to open the BoltDB and prepare it for use as a raft backend. +func New(options Options) (*BoltStore, error) { + // Try to connect + handle, err := bolt.Open(options.Path, dbFileMode, options.BoltOptions) + if err != nil { + return nil, err + } + handle.NoSync = options.NoSync + + // Create the new store + store := &BoltStore{ + conn: handle, + path: options.Path, + } + + // If the store was opened read-only, don't try and create buckets + if !options.readOnly() { + // Set up our buckets + if err := store.initialize(); err != nil { + store.Close() + return nil, err + } + } + return store, nil +} + +// initialize is used to set up all of the buckets. +func (b *BoltStore) initialize() error { + tx, err := b.conn.Begin(true) + if err != nil { + return err + } + defer tx.Rollback() + + // Create all the buckets + if _, err := tx.CreateBucketIfNotExists(dbLogs); err != nil { + return err + } + if _, err := tx.CreateBucketIfNotExists(dbConf); err != nil { + return err + } + + return tx.Commit() +} + +// Close is used to gracefully close the DB connection. +func (b *BoltStore) Close() error { + return b.conn.Close() +} + +// FirstIndex returns the first known index from the Raft log.
+func (b *BoltStore) FirstIndex() (uint64, error) { + tx, err := b.conn.Begin(false) + if err != nil { + return 0, err + } + defer tx.Rollback() + + curs := tx.Bucket(dbLogs).Cursor() + if first, _ := curs.First(); first == nil { + return 0, nil + } else { + return bytesToUint64(first), nil + } +} + +// LastIndex returns the last known index from the Raft log. +func (b *BoltStore) LastIndex() (uint64, error) { + tx, err := b.conn.Begin(false) + if err != nil { + return 0, err + } + defer tx.Rollback() + + curs := tx.Bucket(dbLogs).Cursor() + if last, _ := curs.Last(); last == nil { + return 0, nil + } else { + return bytesToUint64(last), nil + } +} + +// GetLog is used to retrieve a log from BoltDB at a given index. +func (b *BoltStore) GetLog(idx uint64, log *raft.Log) error { + tx, err := b.conn.Begin(false) + if err != nil { + return err + } + defer tx.Rollback() + + bucket := tx.Bucket(dbLogs) + val := bucket.Get(uint64ToBytes(idx)) + + if val == nil { + return raft.ErrLogNotFound + } + return decodeMsgPack(val, log) +} + +// StoreLog is used to store a single raft log +func (b *BoltStore) StoreLog(log *raft.Log) error { + return b.StoreLogs([]*raft.Log{log}) +} + +// StoreLogs is used to store a set of raft logs +func (b *BoltStore) StoreLogs(logs []*raft.Log) error { + tx, err := b.conn.Begin(true) + if err != nil { + return err + } + defer tx.Rollback() + + for _, log := range logs { + key := uint64ToBytes(log.Index) + val, err := encodeMsgPack(log) + if err != nil { + return err + } + bucket := tx.Bucket(dbLogs) + if err := bucket.Put(key, val.Bytes()); err != nil { + return err + } + } + + return tx.Commit() +} + +// DeleteRange is used to delete logs within a given range inclusively. +func (b *BoltStore) DeleteRange(min, max uint64) error { + minKey := uint64ToBytes(min) + + tx, err := b.conn.Begin(true) + if err != nil { + return err + } + defer tx.Rollback() + + curs := tx.Bucket(dbLogs).Cursor() + for k, _ := curs.Seek(minKey); k != nil; k, _ = curs.Next() { + // Handle out-of-range log index + if bytesToUint64(k) > max { + break + } + + // Delete in-range log index + if err := curs.Delete(); err != nil { + return err + } + } + + return tx.Commit() +} + +// Set is used to set a key/value pair outside of the raft log +func (b *BoltStore) Set(k, v []byte) error { + tx, err := b.conn.Begin(true) + if err != nil { + return err + } + defer tx.Rollback() + + bucket := tx.Bucket(dbConf) + if err := bucket.Put(k, v); err != nil { + return err + } + + return tx.Commit() +} + +// Get is used to retrieve a value from the k/v store by key +func (b *BoltStore) Get(k []byte) ([]byte, error) { + tx, err := b.conn.Begin(false) + if err != nil { + return nil, err + } + defer tx.Rollback() + + bucket := tx.Bucket(dbConf) + val := bucket.Get(k) + + if val == nil { + return nil, ErrKeyNotFound + } + return append([]byte(nil), val...), nil +} + +// SetUint64 is like Set, but handles uint64 values +func (b *BoltStore) SetUint64(key []byte, val uint64) error { + return b.Set(key, uint64ToBytes(val)) +} + +// GetUint64 is like Get, but handles uint64 values +func (b *BoltStore) GetUint64(key []byte) (uint64, error) { + val, err := b.Get(key) + if err != nil { + return 0, err + } + return bytesToUint64(val), nil +} + +// Sync performs an fsync on the database file handle. This is not necessary +// under normal operation unless NoSync is enabled, in which case this forces the +// database file to sync against the disk.
+func (b *BoltStore) Sync() error { + return b.conn.Sync() +} diff --git a/vendor/github.com/hashicorp/raft-boltdb/util.go b/vendor/github.com/hashicorp/raft-boltdb/util.go new file mode 100644 index 0000000000..68dd786b7a --- /dev/null +++ b/vendor/github.com/hashicorp/raft-boltdb/util.go @@ -0,0 +1,37 @@ +package raftboltdb + +import ( + "bytes" + "encoding/binary" + + "github.com/hashicorp/go-msgpack/codec" +) + +// Decode reverses the encode operation on a byte slice input +func decodeMsgPack(buf []byte, out interface{}) error { + r := bytes.NewBuffer(buf) + hd := codec.MsgpackHandle{} + dec := codec.NewDecoder(r, &hd) + return dec.Decode(out) +} + +// Encode writes an encoded object to a new bytes buffer +func encodeMsgPack(in interface{}) (*bytes.Buffer, error) { + buf := bytes.NewBuffer(nil) + hd := codec.MsgpackHandle{} + enc := codec.NewEncoder(buf, &hd) + err := enc.Encode(in) + return buf, err +} + +// Converts bytes to an integer +func bytesToUint64(b []byte) uint64 { + return binary.BigEndian.Uint64(b) +} + +// Converts a uint to a byte slice +func uint64ToBytes(u uint64) []byte { + buf := make([]byte, 8) + binary.BigEndian.PutUint64(buf, u) + return buf +} diff --git a/vendor/github.com/hashicorp/raft/CHANGELOG.md b/vendor/github.com/hashicorp/raft/CHANGELOG.md new file mode 100644 index 0000000000..06f1d8a6c5 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/CHANGELOG.md @@ -0,0 +1,16 @@ +# 1.0.1 (April 12th, 2019) + +IMPROVEMENTS + +* InMemTransport: Add timeout for sending a message [[GH-313](https://github.com/hashicorp/raft/pull/313)] +* ensure 'make deps' downloads test dependencies like testify [[GH-310](https://github.com/hashicorp/raft/pull/310)] +* Clarifies function of CommitTimeout [[GH-309](https://github.com/hashicorp/raft/pull/309)] +* Add additional metrics regarding log dispatching and committal [[GH-316](https://github.com/hashicorp/raft/pull/316)] + +# 1.0.0 (October 3rd, 2017) + +v1.0.0 takes the changes that were staged in the library-v2-stage-one branch. This version manages server identities using a UUID, so introduces some breaking API changes. It also versions the Raft protocol, and requires some special steps when interoperating with Raft servers running older versions of the library (see the detailed comment in config.go about version compatibility). You can reference https://github.com/hashicorp/consul/pull/2222 for an idea of what was required to port Consul to these new interfaces. + +# 0.1.0 (September 29th, 2017) + +v0.1.0 is the original stable version of the library that was in master and has been maintained with no breaking API changes. This was in use by Consul prior to version 0.7.0. diff --git a/vendor/github.com/hashicorp/raft/LICENSE b/vendor/github.com/hashicorp/raft/LICENSE new file mode 100644 index 0000000000..c33dcc7c92 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/LICENSE @@ -0,0 +1,354 @@ +Mozilla Public License, version 2.0 + +1. Definitions + +1.1. “Contributor” + + means each individual or legal entity that creates, contributes to the + creation of, or owns Covered Software. + +1.2. “Contributor Version” + + means the combination of the Contributions of others (if any) used by a + Contributor and that particular Contributor’s Contribution. + +1.3. “Contribution” + + means Covered Software of a particular Contributor. + +1.4. 
“Covered Software” + + means Source Code Form to which the initial Contributor has attached the + notice in Exhibit A, the Executable Form of such Source Code Form, and + Modifications of such Source Code Form, in each case including portions + thereof. + +1.5. “Incompatible With Secondary Licenses” + means + + a. that the initial Contributor has attached the notice described in + Exhibit B to the Covered Software; or + + b. that the Covered Software was made available under the terms of version + 1.1 or earlier of the License, but not also under the terms of a + Secondary License. + +1.6. “Executable Form” + + means any form of the work other than Source Code Form. + +1.7. “Larger Work” + + means a work that combines Covered Software with other material, in a separate + file or files, that is not Covered Software. + +1.8. “License” + + means this document. + +1.9. “Licensable” + + means having the right to grant, to the maximum extent possible, whether at the + time of the initial grant or subsequently, any and all of the rights conveyed by + this License. + +1.10. “Modifications” + + means any of the following: + + a. any file in Source Code Form that results from an addition to, deletion + from, or modification of the contents of Covered Software; or + + b. any new file in Source Code Form that contains any Covered Software. + +1.11. “Patent Claims” of a Contributor + + means any patent claim(s), including without limitation, method, process, + and apparatus claims, in any patent Licensable by such Contributor that + would be infringed, but for the grant of the License, by the making, + using, selling, offering for sale, having made, import, or transfer of + either its Contributions or its Contributor Version. + +1.12. “Secondary License” + + means either the GNU General Public License, Version 2.0, the GNU Lesser + General Public License, Version 2.1, the GNU Affero General Public + License, Version 3.0, or any later versions of those licenses. + +1.13. “Source Code Form” + + means the form of the work preferred for making modifications. + +1.14. “You” (or “Your”) + + means an individual or a legal entity exercising rights under this + License. For legal entities, “You” includes any entity that controls, is + controlled by, or is under common control with You. For purposes of this + definition, “control” means (a) the power, direct or indirect, to cause + the direction or management of such entity, whether by contract or + otherwise, or (b) ownership of more than fifty percent (50%) of the + outstanding shares or beneficial ownership of such entity. + + +2. License Grants and Conditions + +2.1. Grants + + Each Contributor hereby grants You a world-wide, royalty-free, + non-exclusive license: + + a. under intellectual property rights (other than patent or trademark) + Licensable by such Contributor to use, reproduce, make available, + modify, display, perform, distribute, and otherwise exploit its + Contributions, either on an unmodified basis, with Modifications, or as + part of a Larger Work; and + + b. under Patent Claims of such Contributor to make, use, sell, offer for + sale, have made, import, and otherwise transfer either its Contributions + or its Contributor Version. + +2.2. Effective Date + + The licenses granted in Section 2.1 with respect to any Contribution become + effective for each Contribution on the date the Contributor first distributes + such Contribution. + +2.3. 
Limitations on Grant Scope + + The licenses granted in this Section 2 are the only rights granted under this + License. No additional rights or licenses will be implied from the distribution + or licensing of Covered Software under this License. Notwithstanding Section + 2.1(b) above, no patent license is granted by a Contributor: + + a. for any code that a Contributor has removed from Covered Software; or + + b. for infringements caused by: (i) Your and any other third party’s + modifications of Covered Software, or (ii) the combination of its + Contributions with other software (except as part of its Contributor + Version); or + + c. under Patent Claims infringed by Covered Software in the absence of its + Contributions. + + This License does not grant any rights in the trademarks, service marks, or + logos of any Contributor (except as may be necessary to comply with the + notice requirements in Section 3.4). + +2.4. Subsequent Licenses + + No Contributor makes additional grants as a result of Your choice to + distribute the Covered Software under a subsequent version of this License + (see Section 10.2) or under the terms of a Secondary License (if permitted + under the terms of Section 3.3). + +2.5. Representation + + Each Contributor represents that the Contributor believes its Contributions + are its original creation(s) or it has sufficient rights to grant the + rights to its Contributions conveyed by this License. + +2.6. Fair Use + + This License is not intended to limit any rights You have under applicable + copyright doctrines of fair use, fair dealing, or other equivalents. + +2.7. Conditions + + Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted in + Section 2.1. + + +3. Responsibilities + +3.1. Distribution of Source Form + + All distribution of Covered Software in Source Code Form, including any + Modifications that You create or to which You contribute, must be under the + terms of this License. You must inform recipients that the Source Code Form + of the Covered Software is governed by the terms of this License, and how + they can obtain a copy of this License. You may not attempt to alter or + restrict the recipients’ rights in the Source Code Form. + +3.2. Distribution of Executable Form + + If You distribute Covered Software in Executable Form then: + + a. such Covered Software must also be made available in Source Code Form, + as described in Section 3.1, and You must inform recipients of the + Executable Form how they can obtain a copy of such Source Code Form by + reasonable means in a timely manner, at a charge no more than the cost + of distribution to the recipient; and + + b. You may distribute such Executable Form under the terms of this License, + or sublicense it under different terms, provided that the license for + the Executable Form does not attempt to limit or alter the recipients’ + rights in the Source Code Form under this License. + +3.3. Distribution of a Larger Work + + You may create and distribute a Larger Work under terms of Your choice, + provided that You also comply with the requirements of this License for the + Covered Software. 
If the Larger Work is a combination of Covered Software + with a work governed by one or more Secondary Licenses, and the Covered + Software is not Incompatible With Secondary Licenses, this License permits + You to additionally distribute such Covered Software under the terms of + such Secondary License(s), so that the recipient of the Larger Work may, at + their option, further distribute the Covered Software under the terms of + either this License or such Secondary License(s). + +3.4. Notices + + You may not remove or alter the substance of any license notices (including + copyright notices, patent notices, disclaimers of warranty, or limitations + of liability) contained within the Source Code Form of the Covered + Software, except that You may alter any license notices to the extent + required to remedy known factual inaccuracies. + +3.5. Application of Additional Terms + + You may choose to offer, and to charge a fee for, warranty, support, + indemnity or liability obligations to one or more recipients of Covered + Software. However, You may do so only on Your own behalf, and not on behalf + of any Contributor. You must make it absolutely clear that any such + warranty, support, indemnity, or liability obligation is offered by You + alone, and You hereby agree to indemnify every Contributor for any + liability incurred by such Contributor as a result of warranty, support, + indemnity or liability terms You offer. You may include additional + disclaimers of warranty and limitations of liability specific to any + jurisdiction. + +4. Inability to Comply Due to Statute or Regulation + + If it is impossible for You to comply with any of the terms of this License + with respect to some or all of the Covered Software due to statute, judicial + order, or regulation then You must: (a) comply with the terms of this License + to the maximum extent possible; and (b) describe the limitations and the code + they affect. Such description must be placed in a text file included with all + distributions of the Covered Software under this License. Except to the + extent prohibited by statute or regulation, such description must be + sufficiently detailed for a recipient of ordinary skill to be able to + understand it. + +5. Termination + +5.1. The rights granted under this License will terminate automatically if You + fail to comply with any of its terms. However, if You become compliant, + then the rights granted under this License from a particular Contributor + are reinstated (a) provisionally, unless and until such Contributor + explicitly and finally terminates Your grants, and (b) on an ongoing basis, + if such Contributor fails to notify You of the non-compliance by some + reasonable means prior to 60 days after You have come back into compliance. + Moreover, Your grants from a particular Contributor are reinstated on an + ongoing basis if such Contributor notifies You of the non-compliance by + some reasonable means, this is the first time You have received notice of + non-compliance with this License from such Contributor, and You become + compliant prior to 30 days after Your receipt of the notice. + +5.2. If You initiate litigation against any entity by asserting a patent + infringement claim (excluding declaratory judgment actions, counter-claims, + and cross-claims) alleging that a Contributor Version directly or + indirectly infringes any patent, then the rights granted to You by any and + all Contributors for the Covered Software under Section 2.1 of this License + shall terminate. 
+ +5.3. In the event of termination under Sections 5.1 or 5.2 above, all end user + license agreements (excluding distributors and resellers) which have been + validly granted by You or Your distributors under this License prior to + termination shall survive termination. + +6. Disclaimer of Warranty + + Covered Software is provided under this License on an “as is” basis, without + warranty of any kind, either expressed, implied, or statutory, including, + without limitation, warranties that the Covered Software is free of defects, + merchantable, fit for a particular purpose or non-infringing. The entire + risk as to the quality and performance of the Covered Software is with You. + Should any Covered Software prove defective in any respect, You (not any + Contributor) assume the cost of any necessary servicing, repair, or + correction. This disclaimer of warranty constitutes an essential part of this + License. No use of any Covered Software is authorized under this License + except under this disclaimer. + +7. Limitation of Liability + + Under no circumstances and under no legal theory, whether tort (including + negligence), contract, or otherwise, shall any Contributor, or anyone who + distributes Covered Software as permitted above, be liable to You for any + direct, indirect, special, incidental, or consequential damages of any + character including, without limitation, damages for lost profits, loss of + goodwill, work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses, even if such party shall have been + informed of the possibility of such damages. This limitation of liability + shall not apply to liability for death or personal injury resulting from such + party’s negligence to the extent applicable law prohibits such limitation. + Some jurisdictions do not allow the exclusion or limitation of incidental or + consequential damages, so this exclusion and limitation may not apply to You. + +8. Litigation + + Any litigation relating to this License may be brought only in the courts of + a jurisdiction where the defendant maintains its principal place of business + and such litigation shall be governed by laws of that jurisdiction, without + reference to its conflict-of-law provisions. Nothing in this Section shall + prevent a party’s ability to bring cross-claims or counter-claims. + +9. Miscellaneous + + This License represents the complete agreement concerning the subject matter + hereof. If any provision of this License is held to be unenforceable, such + provision shall be reformed only to the extent necessary to make it + enforceable. Any law or regulation which provides that the language of a + contract shall be construed against the drafter shall not be used to construe + this License against a Contributor. + + +10. Versions of the License + +10.1. New Versions + + Mozilla Foundation is the license steward. Except as provided in Section + 10.3, no one other than the license steward has the right to modify or + publish new versions of this License. Each version will be given a + distinguishing version number. + +10.2. Effect of New Versions + + You may distribute the Covered Software under the terms of the version of + the License under which You originally received the Covered Software, or + under the terms of any subsequent version published by the license + steward. + +10.3. 
Modified Versions + + If you create software not governed by this License, and you want to + create a new license for such software, you may create and use a modified + version of this License if you rename the license and remove any + references to the name of the license steward (except to note that such + modified license differs from this License). + +10.4. Distributing Source Code Form that is Incompatible With Secondary Licenses + If You choose to distribute Source Code Form that is Incompatible With + Secondary Licenses under the terms of this version of the License, the + notice described in Exhibit B of this License must be attached. + +Exhibit A - Source Code Form License Notice + + This Source Code Form is subject to the + terms of the Mozilla Public License, v. + 2.0. If a copy of the MPL was not + distributed with this file, You can + obtain one at + http://mozilla.org/MPL/2.0/. + +If it is not possible or desirable to put the notice in a particular file, then +You may include the notice in a location (such as a LICENSE file in a relevant +directory) where a recipient would be likely to look for such a notice. + +You may add additional accurate notices of copyright ownership. + +Exhibit B - “Incompatible With Secondary Licenses” Notice + + This Source Code Form is “Incompatible + With Secondary Licenses”, as defined by + the Mozilla Public License, v. 2.0. + diff --git a/vendor/github.com/hashicorp/raft/Makefile b/vendor/github.com/hashicorp/raft/Makefile new file mode 100644 index 0000000000..46849d88c0 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/Makefile @@ -0,0 +1,20 @@ +DEPS = $(go list -f '{{range .TestImports}}{{.}} {{end}}' ./...) + +test: + go test -timeout=60s . + +integ: test + INTEG_TESTS=yes go test -timeout=25s -run=Integ . + +fuzz: + go test -timeout=300s ./fuzzy + +deps: + go get -t -d -v ./... + echo $(DEPS) | xargs -n1 go get -d + +cov: + INTEG_TESTS=yes gocov test github.com/hashicorp/raft | gocov-html > /tmp/coverage.html + open /tmp/coverage.html + +.PHONY: test cov integ deps diff --git a/vendor/github.com/hashicorp/raft/README.md b/vendor/github.com/hashicorp/raft/README.md new file mode 100644 index 0000000000..43208ebba8 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/README.md @@ -0,0 +1,107 @@ +raft [![Build Status](https://travis-ci.org/hashicorp/raft.png)](https://travis-ci.org/hashicorp/raft) +==== + +raft is a [Go](http://www.golang.org) library that manages a replicated +log and can be used with an FSM to manage replicated state machines. It +is a library for providing [consensus](http://en.wikipedia.org/wiki/Consensus_(computer_science)). + +The use cases for such a library are far-reaching as replicated state +machines are a key component of many distributed systems. They enable +building Consistent, Partition Tolerant (CP) systems, with limited +fault tolerance as well. + +## Building + +If you wish to build raft you'll need Go version 1.2+ installed. + +Please check your installation with: + +``` +go version +``` + +## Documentation + +For complete documentation, see the associated [Godoc](http://godoc.org/github.com/hashicorp/raft). + +To prevent complications with cgo, the primary backend `MDBStore` is in a separate repository, +called [raft-mdb](http://github.com/hashicorp/raft-mdb). That is the recommended implementation +for the `LogStore` and `StableStore`. + +A pure Go backend using [BoltDB](https://github.com/boltdb/bolt) is also available called +[raft-boltdb](https://github.com/hashicorp/raft-boltdb). 
It can also be used as a `LogStore`
+and `StableStore`.
+
+## Tagged Releases
+
+As of September 2017, HashiCorp will start using tags for this library to clearly indicate
+major version updates. We recommend you vendor your application's dependency on this library.
+
+* v0.1.0 is the original stable version of the library that was in master and has been maintained
+with no breaking API changes. This was in use by Consul prior to version 0.7.0.
+
+* v1.0.0 takes the changes that were staged in the library-v2-stage-one branch. This version
+manages server identities using a UUID, so it introduces some breaking API changes. It also versions
+the Raft protocol, and requires some special steps when interoperating with Raft servers running
+older versions of the library (see the detailed comment in config.go about version compatibility).
+You can reference https://github.com/hashicorp/consul/pull/2222 for an idea of what was required
+to port Consul to these new interfaces.
+
+  This version includes some new features as well, including non-voting servers, a new address
+  provider abstraction in the transport layer, and more resilient snapshots.
+
+## Protocol
+
+raft is based on ["Raft: In Search of an Understandable Consensus Algorithm"](https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf)
+
+A high-level overview of the Raft protocol is described below, but for details please read the full
+[Raft paper](https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf)
+followed by the raft source. Any questions about the raft protocol should be sent to the
+[raft-dev mailing list](https://groups.google.com/forum/#!forum/raft-dev).
+
+### Protocol Description
+
+Raft nodes are always in one of three states: follower, candidate, or leader. All
+nodes initially start out as followers. In this state, nodes can accept log entries
+from a leader and cast votes. If no entries are received for some time, nodes
+self-promote to the candidate state. In the candidate state, nodes request votes from
+their peers. If a candidate receives a quorum of votes, then it is promoted to a leader.
+The leader must accept new log entries and replicate them to all the other followers.
+In addition, if stale reads are not acceptable, all queries must also be performed on
+the leader.
+
+Once a cluster has a leader, it is able to accept new log entries. A client can
+request that a leader append a new log entry, which is an opaque binary blob to
+Raft. The leader then writes the entry to durable storage and attempts to replicate
+it to a quorum of followers. Once the log entry is considered *committed*, it can be
+*applied* to a finite state machine. The finite state machine is application specific,
+and is implemented using an interface (see the sketch below).
+
+An obvious question relates to the unbounded nature of a replicated log. Raft provides
+a mechanism by which the current state is snapshotted, and the log is compacted. Because
+of the FSM abstraction, restoring the state of the FSM must result in the same state
+as a replay of old logs. This allows Raft to capture the FSM state at a point in time,
+and then remove all the logs that were used to reach that state. This is performed automatically
+without user intervention, and prevents unbounded disk usage as well as minimizing
+time spent replaying logs.
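To make the FSM abstraction above concrete, here is a minimal editorial sketch of an application state machine implementing this library's `FSM` and `FSMSnapshot` interfaces (the `Apply`/`Snapshot`/`Restore` and `Persist`/`Release` methods used throughout api.go below). The key/value semantics and the naive full-copy snapshot are illustrative assumptions, not part of the vendored library:

```
package kvfsm

import (
	"encoding/json"
	"io"
	"sync"

	"github.com/hashicorp/raft"
)

// KV is a toy FSM: each committed log entry carries a JSON-encoded map of
// keys to set. Apply must be deterministic so every node reaches the same
// state from the same log.
type KV struct {
	mu   sync.Mutex
	data map[string]string
}

func NewKV() *KV { return &KV{data: make(map[string]string)} }

// Apply is invoked once a log entry is committed.
func (f *KV) Apply(l *raft.Log) interface{} {
	var cmd map[string]string
	if err := json.Unmarshal(l.Data, &cmd); err != nil {
		return err
	}
	f.mu.Lock()
	defer f.mu.Unlock()
	for k, v := range cmd {
		f.data[k] = v
	}
	return nil
}

// Snapshot returns a point-in-time copy, used for log compaction.
func (f *KV) Snapshot() (raft.FSMSnapshot, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	clone := make(map[string]string, len(f.data))
	for k, v := range f.data {
		clone[k] = v
	}
	return &kvSnapshot{data: clone}, nil
}

// Restore replaces the FSM state from a snapshot stream.
func (f *KV) Restore(rc io.ReadCloser) error {
	defer rc.Close()
	fresh := make(map[string]string)
	if err := json.NewDecoder(rc).Decode(&fresh); err != nil {
		return err
	}
	f.mu.Lock()
	f.data = fresh
	f.mu.Unlock()
	return nil
}

type kvSnapshot struct{ data map[string]string }

// Persist writes the snapshot to the sink provided by the SnapshotStore.
func (s *kvSnapshot) Persist(sink raft.SnapshotSink) error {
	if err := json.NewEncoder(sink).Encode(s.data); err != nil {
		sink.Cancel()
		return err
	}
	return sink.Close()
}

// Release is called when Raft is done with the snapshot.
func (s *kvSnapshot) Release() {}
```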
Lastly, there is the issue of updating the peer set when new servers are joining
+or existing servers are leaving. As long as a quorum of nodes is available, this
+is not an issue as Raft provides mechanisms to dynamically update the peer set.
+If a quorum of nodes is unavailable, then this becomes a very challenging issue.
+For example, suppose there are only 2 peers, A and B. The quorum size is also
+2, meaning both nodes must agree to commit a log entry. If either A or B fails,
+it is now impossible to reach quorum. This means the cluster is unable to add
+or remove a node, or to commit any additional log entries. This results in
+*unavailability*. At this point, manual intervention would be required to remove
+either A or B, and to restart the remaining node in bootstrap mode.
+
+A Raft cluster of 3 nodes can tolerate a single node failure, while a cluster
+of 5 can tolerate 2 node failures. The recommended configuration is to either
+run 3 or 5 raft servers. This maximizes availability without
+greatly sacrificing performance.
+
+In terms of performance, Raft is comparable to Paxos. Assuming stable leadership,
+committing a log entry requires a single round trip to half of the cluster.
+Thus performance is bound by disk I/O and network latency.
+
diff --git a/vendor/github.com/hashicorp/raft/api.go b/vendor/github.com/hashicorp/raft/api.go
new file mode 100644
index 0000000000..c6f947f241
--- /dev/null
+++ b/vendor/github.com/hashicorp/raft/api.go
@@ -0,0 +1,1013 @@
+package raft
+
+import (
+	"errors"
+	"fmt"
+	"io"
+	"os"
+	"strconv"
+	"sync"
+	"time"
+
+	"github.com/hashicorp/go-hclog"
+
+	"github.com/armon/go-metrics"
+)
+
+var (
+	// ErrLeader is returned when an operation can't be completed on a
+	// leader node.
+	ErrLeader = errors.New("node is the leader")
+
+	// ErrNotLeader is returned when an operation can't be completed on a
+	// follower or candidate node.
+	ErrNotLeader = errors.New("node is not the leader")
+
+	// ErrLeadershipLost is returned when a leader fails to commit a log entry
+	// because it's been deposed in the process.
+	ErrLeadershipLost = errors.New("leadership lost while committing log")
+
+	// ErrAbortedByRestore is returned when a leader fails to commit a log
+	// entry because it's been superseded by a user snapshot restore.
+	ErrAbortedByRestore = errors.New("snapshot restored while committing log")
+
+	// ErrRaftShutdown is returned when operations are requested against an
+	// inactive Raft.
+	ErrRaftShutdown = errors.New("raft is already shutdown")
+
+	// ErrEnqueueTimeout is returned when a command fails due to a timeout.
+	ErrEnqueueTimeout = errors.New("timed out enqueuing operation")
+
+	// ErrNothingNewToSnapshot is returned when trying to create a snapshot
+	// but there's nothing new committed to the FSM since we started.
+	ErrNothingNewToSnapshot = errors.New("nothing new to snapshot")
+
+	// ErrUnsupportedProtocol is returned when an operation is attempted
+	// that's not supported by the current protocol version.
+	ErrUnsupportedProtocol = errors.New("operation not supported with current protocol version")
+
+	// ErrCantBootstrap is returned when an attempt is made to bootstrap a
+	// cluster that already has state present.
+	ErrCantBootstrap = errors.New("bootstrap only works on new clusters")
+)
+
+// Raft implements a Raft node.
+type Raft struct {
+	raftState
+
+	// protocolVersion is used to inter-operate with Raft servers running
+	// different versions of the library. See comments in config.go for more
+	// details.
+ protocolVersion ProtocolVersion + + // applyCh is used to async send logs to the main thread to + // be committed and applied to the FSM. + applyCh chan *logFuture + + // Configuration provided at Raft initialization + conf Config + + // FSM is the client state machine to apply commands to + fsm FSM + + // fsmMutateCh is used to send state-changing updates to the FSM. This + // receives pointers to commitTuple structures when applying logs or + // pointers to restoreFuture structures when restoring a snapshot. We + // need control over the order of these operations when doing user + // restores so that we finish applying any old log applies before we + // take a user snapshot on the leader, otherwise we might restore the + // snapshot and apply old logs to it that were in the pipe. + fsmMutateCh chan interface{} + + // fsmSnapshotCh is used to trigger a new snapshot being taken + fsmSnapshotCh chan *reqSnapshotFuture + + // lastContact is the last time we had contact from the + // leader node. This can be used to gauge staleness. + lastContact time.Time + lastContactLock sync.RWMutex + + // Leader is the current cluster leader + leader ServerAddress + leaderLock sync.RWMutex + + // leaderCh is used to notify of leadership changes + leaderCh chan bool + + // leaderState used only while state is leader + leaderState leaderState + + // Stores our local server ID, used to avoid sending RPCs to ourself + localID ServerID + + // Stores our local addr + localAddr ServerAddress + + // Used for our logging + logger hclog.Logger + + // LogStore provides durable storage for logs + logs LogStore + + // Used to request the leader to make configuration changes. + configurationChangeCh chan *configurationChangeFuture + + // Tracks the latest configuration and latest committed configuration from + // the log/snapshot. + configurations configurations + + // RPC chan comes from the transport layer + rpcCh <-chan RPC + + // Shutdown channel to exit, protected to prevent concurrent exits + shutdown bool + shutdownCh chan struct{} + shutdownLock sync.Mutex + + // snapshots is used to store and retrieve snapshots + snapshots SnapshotStore + + // userSnapshotCh is used for user-triggered snapshots + userSnapshotCh chan *userSnapshotFuture + + // userRestoreCh is used for user-triggered restores of external + // snapshots + userRestoreCh chan *userRestoreFuture + + // stable is a StableStore implementation for durable state + // It provides stable storage for many fields in raftState + stable StableStore + + // The transport layer we use + trans Transport + + // verifyCh is used to async send verify futures to the main thread + // to verify we are still the leader + verifyCh chan *verifyFuture + + // configurationsCh is used to get the configuration data safely from + // outside of the main thread. + configurationsCh chan *configurationsFuture + + // bootstrapCh is used to attempt an initial bootstrap from outside of + // the main thread. + bootstrapCh chan *bootstrapFuture + + // List of observers and the mutex that protects them. The observers list + // is indexed by an artificial ID which is used for deregistration. + observersLock sync.RWMutex + observers map[uint64]*Observer +} + +// BootstrapCluster initializes a server's storage with the given cluster +// configuration. This should only be called at the beginning of time for the +// cluster, and you absolutely must make sure that you call it with the same +// configuration on all the Voter servers. 
There is no need to bootstrap
+// Nonvoter and Staging servers.
+//
+// One sane approach is to bootstrap a single server with a configuration
+// listing just itself as a Voter, then invoke AddVoter() on it to add other
+// servers to the cluster.
+func BootstrapCluster(conf *Config, logs LogStore, stable StableStore,
+	snaps SnapshotStore, trans Transport, configuration Configuration) error {
+	// Validate the Raft server config.
+	if err := ValidateConfig(conf); err != nil {
+		return err
+	}
+
+	// Sanity check the Raft peer configuration.
+	if err := checkConfiguration(configuration); err != nil {
+		return err
+	}
+
+	// Make sure the cluster is in a clean state.
+	hasState, err := HasExistingState(logs, stable, snaps)
+	if err != nil {
+		return fmt.Errorf("failed to check for existing state: %v", err)
+	}
+	if hasState {
+		return ErrCantBootstrap
+	}
+
+	// Set current term to 1.
+	if err := stable.SetUint64(keyCurrentTerm, 1); err != nil {
+		return fmt.Errorf("failed to save current term: %v", err)
+	}
+
+	// Append configuration entry to log.
+	entry := &Log{
+		Index: 1,
+		Term:  1,
+	}
+	if conf.ProtocolVersion < 3 {
+		entry.Type = LogRemovePeerDeprecated
+		entry.Data = encodePeers(configuration, trans)
+	} else {
+		entry.Type = LogConfiguration
+		entry.Data = encodeConfiguration(configuration)
+	}
+	if err := logs.StoreLog(entry); err != nil {
+		return fmt.Errorf("failed to append configuration entry to log: %v", err)
+	}
+
+	return nil
+}
+
+// RecoverCluster is used to manually force a new configuration in order to
+// recover from a loss of quorum where the current configuration cannot be
+// restored, such as when several servers die at the same time. This works by
+// reading all the current state for this server, creating a snapshot with the
+// supplied configuration, and then truncating the Raft log. This is the only
+// safe way to force a given configuration without actually altering the log to
+// insert any new entries, which could cause conflicts with other servers with
+// different state.
+//
+// WARNING! This operation implicitly commits all entries in the Raft log, so
+// in general this is an extremely unsafe operation. If you've lost your other
+// servers and are performing a manual recovery, then you've also lost the
+// commit information, so this is likely the best you can do, but you should be
+// aware that calling this can cause Raft log entries that were in the process
+// of being replicated, but not yet committed, to be committed.
+//
+// Note the FSM passed here is used for the snapshot operations and will be
+// left in a state that should not be used by the application. Be sure to
+// discard this FSM and any associated state and provide a fresh one when
+// calling NewRaft later.
+//
+// A typical way to recover the cluster is to shut down all servers and then
+// run RecoverCluster on every server using an identical configuration. When
+// the cluster is then restarted, an election should occur, and then Raft will
+// resume normal operation. If it's desired to make a particular server the
+// leader, this can be used to inject a new configuration with that server as
+// the sole voter, and then join up other new clean-state peer servers using
+// the usual APIs in order to bring the cluster back into a known state.
+func RecoverCluster(conf *Config, fsm FSM, logs LogStore, stable StableStore,
+	snaps SnapshotStore, trans Transport, configuration Configuration) error {
+	// Validate the Raft server config.
+ if err := ValidateConfig(conf); err != nil { + return err + } + + // Sanity check the Raft peer configuration. + if err := checkConfiguration(configuration); err != nil { + return err + } + + // Refuse to recover if there's no existing state. This would be safe to + // do, but it is likely an indication of an operator error where they + // expect data to be there and it's not. By refusing, we force them + // to show intent to start a cluster fresh by explicitly doing a + // bootstrap, rather than quietly fire up a fresh cluster here. + hasState, err := HasExistingState(logs, stable, snaps) + if err != nil { + return fmt.Errorf("failed to check for existing state: %v", err) + } + if !hasState { + return fmt.Errorf("refused to recover cluster with no initial state, this is probably an operator error") + } + + // Attempt to restore any snapshots we find, newest to oldest. + var snapshotIndex uint64 + var snapshotTerm uint64 + snapshots, err := snaps.List() + if err != nil { + return fmt.Errorf("failed to list snapshots: %v", err) + } + for _, snapshot := range snapshots { + _, source, err := snaps.Open(snapshot.ID) + if err != nil { + // Skip this one and try the next. We will detect if we + // couldn't open any snapshots. + continue + } + defer source.Close() + + if err := fsm.Restore(source); err != nil { + // Same here, skip and try the next one. + continue + } + + snapshotIndex = snapshot.Index + snapshotTerm = snapshot.Term + break + } + if len(snapshots) > 0 && (snapshotIndex == 0 || snapshotTerm == 0) { + return fmt.Errorf("failed to restore any of the available snapshots") + } + + // The snapshot information is the best known end point for the data + // until we play back the Raft log entries. + lastIndex := snapshotIndex + lastTerm := snapshotTerm + + // Apply any Raft log entries past the snapshot. + lastLogIndex, err := logs.LastIndex() + if err != nil { + return fmt.Errorf("failed to find last log: %v", err) + } + for index := snapshotIndex + 1; index <= lastLogIndex; index++ { + var entry Log + if err := logs.GetLog(index, &entry); err != nil { + return fmt.Errorf("failed to get log at index %d: %v", index, err) + } + if entry.Type == LogCommand { + _ = fsm.Apply(&entry) + } + lastIndex = entry.Index + lastTerm = entry.Term + } + + // Create a new snapshot, placing the configuration in as if it was + // committed at index 1. + snapshot, err := fsm.Snapshot() + if err != nil { + return fmt.Errorf("failed to snapshot FSM: %v", err) + } + version := getSnapshotVersion(conf.ProtocolVersion) + sink, err := snaps.Create(version, lastIndex, lastTerm, configuration, 1, trans) + if err != nil { + return fmt.Errorf("failed to create snapshot: %v", err) + } + if err := snapshot.Persist(sink); err != nil { + return fmt.Errorf("failed to persist snapshot: %v", err) + } + if err := sink.Close(); err != nil { + return fmt.Errorf("failed to finalize snapshot: %v", err) + } + + // Compact the log so that we don't get bad interference from any + // configuration change log entries that might be there. + firstLogIndex, err := logs.FirstIndex() + if err != nil { + return fmt.Errorf("failed to get first log index: %v", err) + } + if err := logs.DeleteRange(firstLogIndex, lastLogIndex); err != nil { + return fmt.Errorf("log compaction failed: %v", err) + } + + return nil +} + +// HasExistingState returns true if the server has any existing state (logs, +// knowledge of a current term, or any snapshots). 
+func HasExistingState(logs LogStore, stable StableStore, snaps SnapshotStore) (bool, error) { + // Make sure we don't have a current term. + currentTerm, err := stable.GetUint64(keyCurrentTerm) + if err == nil { + if currentTerm > 0 { + return true, nil + } + } else { + if err.Error() != "not found" { + return false, fmt.Errorf("failed to read current term: %v", err) + } + } + + // Make sure we have an empty log. + lastIndex, err := logs.LastIndex() + if err != nil { + return false, fmt.Errorf("failed to get last log index: %v", err) + } + if lastIndex > 0 { + return true, nil + } + + // Make sure we have no snapshots + snapshots, err := snaps.List() + if err != nil { + return false, fmt.Errorf("failed to list snapshots: %v", err) + } + if len(snapshots) > 0 { + return true, nil + } + + return false, nil +} + +// NewRaft is used to construct a new Raft node. It takes a configuration, as well +// as implementations of various interfaces that are required. If we have any +// old state, such as snapshots, logs, peers, etc, all those will be restored +// when creating the Raft node. +func NewRaft(conf *Config, fsm FSM, logs LogStore, stable StableStore, snaps SnapshotStore, trans Transport) (*Raft, error) { + // Validate the configuration. + if err := ValidateConfig(conf); err != nil { + return nil, err + } + + // Ensure we have a LogOutput. + var logger hclog.Logger + if conf.Logger != nil { + logger = conf.Logger + } else { + if conf.LogOutput == nil { + conf.LogOutput = os.Stderr + } + + logger = hclog.New(&hclog.LoggerOptions{ + Name: "raft", + Level: hclog.LevelFromString(conf.LogLevel), + Output: conf.LogOutput, + }) + } + + // Try to restore the current term. + currentTerm, err := stable.GetUint64(keyCurrentTerm) + if err != nil && err.Error() != "not found" { + return nil, fmt.Errorf("failed to load current term: %v", err) + } + + // Read the index of the last log entry. + lastIndex, err := logs.LastIndex() + if err != nil { + return nil, fmt.Errorf("failed to find last log: %v", err) + } + + // Get the last log entry. + var lastLog Log + if lastIndex > 0 { + if err = logs.GetLog(lastIndex, &lastLog); err != nil { + return nil, fmt.Errorf("failed to get last log at index %d: %v", lastIndex, err) + } + } + + // Make sure we have a valid server address and ID. + protocolVersion := conf.ProtocolVersion + localAddr := ServerAddress(trans.LocalAddr()) + localID := conf.LocalID + + // TODO (slackpad) - When we deprecate protocol version 2, remove this + // along with the AddPeer() and RemovePeer() APIs. + if protocolVersion < 3 && string(localID) != string(localAddr) { + return nil, fmt.Errorf("when running with ProtocolVersion < 3, LocalID must be set to the network address") + } + + // Create Raft struct. 
+ r := &Raft{ + protocolVersion: protocolVersion, + applyCh: make(chan *logFuture), + conf: *conf, + fsm: fsm, + fsmMutateCh: make(chan interface{}, 128), + fsmSnapshotCh: make(chan *reqSnapshotFuture), + leaderCh: make(chan bool), + localID: localID, + localAddr: localAddr, + logger: logger, + logs: logs, + configurationChangeCh: make(chan *configurationChangeFuture), + configurations: configurations{}, + rpcCh: trans.Consumer(), + snapshots: snaps, + userSnapshotCh: make(chan *userSnapshotFuture), + userRestoreCh: make(chan *userRestoreFuture), + shutdownCh: make(chan struct{}), + stable: stable, + trans: trans, + verifyCh: make(chan *verifyFuture, 64), + configurationsCh: make(chan *configurationsFuture, 8), + bootstrapCh: make(chan *bootstrapFuture), + observers: make(map[uint64]*Observer), + } + + // Initialize as a follower. + r.setState(Follower) + + // Start as leader if specified. This should only be used + // for testing purposes. + if conf.StartAsLeader { + r.setState(Leader) + r.setLeader(r.localAddr) + } + + // Restore the current term and the last log. + r.setCurrentTerm(currentTerm) + r.setLastLog(lastLog.Index, lastLog.Term) + + // Attempt to restore a snapshot if there are any. + if err := r.restoreSnapshot(); err != nil { + return nil, err + } + + // Scan through the log for any configuration change entries. + snapshotIndex, _ := r.getLastSnapshot() + for index := snapshotIndex + 1; index <= lastLog.Index; index++ { + var entry Log + if err := r.logs.GetLog(index, &entry); err != nil { + r.logger.Error(fmt.Sprintf("Failed to get log at %d: %v", index, err)) + panic(err) + } + r.processConfigurationLogEntry(&entry) + } + r.logger.Info(fmt.Sprintf("Initial configuration (index=%d): %+v", + r.configurations.latestIndex, r.configurations.latest.Servers)) + + // Setup a heartbeat fast-path to avoid head-of-line + // blocking where possible. It MUST be safe for this + // to be called concurrently with a blocking RPC. + trans.SetHeartbeatHandler(r.processHeartbeat) + + // Start the background work. + r.goFunc(r.run) + r.goFunc(r.runFSM) + r.goFunc(r.runSnapshots) + return r, nil +} + +// restoreSnapshot attempts to restore the latest snapshots, and fails if none +// of them can be restored. This is called at initialization time, and is +// completely unsafe to call at any other time. 
+func (r *Raft) restoreSnapshot() error {
+	snapshots, err := r.snapshots.List()
+	if err != nil {
+		r.logger.Error(fmt.Sprintf("Failed to list snapshots: %v", err))
+		return err
+	}
+
+	// Try to load in order of newest to oldest
+	for _, snapshot := range snapshots {
+		_, source, err := r.snapshots.Open(snapshot.ID)
+		if err != nil {
+			r.logger.Error(fmt.Sprintf("Failed to open snapshot %v: %v", snapshot.ID, err))
+			continue
+		}
+		defer source.Close()
+
+		if err := r.fsm.Restore(source); err != nil {
+			r.logger.Error(fmt.Sprintf("Failed to restore snapshot %v: %v", snapshot.ID, err))
+			continue
+		}
+
+		// Log success
+		r.logger.Info(fmt.Sprintf("Restored from snapshot %v", snapshot.ID))
+
+		// Update the lastApplied so we don't replay old logs
+		r.setLastApplied(snapshot.Index)
+
+		// Update the last stable snapshot info
+		r.setLastSnapshot(snapshot.Index, snapshot.Term)
+
+		// Update the configuration
+		if snapshot.Version > 0 {
+			r.configurations.committed = snapshot.Configuration
+			r.configurations.committedIndex = snapshot.ConfigurationIndex
+			r.configurations.latest = snapshot.Configuration
+			r.configurations.latestIndex = snapshot.ConfigurationIndex
+		} else {
+			configuration := decodePeers(snapshot.Peers, r.trans)
+			r.configurations.committed = configuration
+			r.configurations.committedIndex = snapshot.Index
+			r.configurations.latest = configuration
+			r.configurations.latestIndex = snapshot.Index
+		}
+
+		// Success!
+		return nil
+	}
+
+	// If we had snapshots and failed to load them, it's an error
+	if len(snapshots) > 0 {
+		return fmt.Errorf("failed to load any existing snapshots")
+	}
+	return nil
+}
+
+// BootstrapCluster is equivalent to non-member BootstrapCluster but can be
+// called on an un-bootstrapped Raft instance after it has been created. This
+// should only be called at the beginning of time for the cluster, and you
+// absolutely must make sure that you call it with the same configuration on all
+// the Voter servers. There is no need to bootstrap Nonvoter and Staging
+// servers.
+func (r *Raft) BootstrapCluster(configuration Configuration) Future {
+	bootstrapReq := &bootstrapFuture{}
+	bootstrapReq.init()
+	bootstrapReq.configuration = configuration
+	select {
+	case <-r.shutdownCh:
+		return errorFuture{ErrRaftShutdown}
+	case r.bootstrapCh <- bootstrapReq:
+		return bootstrapReq
+	}
+}
+
+// Leader is used to return the current leader of the cluster.
+// It may return an empty string if there is no current leader
+// or the leader is unknown.
+func (r *Raft) Leader() ServerAddress {
+	r.leaderLock.RLock()
+	leader := r.leader
+	r.leaderLock.RUnlock()
+	return leader
+}
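As a usage illustration of NewRaft, BootstrapCluster, and Apply (defined just below), here is a minimal editorial sketch, not part of this commit, of bringing up a single-voter node. The fsm, logs, stable, snaps, and trans values are assumed to be pre-built implementations of the interfaces above (e.g. raft-boltdb stores and a network transport), and the helper name is hypothetical:

package main

import (
	"time"

	"github.com/hashicorp/raft"
)

// startSingleNode bootstraps a fresh single-voter cluster and applies
// one command, following the "list just itself as a Voter" approach
// described in the BootstrapCluster docs above.
func startSingleNode(fsm raft.FSM, logs raft.LogStore, stable raft.StableStore,
	snaps raft.SnapshotStore, trans raft.Transport) (*raft.Raft, error) {
	conf := raft.DefaultConfig()
	conf.LocalID = raft.ServerID("node-1") // must be unique across the cluster

	r, err := raft.NewRaft(conf, fsm, logs, stable, snaps, trans)
	if err != nil {
		return nil, err
	}

	configuration := raft.Configuration{
		Servers: []raft.Server{
			{Suffrage: raft.Voter, ID: conf.LocalID, Address: raft.ServerAddress(trans.LocalAddr())},
		},
	}
	// Bootstrapping fails with ErrCantBootstrap if state already exists,
	// which is fine on restarts.
	if err := r.BootstrapCluster(configuration).Error(); err != nil && err != raft.ErrCantBootstrap {
		return nil, err
	}

	// Apply must run on the leader; waiting on the future blocks until the
	// entry is committed and applied to the FSM (or an error occurs).
	f := r.Apply([]byte(`{"hello":"world"}`), 5*time.Second)
	return r, f.Error()
}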
+// Apply is used to apply a command to the FSM in a highly consistent
+// manner. This returns a future that can be used to wait on the application.
+// An optional timeout can be provided to limit the amount of time we wait
+// for the command to be started. This must be run on the leader or it
+// will fail.
+func (r *Raft) Apply(cmd []byte, timeout time.Duration) ApplyFuture {
+	metrics.IncrCounter([]string{"raft", "apply"}, 1)
+	var timer <-chan time.Time
+	if timeout > 0 {
+		timer = time.After(timeout)
+	}
+
+	// Create a log future, no index or term yet
+	logFuture := &logFuture{
+		log: Log{
+			Type: LogCommand,
+			Data: cmd,
+		},
+	}
+	logFuture.init()
+
+	select {
+	case <-timer:
+		return errorFuture{ErrEnqueueTimeout}
+	case <-r.shutdownCh:
+		return errorFuture{ErrRaftShutdown}
+	case r.applyCh <- logFuture:
+		return logFuture
+	}
+}
+
+// Barrier is used to issue a command that blocks until all preceding
+// operations have been applied to the FSM. It can be used to ensure the
+// FSM reflects all queued writes. An optional timeout can be provided to
+// limit the amount of time we wait for the command to be started. This
+// must be run on the leader or it will fail.
+func (r *Raft) Barrier(timeout time.Duration) Future {
+	metrics.IncrCounter([]string{"raft", "barrier"}, 1)
+	var timer <-chan time.Time
+	if timeout > 0 {
+		timer = time.After(timeout)
+	}
+
+	// Create a log future, no index or term yet
+	logFuture := &logFuture{
+		log: Log{
+			Type: LogBarrier,
+		},
+	}
+	logFuture.init()
+
+	select {
+	case <-timer:
+		return errorFuture{ErrEnqueueTimeout}
+	case <-r.shutdownCh:
+		return errorFuture{ErrRaftShutdown}
+	case r.applyCh <- logFuture:
+		return logFuture
+	}
+}
+
+// VerifyLeader is used to ensure the current node is still
+// the leader. This can be done to prevent stale reads when a
+// new leader has potentially been elected.
+func (r *Raft) VerifyLeader() Future {
+	metrics.IncrCounter([]string{"raft", "verify_leader"}, 1)
+	verifyFuture := &verifyFuture{}
+	verifyFuture.init()
+	select {
+	case <-r.shutdownCh:
+		return errorFuture{ErrRaftShutdown}
+	case r.verifyCh <- verifyFuture:
+		return verifyFuture
+	}
+}
+
+// GetConfiguration returns the latest configuration and its associated index
+// currently in use. This may not yet be committed. This must not be called on
+// the main thread (which can access the information directly).
+func (r *Raft) GetConfiguration() ConfigurationFuture {
+	configReq := &configurationsFuture{}
+	configReq.init()
+	select {
+	case <-r.shutdownCh:
+		configReq.respond(ErrRaftShutdown)
+		return configReq
+	case r.configurationsCh <- configReq:
+		return configReq
+	}
+}
+
+// AddPeer (deprecated) is used to add a new peer into the cluster. This must be
+// run on the leader or it will fail. Use AddVoter/AddNonvoter instead.
+func (r *Raft) AddPeer(peer ServerAddress) Future {
+	if r.protocolVersion > 2 {
+		return errorFuture{ErrUnsupportedProtocol}
+	}
+
+	return r.requestConfigChange(configurationChangeRequest{
+		command:       AddStaging,
+		serverID:      ServerID(peer),
+		serverAddress: peer,
+		prevIndex:     0,
+	}, 0)
+}
+
+// RemovePeer (deprecated) is used to remove a peer from the cluster. If the
+// current leader is being removed, it will cause a new election
+// to occur. This must be run on the leader or it will fail.
+// Use RemoveServer instead.
+func (r *Raft) RemovePeer(peer ServerAddress) Future {
+	if r.protocolVersion > 2 {
+		return errorFuture{ErrUnsupportedProtocol}
+	}
+
+	return r.requestConfigChange(configurationChangeRequest{
+		command:   RemoveServer,
+		serverID:  ServerID(peer),
+		prevIndex: 0,
+	}, 0)
+}
+
+// AddVoter will add the given server to the cluster as a staging server. If the
+// server is already in the cluster as a voter, this updates the server's address.
+// This must be run on the leader or it will fail.
The leader will promote the +// staging server to a voter once that server is ready. If nonzero, prevIndex is +// the index of the only configuration upon which this change may be applied; if +// another configuration entry has been added in the meantime, this request will +// fail. If nonzero, timeout is how long this server should wait before the +// configuration change log entry is appended. +func (r *Raft) AddVoter(id ServerID, address ServerAddress, prevIndex uint64, timeout time.Duration) IndexFuture { + if r.protocolVersion < 2 { + return errorFuture{ErrUnsupportedProtocol} + } + + return r.requestConfigChange(configurationChangeRequest{ + command: AddStaging, + serverID: id, + serverAddress: address, + prevIndex: prevIndex, + }, timeout) +} + +// AddNonvoter will add the given server to the cluster but won't assign it a +// vote. The server will receive log entries, but it won't participate in +// elections or log entry commitment. If the server is already in the cluster, +// this updates the server's address. This must be run on the leader or it will +// fail. For prevIndex and timeout, see AddVoter. +func (r *Raft) AddNonvoter(id ServerID, address ServerAddress, prevIndex uint64, timeout time.Duration) IndexFuture { + if r.protocolVersion < 3 { + return errorFuture{ErrUnsupportedProtocol} + } + + return r.requestConfigChange(configurationChangeRequest{ + command: AddNonvoter, + serverID: id, + serverAddress: address, + prevIndex: prevIndex, + }, timeout) +} + +// RemoveServer will remove the given server from the cluster. If the current +// leader is being removed, it will cause a new election to occur. This must be +// run on the leader or it will fail. For prevIndex and timeout, see AddVoter. +func (r *Raft) RemoveServer(id ServerID, prevIndex uint64, timeout time.Duration) IndexFuture { + if r.protocolVersion < 2 { + return errorFuture{ErrUnsupportedProtocol} + } + + return r.requestConfigChange(configurationChangeRequest{ + command: RemoveServer, + serverID: id, + prevIndex: prevIndex, + }, timeout) +} + +// DemoteVoter will take away a server's vote, if it has one. If present, the +// server will continue to receive log entries, but it won't participate in +// elections or log entry commitment. If the server is not in the cluster, this +// does nothing. This must be run on the leader or it will fail. For prevIndex +// and timeout, see AddVoter. +func (r *Raft) DemoteVoter(id ServerID, prevIndex uint64, timeout time.Duration) IndexFuture { + if r.protocolVersion < 3 { + return errorFuture{ErrUnsupportedProtocol} + } + + return r.requestConfigChange(configurationChangeRequest{ + command: DemoteVoter, + serverID: id, + prevIndex: prevIndex, + }, timeout) +} + +// Shutdown is used to stop the Raft background routines. +// This is not a graceful operation. Provides a future that +// can be used to block until all background routines have exited. +func (r *Raft) Shutdown() Future { + r.shutdownLock.Lock() + defer r.shutdownLock.Unlock() + + if !r.shutdown { + close(r.shutdownCh) + r.shutdown = true + r.setState(Shutdown) + return &shutdownFuture{r} + } + + // avoid closing transport twice + return &shutdownFuture{nil} +} + +// Snapshot is used to manually force Raft to take a snapshot. Returns a future +// that can be used to block until complete, and that contains a function that +// can be used to open the snapshot. 
+func (r *Raft) Snapshot() SnapshotFuture {
+	future := &userSnapshotFuture{}
+	future.init()
+	select {
+	case r.userSnapshotCh <- future:
+		return future
+	case <-r.shutdownCh:
+		future.respond(ErrRaftShutdown)
+		return future
+	}
+}
+
+// Restore is used to manually force Raft to consume an external snapshot, such
+// as if restoring from a backup. We will use the current Raft configuration,
+// not the one from the snapshot, so that we can restore into a new cluster. We
+// will also use the higher of the index of the snapshot, or the current index,
+// and then add 1 to that, so we force a new state with a hole in the Raft log,
+// so that the snapshot will be sent to followers and used for any new joiners.
+// This can only be run on the leader, and blocks until the restore is complete
+// or an error occurs.
+//
+// WARNING! This operation has the leader take on the state of the snapshot and
+// then sets itself up so that it replicates that to its followers through the
+// install snapshot process. This involves a potentially dangerous period where
+// the leader commits ahead of its followers, so should only be used for disaster
+// recovery into a fresh cluster, and should not be used in normal operations.
+func (r *Raft) Restore(meta *SnapshotMeta, reader io.Reader, timeout time.Duration) error {
+	metrics.IncrCounter([]string{"raft", "restore"}, 1)
+	var timer <-chan time.Time
+	if timeout > 0 {
+		timer = time.After(timeout)
+	}
+
+	// Perform the restore.
+	restore := &userRestoreFuture{
+		meta:   meta,
+		reader: reader,
+	}
+	restore.init()
+	select {
+	case <-timer:
+		return ErrEnqueueTimeout
+	case <-r.shutdownCh:
+		return ErrRaftShutdown
+	case r.userRestoreCh <- restore:
+		// If the restore is ingested then wait for it to complete.
+		if err := restore.Error(); err != nil {
+			return err
+		}
+	}
+
+	// Apply a no-op log entry. Waiting for this allows us to wait until the
+	// followers have gotten the restore and replicated at least this new
+	// entry, which shows that we've also faulted and installed the
+	// snapshot with the contents of the restore.
+	noop := &logFuture{
+		log: Log{
+			Type: LogNoop,
+		},
+	}
+	noop.init()
+	select {
+	case <-timer:
+		return ErrEnqueueTimeout
+	case <-r.shutdownCh:
+		return ErrRaftShutdown
+	case r.applyCh <- noop:
+		return noop.Error()
+	}
+}
+
+// State is used to return the current raft state.
+func (r *Raft) State() RaftState {
+	return r.getState()
+}
+
+// LeaderCh is used to get a channel which delivers signals on
+// acquiring or losing leadership. It sends true if we become
+// the leader, and false if we lose it. The channel is not buffered,
+// and does not block on writes.
+func (r *Raft) LeaderCh() <-chan bool {
+	return r.leaderCh
+}
+
+// String returns a string representation of this Raft node.
+func (r *Raft) String() string {
+	return fmt.Sprintf("Node at %s [%v]", r.localAddr, r.getState())
+}
+
+// LastContact returns the time of last contact by a leader.
+// This only makes sense if we are currently a follower.
+func (r *Raft) LastContact() time.Time {
+	r.lastContactLock.RLock()
+	last := r.lastContact
+	r.lastContactLock.RUnlock()
+	return last
+}
+
+// Stats is used to return a map of various internal stats. This
+// should only be used for informative purposes or debugging.
+//
+// Keys are: "state", "term", "last_log_index", "last_log_term",
+// "commit_index", "applied_index", "fsm_pending",
+// "last_snapshot_index", "last_snapshot_term",
+// "latest_configuration", "last_contact", and "num_peers".
+// +// The value of "state" is a numerical value representing a +// RaftState const. +// +// The value of "latest_configuration" is a string which contains +// the id of each server, its suffrage status, and its address. +// +// The value of "last_contact" is either "never" if there +// has been no contact with a leader, "0" if the node is in the +// leader state, or the time since last contact with a leader +// formatted as a string. +// +// The value of "num_peers" is the number of other voting servers in the +// cluster, not including this node. If this node isn't part of the +// configuration then this will be "0". +// +// All other values are uint64s, formatted as strings. +func (r *Raft) Stats() map[string]string { + toString := func(v uint64) string { + return strconv.FormatUint(v, 10) + } + lastLogIndex, lastLogTerm := r.getLastLog() + lastSnapIndex, lastSnapTerm := r.getLastSnapshot() + s := map[string]string{ + "state": r.getState().String(), + "term": toString(r.getCurrentTerm()), + "last_log_index": toString(lastLogIndex), + "last_log_term": toString(lastLogTerm), + "commit_index": toString(r.getCommitIndex()), + "applied_index": toString(r.getLastApplied()), + "fsm_pending": toString(uint64(len(r.fsmMutateCh))), + "last_snapshot_index": toString(lastSnapIndex), + "last_snapshot_term": toString(lastSnapTerm), + "protocol_version": toString(uint64(r.protocolVersion)), + "protocol_version_min": toString(uint64(ProtocolVersionMin)), + "protocol_version_max": toString(uint64(ProtocolVersionMax)), + "snapshot_version_min": toString(uint64(SnapshotVersionMin)), + "snapshot_version_max": toString(uint64(SnapshotVersionMax)), + } + + future := r.GetConfiguration() + if err := future.Error(); err != nil { + r.logger.Warn(fmt.Sprintf("could not get configuration for Stats: %v", err)) + } else { + configuration := future.Configuration() + s["latest_configuration_index"] = toString(future.Index()) + s["latest_configuration"] = fmt.Sprintf("%+v", configuration.Servers) + + // This is a legacy metric that we've seen people use in the wild. + hasUs := false + numPeers := 0 + for _, server := range configuration.Servers { + if server.Suffrage == Voter { + if server.ID == r.localID { + hasUs = true + } else { + numPeers++ + } + } + } + if !hasUs { + numPeers = 0 + } + s["num_peers"] = toString(uint64(numPeers)) + } + + last := r.LastContact() + if r.getState() == Leader { + s["last_contact"] = "0" + } else if last.IsZero() { + s["last_contact"] = "never" + } else { + s["last_contact"] = fmt.Sprintf("%v", time.Now().Sub(last)) + } + return s +} + +// LastIndex returns the last index in stable storage, +// either from the last log or from the last snapshot. +func (r *Raft) LastIndex() uint64 { + return r.getLastIndex() +} + +// AppliedIndex returns the last index applied to the FSM. This is generally +// lagging behind the last index, especially for indexes that are persisted but +// have not yet been considered committed by the leader. NOTE - this reflects +// the last index that was sent to the application's FSM over the apply channel +// but DOES NOT mean that the application's FSM has yet consumed it and applied +// it to its internal state. Thus, the application's state may lag behind this +// index. 
+func (r *Raft) AppliedIndex() uint64 { + return r.getLastApplied() +} diff --git a/vendor/github.com/hashicorp/raft/commands.go b/vendor/github.com/hashicorp/raft/commands.go new file mode 100644 index 0000000000..5d89e7bcdb --- /dev/null +++ b/vendor/github.com/hashicorp/raft/commands.go @@ -0,0 +1,151 @@ +package raft + +// RPCHeader is a common sub-structure used to pass along protocol version and +// other information about the cluster. For older Raft implementations before +// versioning was added this will default to a zero-valued structure when read +// by newer Raft versions. +type RPCHeader struct { + // ProtocolVersion is the version of the protocol the sender is + // speaking. + ProtocolVersion ProtocolVersion +} + +// WithRPCHeader is an interface that exposes the RPC header. +type WithRPCHeader interface { + GetRPCHeader() RPCHeader +} + +// AppendEntriesRequest is the command used to append entries to the +// replicated log. +type AppendEntriesRequest struct { + RPCHeader + + // Provide the current term and leader + Term uint64 + Leader []byte + + // Provide the previous entries for integrity checking + PrevLogEntry uint64 + PrevLogTerm uint64 + + // New entries to commit + Entries []*Log + + // Commit index on the leader + LeaderCommitIndex uint64 +} + +// See WithRPCHeader. +func (r *AppendEntriesRequest) GetRPCHeader() RPCHeader { + return r.RPCHeader +} + +// AppendEntriesResponse is the response returned from an +// AppendEntriesRequest. +type AppendEntriesResponse struct { + RPCHeader + + // Newer term if leader is out of date + Term uint64 + + // Last Log is a hint to help accelerate rebuilding slow nodes + LastLog uint64 + + // We may not succeed if we have a conflicting entry + Success bool + + // There are scenarios where this request didn't succeed + // but there's no need to wait/back-off the next attempt. + NoRetryBackoff bool +} + +// See WithRPCHeader. +func (r *AppendEntriesResponse) GetRPCHeader() RPCHeader { + return r.RPCHeader +} + +// RequestVoteRequest is the command used by a candidate to ask a Raft peer +// for a vote in an election. +type RequestVoteRequest struct { + RPCHeader + + // Provide the term and our id + Term uint64 + Candidate []byte + + // Used to ensure safety + LastLogIndex uint64 + LastLogTerm uint64 +} + +// See WithRPCHeader. +func (r *RequestVoteRequest) GetRPCHeader() RPCHeader { + return r.RPCHeader +} + +// RequestVoteResponse is the response returned from a RequestVoteRequest. +type RequestVoteResponse struct { + RPCHeader + + // Newer term if leader is out of date. + Term uint64 + + // Peers is deprecated, but required by servers that only understand + // protocol version 0. This is not populated in protocol version 2 + // and later. + Peers []byte + + // Is the vote granted. + Granted bool +} + +// See WithRPCHeader. +func (r *RequestVoteResponse) GetRPCHeader() RPCHeader { + return r.RPCHeader +} + +// InstallSnapshotRequest is the command sent to a Raft peer to bootstrap its +// log (and state machine) from a snapshot on another peer. +type InstallSnapshotRequest struct { + RPCHeader + SnapshotVersion SnapshotVersion + + Term uint64 + Leader []byte + + // These are the last index/term included in the snapshot + LastLogIndex uint64 + LastLogTerm uint64 + + // Peer Set in the snapshot. This is deprecated in favor of Configuration + // but remains here in case we receive an InstallSnapshot from a leader + // that's running old code. + Peers []byte + + // Cluster membership. 
+	Configuration []byte
+	// Log index where 'Configuration' entry was originally written.
+	ConfigurationIndex uint64
+
+	// Size of the snapshot
+	Size int64
+}
+
+// See WithRPCHeader.
+func (r *InstallSnapshotRequest) GetRPCHeader() RPCHeader {
+	return r.RPCHeader
+}
+
+// InstallSnapshotResponse is the response returned from an
+// InstallSnapshotRequest.
+type InstallSnapshotResponse struct {
+	RPCHeader
+
+	Term    uint64
+	Success bool
+}
+
+// See WithRPCHeader.
+func (r *InstallSnapshotResponse) GetRPCHeader() RPCHeader {
+	return r.RPCHeader
+}
diff --git a/vendor/github.com/hashicorp/raft/commitment.go b/vendor/github.com/hashicorp/raft/commitment.go
new file mode 100644
index 0000000000..7aa36464ae
--- /dev/null
+++ b/vendor/github.com/hashicorp/raft/commitment.go
@@ -0,0 +1,101 @@
+package raft
+
+import (
+	"sort"
+	"sync"
+)
+
+// Commitment is used to advance the leader's commit index. The leader and
+// replication goroutines report in newly written entries with Match(), and
+// this notifies on commitCh when the commit index has advanced.
+type commitment struct {
+	// protects matchIndexes and commitIndex
+	sync.Mutex
+	// notified when commitIndex increases
+	commitCh chan struct{}
+	// voter ID to log index: the server stores up through this log entry
+	matchIndexes map[ServerID]uint64
+	// a quorum stores up through this log entry. monotonically increases.
+	commitIndex uint64
+	// the first index of this leader's term: this needs to be replicated to a
+	// majority of the cluster before this leader may mark anything committed
+	// (per Raft's commitment rule)
+	startIndex uint64
+}
+
+// newCommitment returns a commitment struct that notifies the provided
+// channel when log entries have been committed. A new commitment struct is
+// created each time this server becomes leader for a particular term.
+// 'configuration' is the servers in the cluster.
+// 'startIndex' is the first index created in this term (see
+// its description above).
+func newCommitment(commitCh chan struct{}, configuration Configuration, startIndex uint64) *commitment {
+	matchIndexes := make(map[ServerID]uint64)
+	for _, server := range configuration.Servers {
+		if server.Suffrage == Voter {
+			matchIndexes[server.ID] = 0
+		}
+	}
+	return &commitment{
+		commitCh:     commitCh,
+		matchIndexes: matchIndexes,
+		commitIndex:  0,
+		startIndex:   startIndex,
+	}
+}
+
+// Called when a new cluster membership configuration is created: it will be
+// used to determine commitment from now on. 'configuration' is the servers in
+// the cluster.
+func (c *commitment) setConfiguration(configuration Configuration) {
+	c.Lock()
+	defer c.Unlock()
+	oldMatchIndexes := c.matchIndexes
+	c.matchIndexes = make(map[ServerID]uint64)
+	for _, server := range configuration.Servers {
+		if server.Suffrage == Voter {
+			c.matchIndexes[server.ID] = oldMatchIndexes[server.ID] // defaults to 0
+		}
+	}
+	c.recalculate()
+}
+
+// Called by leader after commitCh is notified
+func (c *commitment) getCommitIndex() uint64 {
+	c.Lock()
+	defer c.Unlock()
+	return c.commitIndex
+}
+
+// Match is called once a server completes writing entries to disk: either the
+// leader has written the new entry or a follower has replied to an
+// AppendEntries RPC. The given server's disk agrees with this server's log up
+// through the given index.
+func (c *commitment) match(server ServerID, matchIndex uint64) { + c.Lock() + defer c.Unlock() + if prev, hasVote := c.matchIndexes[server]; hasVote && matchIndex > prev { + c.matchIndexes[server] = matchIndex + c.recalculate() + } +} + +// Internal helper to calculate new commitIndex from matchIndexes. +// Must be called with lock held. +func (c *commitment) recalculate() { + if len(c.matchIndexes) == 0 { + return + } + + matched := make([]uint64, 0, len(c.matchIndexes)) + for _, idx := range c.matchIndexes { + matched = append(matched, idx) + } + sort.Sort(uint64Slice(matched)) + quorumMatchIndex := matched[(len(matched)-1)/2] + + if quorumMatchIndex > c.commitIndex && quorumMatchIndex >= c.startIndex { + c.commitIndex = quorumMatchIndex + asyncNotifyCh(c.commitCh) + } +} diff --git a/vendor/github.com/hashicorp/raft/config.go b/vendor/github.com/hashicorp/raft/config.go new file mode 100644 index 0000000000..66d4d0fa08 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/config.go @@ -0,0 +1,265 @@ +package raft + +import ( + "fmt" + "io" + "time" + + "github.com/hashicorp/go-hclog" +) + +// These are the versions of the protocol (which includes RPC messages as +// well as Raft-specific log entries) that this server can _understand_. Use +// the ProtocolVersion member of the Config object to control the version of +// the protocol to use when _speaking_ to other servers. Note that depending on +// the protocol version being spoken, some otherwise understood RPC messages +// may be refused. See dispositionRPC for details of this logic. +// +// There are notes about the upgrade path in the description of the versions +// below. If you are starting a fresh cluster then there's no reason not to +// jump right to the latest protocol version. If you need to interoperate with +// older, version 0 Raft servers you'll need to drive the cluster through the +// different versions in order. +// +// The version details are complicated, but here's a summary of what's required +// to get from a version 0 cluster to version 3: +// +// 1. In version N of your app that starts using the new Raft library with +// versioning, set ProtocolVersion to 1. +// 2. Make version N+1 of your app require version N as a prerequisite (all +// servers must be upgraded). For version N+1 of your app set ProtocolVersion +// to 2. +// 3. Similarly, make version N+2 of your app require version N+1 as a +// prerequisite. For version N+2 of your app, set ProtocolVersion to 3. +// +// During this upgrade, older cluster members will still have Server IDs equal +// to their network addresses. To upgrade an older member and give it an ID, it +// needs to leave the cluster and re-enter: +// +// 1. Remove the server from the cluster with RemoveServer, using its network +// address as its ServerID. +// 2. Update the server's config to use a UUID or something else that is +// not tied to the machine as the ServerID (restarting the server). +// 3. Add the server back to the cluster with AddVoter, using its new ID. +// +// You can do this during the rolling upgrade from N+1 to N+2 of your app, or +// as a rolling change at any time after the upgrade. +// +// Version History +// +// 0: Original Raft library before versioning was added. Servers running this +// version of the Raft library use AddPeerDeprecated/RemovePeerDeprecated +// for all configuration changes, and have no support for LogConfiguration. 
+// 1: First versioned protocol, used to interoperate with old servers, and begin +// the migration path to newer versions of the protocol. Under this version +// all configuration changes are propagated using the now-deprecated +// RemovePeerDeprecated Raft log entry. This means that server IDs are always +// set to be the same as the server addresses (since the old log entry type +// cannot transmit an ID), and only AddPeer/RemovePeer APIs are supported. +// Servers running this version of the protocol can understand the new +// LogConfiguration Raft log entry but will never generate one so they can +// remain compatible with version 0 Raft servers in the cluster. +// 2: Transitional protocol used when migrating an existing cluster to the new +// server ID system. Server IDs are still set to be the same as server +// addresses, but all configuration changes are propagated using the new +// LogConfiguration Raft log entry type, which can carry full ID information. +// This version supports the old AddPeer/RemovePeer APIs as well as the new +// ID-based AddVoter/RemoveServer APIs which should be used when adding +// version 3 servers to the cluster later. This version sheds all +// interoperability with version 0 servers, but can interoperate with newer +// Raft servers running with protocol version 1 since they can understand the +// new LogConfiguration Raft log entry, and this version can still understand +// their RemovePeerDeprecated Raft log entries. We need this protocol version +// as an intermediate step between 1 and 3 so that servers will propagate the +// ID information that will come from newly-added (or -rolled) servers using +// protocol version 3, but since they are still using their address-based IDs +// from the previous step they will still be able to track commitments and +// their own voting status properly. If we skipped this step, servers would +// be started with their new IDs, but they wouldn't see themselves in the old +// address-based configuration, so none of the servers would think they had a +// vote. +// 3: Protocol adding full support for server IDs and new ID-based server APIs +// (AddVoter, AddNonvoter, etc.), old AddPeer/RemovePeer APIs are no longer +// supported. Version 2 servers should be swapped out by removing them from +// the cluster one-by-one and re-adding them with updated configuration for +// this protocol version, along with their server ID. The remove/add cycle +// is required to populate their server ID. Note that removing must be done +// by ID, which will be the old server's address. +type ProtocolVersion int + +const ( + ProtocolVersionMin ProtocolVersion = 0 + ProtocolVersionMax = 3 +) + +// These are versions of snapshots that this server can _understand_. Currently, +// it is always assumed that this server generates the latest version, though +// this may be changed in the future to include a configurable version. +// +// Version History +// +// 0: Original Raft library before versioning was added. The peers portion of +// these snapshots is encoded in the legacy format which requires decodePeers +// to parse. This version of snapshots should only be produced by the +// unversioned Raft library. +// 1: New format which adds support for a full configuration structure and its +// associated log index, with support for server IDs and non-voting server +// modes. To ease upgrades, this also includes the legacy peers structure but +// that will never be used by servers that understand version 1 snapshots. 
+// Since the original Raft library didn't enforce any versioning, we must
+// include the legacy peers structure for this version, but we can deprecate
+// it in the next snapshot version.
+type SnapshotVersion int
+
+const (
+	SnapshotVersionMin SnapshotVersion = 0
+	SnapshotVersionMax                 = 1
+)
+
+// Config provides any necessary configuration for the Raft server.
+type Config struct {
+	// ProtocolVersion allows a Raft server to inter-operate with older
+	// Raft servers running an older version of the code. This is used to
+	// version the wire protocol as well as Raft-specific log entries that
+	// the server uses when _speaking_ to other servers. There is currently
+	// no auto-negotiation of versions so all servers must be manually
+	// configured with compatible versions. See ProtocolVersionMin and
+	// ProtocolVersionMax for the versions of the protocol that this server
+	// can _understand_.
+	ProtocolVersion ProtocolVersion
+
+	// HeartbeatTimeout specifies the time in follower state without
+	// a leader before we attempt an election.
+	HeartbeatTimeout time.Duration
+
+	// ElectionTimeout specifies the time in candidate state without
+	// a leader before we attempt an election.
+	ElectionTimeout time.Duration
+
+	// CommitTimeout controls the time without an Apply() operation
+	// before we heartbeat to ensure a timely commit. Due to random
+	// staggering, may be delayed as much as 2x this value.
+	CommitTimeout time.Duration
+
+	// MaxAppendEntries controls the maximum number of append entries
+	// to send at once. We want to strike a balance between efficiency
+	// and avoiding waste if the follower is going to reject because of
+	// an inconsistent log.
+	MaxAppendEntries int
+
+	// If we are a member of a cluster, and RemovePeer is invoked for the
+	// local node, then we forget all peers and transition into the follower state.
+	// If ShutdownOnRemove is set, we additionally shut down Raft. Otherwise,
+	// we can become a leader of a cluster containing only this node.
+	ShutdownOnRemove bool
+
+	// TrailingLogs controls how many logs we leave after a snapshot. This is
+	// used so that we can quickly replay logs on a follower instead of being
+	// forced to send an entire snapshot.
+	TrailingLogs uint64
+
+	// SnapshotInterval controls how often we check if we should perform a snapshot.
+	// We randomly stagger between this value and 2x this value to keep the entire
+	// cluster from performing a snapshot at once.
+	SnapshotInterval time.Duration
+
+	// SnapshotThreshold controls how many outstanding logs there must be before
+	// we perform a snapshot. This is to prevent excessive snapshots when we can
+	// just replay a small set of logs.
+	SnapshotThreshold uint64
+
+	// LeaderLeaseTimeout is used to control how long the "lease" lasts
+	// for being the leader without being able to contact a quorum
+	// of nodes. If we reach this interval without contact, we will
+	// step down as leader.
+	LeaderLeaseTimeout time.Duration
+
+	// StartAsLeader forces Raft to start in the leader state. This should
+	// never be used except for testing purposes, as it can cause a split-brain.
+	StartAsLeader bool
+
+	// The unique ID for this server across all time. When running with
+	// ProtocolVersion < 3, you must set this to be the same as the network
+	// address of your transport.
+	LocalID ServerID
+
+	// NotifyCh is used to provide a channel that will be notified of leadership
+	// changes. Raft will block writing to this channel, so it should either be
+	// buffered or aggressively consumed.
+	NotifyCh chan<- bool
+
+	// LogOutput is used as a sink for logs, unless Logger is specified.
+	// Defaults to os.Stderr.
+	LogOutput io.Writer
+
+	// LogLevel represents a log level. If no matching string is specified,
+	// hclog.NoLevel is assumed.
+	LogLevel string
+
+	// Logger is a user-provided hc-log logger. If nil, a logger writing to
+	// LogOutput with LogLevel is used.
+	Logger hclog.Logger
+}
+
+// DefaultConfig returns a Config with usable defaults.
+func DefaultConfig() *Config {
+	return &Config{
+		ProtocolVersion:    ProtocolVersionMax,
+		HeartbeatTimeout:   1000 * time.Millisecond,
+		ElectionTimeout:    1000 * time.Millisecond,
+		CommitTimeout:      50 * time.Millisecond,
+		MaxAppendEntries:   64,
+		ShutdownOnRemove:   true,
+		TrailingLogs:       10240,
+		SnapshotInterval:   120 * time.Second,
+		SnapshotThreshold:  8192,
+		LeaderLeaseTimeout: 500 * time.Millisecond,
+		LogLevel:           "DEBUG",
+	}
+}
+
+// ValidateConfig is used to validate a sane configuration.
+func ValidateConfig(config *Config) error {
+	// We don't actually support running as 0 in the library any more, but
+	// we do understand it.
+	protocolMin := ProtocolVersionMin
+	if protocolMin == 0 {
+		protocolMin = 1
+	}
+	if config.ProtocolVersion < protocolMin ||
+		config.ProtocolVersion > ProtocolVersionMax {
+		return fmt.Errorf("Protocol version %d must be >= %d and <= %d",
+			config.ProtocolVersion, protocolMin, ProtocolVersionMax)
+	}
+	if len(config.LocalID) == 0 {
+		return fmt.Errorf("LocalID cannot be empty")
+	}
+	if config.HeartbeatTimeout < 5*time.Millisecond {
+		return fmt.Errorf("Heartbeat timeout is too low")
+	}
+	if config.ElectionTimeout < 5*time.Millisecond {
+		return fmt.Errorf("Election timeout is too low")
+	}
+	if config.CommitTimeout < time.Millisecond {
+		return fmt.Errorf("Commit timeout is too low")
+	}
+	if config.MaxAppendEntries <= 0 {
+		return fmt.Errorf("MaxAppendEntries must be positive")
+	}
+	if config.MaxAppendEntries > 1024 {
+		return fmt.Errorf("MaxAppendEntries is too large")
+	}
+	if config.SnapshotInterval < 5*time.Millisecond {
+		return fmt.Errorf("Snapshot interval is too low")
+	}
+	if config.LeaderLeaseTimeout < 5*time.Millisecond {
+		return fmt.Errorf("Leader lease timeout is too low")
+	}
+	if config.LeaderLeaseTimeout > config.HeartbeatTimeout {
+		return fmt.Errorf("Leader lease timeout cannot be larger than heartbeat timeout")
+	}
+	if config.ElectionTimeout < config.HeartbeatTimeout {
+		return fmt.Errorf("Election timeout must be equal to or greater than Heartbeat Timeout")
+	}
+	return nil
+}
diff --git a/vendor/github.com/hashicorp/raft/configuration.go b/vendor/github.com/hashicorp/raft/configuration.go
new file mode 100644
index 0000000000..4bb784d0bf
--- /dev/null
+++ b/vendor/github.com/hashicorp/raft/configuration.go
@@ -0,0 +1,343 @@
+package raft
+
+import "fmt"
+
+// ServerSuffrage determines whether a Server in a Configuration gets a vote.
+type ServerSuffrage int
+
+// Note: Don't renumber these, since the numbers are written into the log.
+const (
+	// Voter is a server whose vote is counted in elections and whose match index
+	// is used in advancing the leader's commit index.
+	Voter ServerSuffrage = iota
+	// Nonvoter is a server that receives log entries but is not considered for
+	// elections or commitment purposes.
+	Nonvoter
+	// Staging is a server that acts like a nonvoter with one exception: once a
+	// staging server receives enough log entries to be sufficiently caught up to
+	// the leader's log, the leader will invoke a membership change to change
+	// the Staging server to a Voter.
+	Staging
+)
+
+func (s ServerSuffrage) String() string {
+	switch s {
+	case Voter:
+		return "Voter"
+	case Nonvoter:
+		return "Nonvoter"
+	case Staging:
+		return "Staging"
+	}
+	return "ServerSuffrage"
+}
+
+// ServerID is a unique string identifying a server for all time.
+type ServerID string
+
+// ServerAddress is a network address for a server that a transport can contact.
+type ServerAddress string
+
+// Server tracks the information about a single server in a configuration.
+type Server struct {
+	// Suffrage determines whether the server gets a vote.
+	Suffrage ServerSuffrage
+	// ID is a unique string identifying this server for all time.
+	ID ServerID
+	// Address is its network address that a transport can contact.
+	Address ServerAddress
+}
+
+// Configuration tracks which servers are in the cluster, and whether they have
+// votes. This should include the local server, if it's a member of the cluster.
+// The servers are listed in no particular order, but each should only appear once.
+// These entries are appended to the log during membership changes.
+type Configuration struct {
+	Servers []Server
+}
+
+// Clone makes a deep copy of a Configuration.
+func (c *Configuration) Clone() (copy Configuration) {
+	copy.Servers = append(copy.Servers, c.Servers...)
+	return
+}
+
+// ConfigurationChangeCommand enumerates the different ways to change the
+// cluster configuration.
+type ConfigurationChangeCommand uint8
+
+const (
+	// AddStaging makes a server Staging unless it's a Voter.
+	AddStaging ConfigurationChangeCommand = iota
+	// AddNonvoter makes a server Nonvoter unless it's Staging or a Voter.
+	AddNonvoter
+	// DemoteVoter makes a server Nonvoter unless it's absent.
+	DemoteVoter
+	// RemoveServer removes a server entirely from the cluster membership.
+	RemoveServer
+	// Promote is created automatically by a leader; it turns a Staging server
+	// into a Voter.
+	Promote
+)
+
+func (c ConfigurationChangeCommand) String() string {
+	switch c {
+	case AddStaging:
+		return "AddStaging"
+	case AddNonvoter:
+		return "AddNonvoter"
+	case DemoteVoter:
+		return "DemoteVoter"
+	case RemoveServer:
+		return "RemoveServer"
+	case Promote:
+		return "Promote"
+	}
+	return "ConfigurationChangeCommand"
+}
+
+// configurationChangeRequest describes a change that a leader would like to
+// make to its current configuration. It's used only within a single server
+// (never serialized into the log), as part of `configurationChangeFuture`.
+type configurationChangeRequest struct {
+	command       ConfigurationChangeCommand
+	serverID      ServerID
+	serverAddress ServerAddress // only present for AddStaging, AddNonvoter
+	// prevIndex, if nonzero, is the index of the only configuration upon which
+	// this change may be applied; if another configuration entry has been
+	// added in the meantime, this request will fail.
+	prevIndex uint64
+}
+
+// configurations is state tracked on every server about its Configurations.
+// Note that, per Diego's dissertation, there can be at most one uncommitted
+// configuration at a time (the next configuration may not be created until the
+// prior one has been committed).
+// +// One downside to storing just two configurations is that if you try to take a +// snapshot when your state machine hasn't yet applied the committedIndex, we +// have no record of the configuration that would logically fit into that +// snapshot. We disallow snapshots in that case now. An alternative approach, +// which LogCabin uses, is to track every configuration change in the +// log. +type configurations struct { + // committed is the latest configuration in the log/snapshot that has been + // committed (the one with the largest index). + committed Configuration + // committedIndex is the log index where 'committed' was written. + committedIndex uint64 + // latest is the latest configuration in the log/snapshot (may be committed + // or uncommitted) + latest Configuration + // latestIndex is the log index where 'latest' was written. + latestIndex uint64 +} + +// Clone makes a deep copy of a configurations object. +func (c *configurations) Clone() (copy configurations) { + copy.committed = c.committed.Clone() + copy.committedIndex = c.committedIndex + copy.latest = c.latest.Clone() + copy.latestIndex = c.latestIndex + return +} + +// hasVote returns true if the server identified by 'id' is a Voter in the +// provided Configuration. +func hasVote(configuration Configuration, id ServerID) bool { + for _, server := range configuration.Servers { + if server.ID == id { + return server.Suffrage == Voter + } + } + return false +} + +// checkConfiguration tests a cluster membership configuration for common +// errors. +func checkConfiguration(configuration Configuration) error { + idSet := make(map[ServerID]bool) + addressSet := make(map[ServerAddress]bool) + var voters int + for _, server := range configuration.Servers { + if server.ID == "" { + return fmt.Errorf("Empty ID in configuration: %v", configuration) + } + if server.Address == "" { + return fmt.Errorf("Empty address in configuration: %v", server) + } + if idSet[server.ID] { + return fmt.Errorf("Found duplicate ID in configuration: %v", server.ID) + } + idSet[server.ID] = true + if addressSet[server.Address] { + return fmt.Errorf("Found duplicate address in configuration: %v", server.Address) + } + addressSet[server.Address] = true + if server.Suffrage == Voter { + voters++ + } + } + if voters == 0 { + return fmt.Errorf("Need at least one voter in configuration: %v", configuration) + } + return nil +} + +// nextConfiguration generates a new Configuration from the current one and a +// configuration change request. It's split from appendConfigurationEntry so +// that it can be unit tested easily. +func nextConfiguration(current Configuration, currentIndex uint64, change configurationChangeRequest) (Configuration, error) { + if change.prevIndex > 0 && change.prevIndex != currentIndex { + return Configuration{}, fmt.Errorf("Configuration changed since %v (latest is %v)", change.prevIndex, currentIndex) + } + + configuration := current.Clone() + switch change.command { + case AddStaging: + // TODO: barf on new address? + newServer := Server{ + // TODO: This should add the server as Staging, to be automatically + // promoted to Voter later. However, the promotion to Voter is not yet + // implemented, and doing so is not trivial with the way the leader loop + // coordinates with the replication goroutines today. So, for now, the + // server will have a vote right away, and the Promote case below is + // unused. 
+ Suffrage: Voter, + ID: change.serverID, + Address: change.serverAddress, + } + found := false + for i, server := range configuration.Servers { + if server.ID == change.serverID { + if server.Suffrage == Voter { + configuration.Servers[i].Address = change.serverAddress + } else { + configuration.Servers[i] = newServer + } + found = true + break + } + } + if !found { + configuration.Servers = append(configuration.Servers, newServer) + } + case AddNonvoter: + newServer := Server{ + Suffrage: Nonvoter, + ID: change.serverID, + Address: change.serverAddress, + } + found := false + for i, server := range configuration.Servers { + if server.ID == change.serverID { + if server.Suffrage != Nonvoter { + configuration.Servers[i].Address = change.serverAddress + } else { + configuration.Servers[i] = newServer + } + found = true + break + } + } + if !found { + configuration.Servers = append(configuration.Servers, newServer) + } + case DemoteVoter: + for i, server := range configuration.Servers { + if server.ID == change.serverID { + configuration.Servers[i].Suffrage = Nonvoter + break + } + } + case RemoveServer: + for i, server := range configuration.Servers { + if server.ID == change.serverID { + configuration.Servers = append(configuration.Servers[:i], configuration.Servers[i+1:]...) + break + } + } + case Promote: + for i, server := range configuration.Servers { + if server.ID == change.serverID && server.Suffrage == Staging { + configuration.Servers[i].Suffrage = Voter + break + } + } + } + + // Make sure we didn't do something bad like remove the last voter + if err := checkConfiguration(configuration); err != nil { + return Configuration{}, err + } + + return configuration, nil +} + +// encodePeers is used to serialize a Configuration into the old peers format. +// This is here for backwards compatibility when operating with a mix of old +// servers and should be removed once we deprecate support for protocol version 1. +func encodePeers(configuration Configuration, trans Transport) []byte { + // Gather up all the voters, other suffrage types are not supported by + // this data format. + var encPeers [][]byte + for _, server := range configuration.Servers { + if server.Suffrage == Voter { + encPeers = append(encPeers, trans.EncodePeer(server.ID, server.Address)) + } + } + + // Encode the entire array. + buf, err := encodeMsgPack(encPeers) + if err != nil { + panic(fmt.Errorf("failed to encode peers: %v", err)) + } + + return buf.Bytes() +} + +// decodePeers is used to deserialize an old list of peers into a Configuration. +// This is here for backwards compatibility with old log entries and snapshots; +// it should be removed eventually. +func decodePeers(buf []byte, trans Transport) Configuration { + // Decode the buffer first. + var encPeers [][]byte + if err := decodeMsgPack(buf, &encPeers); err != nil { + panic(fmt.Errorf("failed to decode peers: %v", err)) + } + + // Deserialize each peer. + var servers []Server + for _, enc := range encPeers { + p := trans.DecodePeer(enc) + servers = append(servers, Server{ + Suffrage: Voter, + ID: ServerID(p), + Address: ServerAddress(p), + }) + } + + return Configuration{ + Servers: servers, + } +} + +// encodeConfiguration serializes a Configuration using MsgPack, or panics on +// errors. 
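For a concrete sense of how these pieces fit together, here is a minimal, editor-supplied sketch (written from inside the raft package; node IDs and addresses are made up) of assembling a bootstrap Configuration and running it through the same sanity checks the library applies:

func exampleBootstrapConfiguration() error {
	configuration := Configuration{
		Servers: []Server{
			{Suffrage: Voter, ID: "node-a", Address: "10.0.0.1:8300"},
			{Suffrage: Voter, ID: "node-b", Address: "10.0.0.2:8300"},
			{Suffrage: Nonvoter, ID: "node-c", Address: "10.0.0.3:8300"},
		},
	}
	// checkConfiguration rejects duplicate IDs, duplicate addresses, and
	// configurations without at least one Voter.
	if err := checkConfiguration(configuration); err != nil {
		return err
	}
	// hasVote only counts full Voters, so node-c reports false here.
	_ = hasVote(configuration, "node-c")
	return nil
}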
+func encodeConfiguration(configuration Configuration) []byte { + buf, err := encodeMsgPack(configuration) + if err != nil { + panic(fmt.Errorf("failed to encode configuration: %v", err)) + } + return buf.Bytes() +} + +// decodeConfiguration deserializes a Configuration using MsgPack, or panics on +// errors. +func decodeConfiguration(buf []byte) Configuration { + var configuration Configuration + if err := decodeMsgPack(buf, &configuration); err != nil { + panic(fmt.Errorf("failed to decode configuration: %v", err)) + } + return configuration +} diff --git a/vendor/github.com/hashicorp/raft/discard_snapshot.go b/vendor/github.com/hashicorp/raft/discard_snapshot.go new file mode 100644 index 0000000000..5e93a9fe01 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/discard_snapshot.go @@ -0,0 +1,49 @@ +package raft + +import ( + "fmt" + "io" +) + +// DiscardSnapshotStore is used to successfully snapshot while +// always discarding the snapshot. This is useful for when the +// log should be truncated but no snapshot should be retained. +// This should never be used for production use, and is only +// suitable for testing. +type DiscardSnapshotStore struct{} + +type DiscardSnapshotSink struct{} + +// NewDiscardSnapshotStore is used to create a new DiscardSnapshotStore. +func NewDiscardSnapshotStore() *DiscardSnapshotStore { + return &DiscardSnapshotStore{} +} + +func (d *DiscardSnapshotStore) Create(version SnapshotVersion, index, term uint64, + configuration Configuration, configurationIndex uint64, trans Transport) (SnapshotSink, error) { + return &DiscardSnapshotSink{}, nil +} + +func (d *DiscardSnapshotStore) List() ([]*SnapshotMeta, error) { + return nil, nil +} + +func (d *DiscardSnapshotStore) Open(id string) (*SnapshotMeta, io.ReadCloser, error) { + return nil, nil, fmt.Errorf("open is not supported") +} + +func (d *DiscardSnapshotSink) Write(b []byte) (int, error) { + return len(b), nil +} + +func (d *DiscardSnapshotSink) Close() error { + return nil +} + +func (d *DiscardSnapshotSink) ID() string { + return "discard" +} + +func (d *DiscardSnapshotSink) Cancel() error { + return nil +} diff --git a/vendor/github.com/hashicorp/raft/file_snapshot.go b/vendor/github.com/hashicorp/raft/file_snapshot.go new file mode 100644 index 0000000000..ffc9414542 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/file_snapshot.go @@ -0,0 +1,528 @@ +package raft + +import ( + "bufio" + "bytes" + "encoding/json" + "fmt" + "hash" + "hash/crc64" + "io" + "io/ioutil" + "log" + "os" + "path/filepath" + "runtime" + "sort" + "strings" + "time" +) + +const ( + testPath = "permTest" + snapPath = "snapshots" + metaFilePath = "meta.json" + stateFilePath = "state.bin" + tmpSuffix = ".tmp" +) + +// FileSnapshotStore implements the SnapshotStore interface and allows +// snapshots to be made on the local disk. +type FileSnapshotStore struct { + path string + retain int + logger *log.Logger +} + +type snapMetaSlice []*fileSnapshotMeta + +// FileSnapshotSink implements SnapshotSink with a file. +type FileSnapshotSink struct { + store *FileSnapshotStore + logger *log.Logger + dir string + parentDir string + meta fileSnapshotMeta + + stateFile *os.File + stateHash hash.Hash64 + buffered *bufio.Writer + + closed bool +} + +// fileSnapshotMeta is stored on disk. We also put a CRC +// on disk so that we can verify the snapshot. +type fileSnapshotMeta struct { + SnapshotMeta + CRC []byte +} + +// bufferedFile is returned when we open a snapshot. This way +// reads are buffered and the file still gets closed. 
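Because every method above is a no-op, the discard store works as a stand-in SnapshotStore in tests where snapshot contents are irrelevant; a small editor's sketch (the index/term values are arbitrary):

func exampleDiscardStore() error {
	store := NewDiscardSnapshotStore()
	// Create ignores its arguments entirely, so even a nil Transport works.
	sink, err := store.Create(SnapshotVersionMax, 10, 2, Configuration{}, 0, nil)
	if err != nil {
		return err
	}
	// Writes report success but the bytes go nowhere.
	if _, err := sink.Write([]byte("state")); err != nil {
		return err
	}
	return sink.Close()
}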
+type bufferedFile struct { + bh *bufio.Reader + fh *os.File +} + +func (b *bufferedFile) Read(p []byte) (n int, err error) { + return b.bh.Read(p) +} + +func (b *bufferedFile) Close() error { + return b.fh.Close() +} + +// NewFileSnapshotStoreWithLogger creates a new FileSnapshotStore based +// on a base directory. The `retain` parameter controls how many +// snapshots are retained. Must be at least 1. +func NewFileSnapshotStoreWithLogger(base string, retain int, logger *log.Logger) (*FileSnapshotStore, error) { + if retain < 1 { + return nil, fmt.Errorf("must retain at least one snapshot") + } + if logger == nil { + logger = log.New(os.Stderr, "", log.LstdFlags) + } + + // Ensure our path exists + path := filepath.Join(base, snapPath) + if err := os.MkdirAll(path, 0755); err != nil && !os.IsExist(err) { + return nil, fmt.Errorf("snapshot path not accessible: %v", err) + } + + // Setup the store + store := &FileSnapshotStore{ + path: path, + retain: retain, + logger: logger, + } + + // Do a permissions test + if err := store.testPermissions(); err != nil { + return nil, fmt.Errorf("permissions test failed: %v", err) + } + return store, nil +} + +// NewFileSnapshotStore creates a new FileSnapshotStore based +// on a base directory. The `retain` parameter controls how many +// snapshots are retained. Must be at least 1. +func NewFileSnapshotStore(base string, retain int, logOutput io.Writer) (*FileSnapshotStore, error) { + if logOutput == nil { + logOutput = os.Stderr + } + return NewFileSnapshotStoreWithLogger(base, retain, log.New(logOutput, "", log.LstdFlags)) +} + +// testPermissions tries to touch a file in our path to see if it works. +func (f *FileSnapshotStore) testPermissions() error { + path := filepath.Join(f.path, testPath) + fh, err := os.Create(path) + if err != nil { + return err + } + + if err = fh.Close(); err != nil { + return err + } + + if err = os.Remove(path); err != nil { + return err + } + return nil +} + +// snapshotName generates a name for the snapshot. +func snapshotName(term, index uint64) string { + now := time.Now() + msec := now.UnixNano() / int64(time.Millisecond) + return fmt.Sprintf("%d-%d-%d", term, index, msec) +} + +// Create is used to start a new snapshot +func (f *FileSnapshotStore) Create(version SnapshotVersion, index, term uint64, + configuration Configuration, configurationIndex uint64, trans Transport) (SnapshotSink, error) { + // We only support version 1 snapshots at this time. 
+ if version != 1 { + return nil, fmt.Errorf("unsupported snapshot version %d", version) + } + + // Create a new path + name := snapshotName(term, index) + path := filepath.Join(f.path, name+tmpSuffix) + f.logger.Printf("[INFO] snapshot: Creating new snapshot at %s", path) + + // Make the directory + if err := os.MkdirAll(path, 0755); err != nil { + f.logger.Printf("[ERR] snapshot: Failed to make snapshot directory: %v", err) + return nil, err + } + + // Create the sink + sink := &FileSnapshotSink{ + store: f, + logger: f.logger, + dir: path, + parentDir: f.path, + meta: fileSnapshotMeta{ + SnapshotMeta: SnapshotMeta{ + Version: version, + ID: name, + Index: index, + Term: term, + Peers: encodePeers(configuration, trans), + Configuration: configuration, + ConfigurationIndex: configurationIndex, + }, + CRC: nil, + }, + } + + // Write out the meta data + if err := sink.writeMeta(); err != nil { + f.logger.Printf("[ERR] snapshot: Failed to write metadata: %v", err) + return nil, err + } + + // Open the state file + statePath := filepath.Join(path, stateFilePath) + fh, err := os.Create(statePath) + if err != nil { + f.logger.Printf("[ERR] snapshot: Failed to create state file: %v", err) + return nil, err + } + sink.stateFile = fh + + // Create a CRC64 hash + sink.stateHash = crc64.New(crc64.MakeTable(crc64.ECMA)) + + // Wrap both the hash and file in a MultiWriter with buffering + multi := io.MultiWriter(sink.stateFile, sink.stateHash) + sink.buffered = bufio.NewWriter(multi) + + // Done + return sink, nil +} + +// List returns available snapshots in the store. +func (f *FileSnapshotStore) List() ([]*SnapshotMeta, error) { + // Get the eligible snapshots + snapshots, err := f.getSnapshots() + if err != nil { + f.logger.Printf("[ERR] snapshot: Failed to get snapshots: %v", err) + return nil, err + } + + var snapMeta []*SnapshotMeta + for _, meta := range snapshots { + snapMeta = append(snapMeta, &meta.SnapshotMeta) + if len(snapMeta) == f.retain { + break + } + } + return snapMeta, nil +} + +// getSnapshots returns all the known snapshots. +func (f *FileSnapshotStore) getSnapshots() ([]*fileSnapshotMeta, error) { + // Get the eligible snapshots + snapshots, err := ioutil.ReadDir(f.path) + if err != nil { + f.logger.Printf("[ERR] snapshot: Failed to scan snapshot dir: %v", err) + return nil, err + } + + // Populate the metadata + var snapMeta []*fileSnapshotMeta + for _, snap := range snapshots { + // Ignore any files + if !snap.IsDir() { + continue + } + + // Ignore any temporary snapshots + dirName := snap.Name() + if strings.HasSuffix(dirName, tmpSuffix) { + f.logger.Printf("[WARN] snapshot: Found temporary snapshot: %v", dirName) + continue + } + + // Try to read the meta data + meta, err := f.readMeta(dirName) + if err != nil { + f.logger.Printf("[WARN] snapshot: Failed to read metadata for %v: %v", dirName, err) + continue + } + + // Make sure we can understand this version. 
+ if meta.Version < SnapshotVersionMin || meta.Version > SnapshotVersionMax { + f.logger.Printf("[WARN] snapshot: Snapshot version for %v not supported: %d", dirName, meta.Version) + continue + } + + // Append, but only return up to the retain count + snapMeta = append(snapMeta, meta) + } + + // Sort the snapshot, reverse so we get new -> old + sort.Sort(sort.Reverse(snapMetaSlice(snapMeta))) + + return snapMeta, nil +} + +// readMeta is used to read the meta data for a given named backup +func (f *FileSnapshotStore) readMeta(name string) (*fileSnapshotMeta, error) { + // Open the meta file + metaPath := filepath.Join(f.path, name, metaFilePath) + fh, err := os.Open(metaPath) + if err != nil { + return nil, err + } + defer fh.Close() + + // Buffer the file IO + buffered := bufio.NewReader(fh) + + // Read in the JSON + meta := &fileSnapshotMeta{} + dec := json.NewDecoder(buffered) + if err := dec.Decode(meta); err != nil { + return nil, err + } + return meta, nil +} + +// Open takes a snapshot ID and returns a ReadCloser for that snapshot. +func (f *FileSnapshotStore) Open(id string) (*SnapshotMeta, io.ReadCloser, error) { + // Get the metadata + meta, err := f.readMeta(id) + if err != nil { + f.logger.Printf("[ERR] snapshot: Failed to get meta data to open snapshot: %v", err) + return nil, nil, err + } + + // Open the state file + statePath := filepath.Join(f.path, id, stateFilePath) + fh, err := os.Open(statePath) + if err != nil { + f.logger.Printf("[ERR] snapshot: Failed to open state file: %v", err) + return nil, nil, err + } + + // Create a CRC64 hash + stateHash := crc64.New(crc64.MakeTable(crc64.ECMA)) + + // Compute the hash + _, err = io.Copy(stateHash, fh) + if err != nil { + f.logger.Printf("[ERR] snapshot: Failed to read state file: %v", err) + fh.Close() + return nil, nil, err + } + + // Verify the hash + computed := stateHash.Sum(nil) + if bytes.Compare(meta.CRC, computed) != 0 { + f.logger.Printf("[ERR] snapshot: CRC checksum failed (stored: %v computed: %v)", + meta.CRC, computed) + fh.Close() + return nil, nil, fmt.Errorf("CRC mismatch") + } + + // Seek to the start + if _, err := fh.Seek(0, 0); err != nil { + f.logger.Printf("[ERR] snapshot: State file seek failed: %v", err) + fh.Close() + return nil, nil, err + } + + // Return a buffered file + buffered := &bufferedFile{ + bh: bufio.NewReader(fh), + fh: fh, + } + + return &meta.SnapshotMeta, buffered, nil +} + +// ReapSnapshots reaps any snapshots beyond the retain count. +func (f *FileSnapshotStore) ReapSnapshots() error { + snapshots, err := f.getSnapshots() + if err != nil { + f.logger.Printf("[ERR] snapshot: Failed to get snapshots: %v", err) + return err + } + + for i := f.retain; i < len(snapshots); i++ { + path := filepath.Join(f.path, snapshots[i].ID) + f.logger.Printf("[INFO] snapshot: reaping snapshot %v", path) + if err := os.RemoveAll(path); err != nil { + f.logger.Printf("[ERR] snapshot: Failed to reap snapshot %v: %v", path, err) + return err + } + } + return nil +} + +// ID returns the ID of the snapshot, can be used with Open() +// after the snapshot is finalized. +func (s *FileSnapshotSink) ID() string { + return s.meta.ID +} + +// Write is used to append to the state file. We write to the +// buffered IO object to reduce the amount of context switches. +func (s *FileSnapshotSink) Write(b []byte) (int, error) { + return s.buffered.Write(b) +} + +// Close is used to indicate a successful end. 
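To make the sink lifecycle concrete, here is a minimal editor's sketch of a full write/read-back cycle through the file store (the directory, payload, and index/term values are invented; `trans` is any Transport, used only to encode the legacy peers structure):

func exampleFileSnapshotCycle(trans Transport) error {
	store, err := NewFileSnapshotStore("/tmp/raft-snapshots", 2, os.Stderr)
	if err != nil {
		return err
	}
	// Create stages everything under a ".tmp" directory.
	sink, err := store.Create(1, 100, 3, Configuration{}, 0, trans)
	if err != nil {
		return err
	}
	if _, err := sink.Write([]byte("fsm state")); err != nil {
		sink.Cancel()
		return err
	}
	// Close fsyncs, renames the directory into place, and reaps old snapshots.
	if err := sink.Close(); err != nil {
		return err
	}
	metas, err := store.List() // newest first, at most `retain` entries
	if err != nil || len(metas) == 0 {
		return err
	}
	// Open verifies the stored CRC before handing back the reader.
	_, rc, err := store.Open(metas[0].ID)
	if err != nil {
		return err
	}
	return rc.Close()
}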
+func (s *FileSnapshotSink) Close() error {
+	// Make sure close is idempotent
+	if s.closed {
+		return nil
+	}
+	s.closed = true
+
+	// Close the open handles
+	if err := s.finalize(); err != nil {
+		s.logger.Printf("[ERR] snapshot: Failed to finalize snapshot: %v", err)
+		if delErr := os.RemoveAll(s.dir); delErr != nil {
+			s.logger.Printf("[ERR] snapshot: Failed to delete temporary snapshot directory at path %v: %v", s.dir, delErr)
+			return delErr
+		}
+		return err
+	}
+
+	// Write out the metadata
+	if err := s.writeMeta(); err != nil {
+		s.logger.Printf("[ERR] snapshot: Failed to write metadata: %v", err)
+		return err
+	}
+
+	// Move the directory into place
+	newPath := strings.TrimSuffix(s.dir, tmpSuffix)
+	if err := os.Rename(s.dir, newPath); err != nil {
+		s.logger.Printf("[ERR] snapshot: Failed to move snapshot into place: %v", err)
+		return err
+	}
+
+	if runtime.GOOS != "windows" { // skipping fsync for directory entry edits on Windows, only needed for *nix style file systems
+		parentFH, err := os.Open(s.parentDir)
+		if err != nil {
+			s.logger.Printf("[ERR] snapshot: Failed to open snapshot parent directory %v, error: %v", s.parentDir, err)
+			return err
+		}
+		defer parentFH.Close()
+
+		if err = parentFH.Sync(); err != nil {
+			s.logger.Printf("[ERR] snapshot: Failed syncing parent directory %v, error: %v", s.parentDir, err)
+			return err
+		}
+	}
+
+	// Reap any old snapshots
+	if err := s.store.ReapSnapshots(); err != nil {
+		return err
+	}
+
+	return nil
+}
+
+// Cancel is used to indicate an unsuccessful end.
+func (s *FileSnapshotSink) Cancel() error {
+	// Make sure close is idempotent
+	if s.closed {
+		return nil
+	}
+	s.closed = true
+
+	// Close the open handles
+	if err := s.finalize(); err != nil {
+		s.logger.Printf("[ERR] snapshot: Failed to finalize snapshot: %v", err)
+		return err
+	}
+
+	// Attempt to remove all artifacts
+	return os.RemoveAll(s.dir)
+}
+
+// finalize is used to close all of our resources.
+func (s *FileSnapshotSink) finalize() error {
+	// Flush any remaining data
+	if err := s.buffered.Flush(); err != nil {
+		return err
+	}
+
+	// Sync to force fsync to disk
+	if err := s.stateFile.Sync(); err != nil {
+		return err
+	}
+
+	// Get the file size
+	stat, statErr := s.stateFile.Stat()
+
+	// Close the file
+	if err := s.stateFile.Close(); err != nil {
+		return err
+	}
+
+	// Set the file size, check after we close
+	if statErr != nil {
+		return statErr
+	}
+	s.meta.Size = stat.Size()
+
+	// Set the CRC
+	s.meta.CRC = s.stateHash.Sum(nil)
+	return nil
+}
+
+// writeMeta is used to write out the metadata we have.
+func (s *FileSnapshotSink) writeMeta() error {
+	// Open the meta file
+	metaPath := filepath.Join(s.dir, metaFilePath)
+	fh, err := os.Create(metaPath)
+	if err != nil {
+		return err
+	}
+	defer fh.Close()
+
+	// Buffer the file IO
+	buffered := bufio.NewWriter(fh)
+
+	// Write out as JSON
+	enc := json.NewEncoder(buffered)
+	if err := enc.Encode(&s.meta); err != nil {
+		return err
+	}
+
+	if err = buffered.Flush(); err != nil {
+		return err
+	}
+
+	if err = fh.Sync(); err != nil {
+		return err
+	}
+
+	return nil
+}
+
+// Implement the sort interface for []*fileSnapshotMeta.
+func (s snapMetaSlice) Len() int { + return len(s) +} + +func (s snapMetaSlice) Less(i, j int) bool { + if s[i].Term != s[j].Term { + return s[i].Term < s[j].Term + } + if s[i].Index != s[j].Index { + return s[i].Index < s[j].Index + } + return s[i].ID < s[j].ID +} + +func (s snapMetaSlice) Swap(i, j int) { + s[i], s[j] = s[j], s[i] +} diff --git a/vendor/github.com/hashicorp/raft/fsm.go b/vendor/github.com/hashicorp/raft/fsm.go new file mode 100644 index 0000000000..c89986c0fa --- /dev/null +++ b/vendor/github.com/hashicorp/raft/fsm.go @@ -0,0 +1,136 @@ +package raft + +import ( + "fmt" + "io" + "time" + + "github.com/armon/go-metrics" +) + +// FSM provides an interface that can be implemented by +// clients to make use of the replicated log. +type FSM interface { + // Apply log is invoked once a log entry is committed. + // It returns a value which will be made available in the + // ApplyFuture returned by Raft.Apply method if that + // method was called on the same Raft node as the FSM. + Apply(*Log) interface{} + + // Snapshot is used to support log compaction. This call should + // return an FSMSnapshot which can be used to save a point-in-time + // snapshot of the FSM. Apply and Snapshot are not called in multiple + // threads, but Apply will be called concurrently with Persist. This means + // the FSM should be implemented in a fashion that allows for concurrent + // updates while a snapshot is happening. + Snapshot() (FSMSnapshot, error) + + // Restore is used to restore an FSM from a snapshot. It is not called + // concurrently with any other command. The FSM must discard all previous + // state. + Restore(io.ReadCloser) error +} + +// FSMSnapshot is returned by an FSM in response to a Snapshot +// It must be safe to invoke FSMSnapshot methods with concurrent +// calls to Apply. +type FSMSnapshot interface { + // Persist should dump all necessary state to the WriteCloser 'sink', + // and call sink.Close() when finished or call sink.Cancel() on error. + Persist(sink SnapshotSink) error + + // Release is invoked when we are finished with the snapshot. + Release() +} + +// runFSM is a long running goroutine responsible for applying logs +// to the FSM. This is done async of other logs since we don't want +// the FSM to block our internal operations. +func (r *Raft) runFSM() { + var lastIndex, lastTerm uint64 + + commit := func(req *commitTuple) { + // Apply the log if a command + var resp interface{} + if req.log.Type == LogCommand { + start := time.Now() + resp = r.fsm.Apply(req.log) + metrics.MeasureSince([]string{"raft", "fsm", "apply"}, start) + } + + // Update the indexes + lastIndex = req.log.Index + lastTerm = req.log.Term + + // Invoke the future if given + if req.future != nil { + req.future.response = resp + req.future.respond(nil) + } + } + + restore := func(req *restoreFuture) { + // Open the snapshot + meta, source, err := r.snapshots.Open(req.ID) + if err != nil { + req.respond(fmt.Errorf("failed to open snapshot %v: %v", req.ID, err)) + return + } + + // Attempt to restore + start := time.Now() + if err := r.fsm.Restore(source); err != nil { + req.respond(fmt.Errorf("failed to restore snapshot %v: %v", req.ID, err)) + source.Close() + return + } + source.Close() + metrics.MeasureSince([]string{"raft", "fsm", "restore"}, start) + + // Update the last index and term + lastIndex = meta.Index + lastTerm = meta.Term + req.respond(nil) + } + + snapshot := func(req *reqSnapshotFuture) { + // Is there something to snapshot? 
+ if lastIndex == 0 { + req.respond(ErrNothingNewToSnapshot) + return + } + + // Start a snapshot + start := time.Now() + snap, err := r.fsm.Snapshot() + metrics.MeasureSince([]string{"raft", "fsm", "snapshot"}, start) + + // Respond to the request + req.index = lastIndex + req.term = lastTerm + req.snapshot = snap + req.respond(err) + } + + for { + select { + case ptr := <-r.fsmMutateCh: + switch req := ptr.(type) { + case *commitTuple: + commit(req) + + case *restoreFuture: + restore(req) + + default: + panic(fmt.Errorf("bad type passed to fsmMutateCh: %#v", ptr)) + } + + case req := <-r.fsmSnapshotCh: + snapshot(req) + + case <-r.shutdownCh: + return + } + } +} diff --git a/vendor/github.com/hashicorp/raft/future.go b/vendor/github.com/hashicorp/raft/future.go new file mode 100644 index 0000000000..fac59a5cc4 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/future.go @@ -0,0 +1,289 @@ +package raft + +import ( + "fmt" + "io" + "sync" + "time" +) + +// Future is used to represent an action that may occur in the future. +type Future interface { + // Error blocks until the future arrives and then + // returns the error status of the future. + // This may be called any number of times - all + // calls will return the same value. + // Note that it is not OK to call this method + // twice concurrently on the same Future instance. + Error() error +} + +// IndexFuture is used for future actions that can result in a raft log entry +// being created. +type IndexFuture interface { + Future + + // Index holds the index of the newly applied log entry. + // This must not be called until after the Error method has returned. + Index() uint64 +} + +// ApplyFuture is used for Apply and can return the FSM response. +type ApplyFuture interface { + IndexFuture + + // Response returns the FSM response as returned + // by the FSM.Apply method. This must not be called + // until after the Error method has returned. + Response() interface{} +} + +// ConfigurationFuture is used for GetConfiguration and can return the +// latest configuration in use by Raft. +type ConfigurationFuture interface { + IndexFuture + + // Configuration contains the latest configuration. This must + // not be called until after the Error method has returned. + Configuration() Configuration +} + +// SnapshotFuture is used for waiting on a user-triggered snapshot to complete. +type SnapshotFuture interface { + Future + + // Open is a function you can call to access the underlying snapshot and + // its metadata. This must not be called until after the Error method + // has returned. + Open() (*SnapshotMeta, io.ReadCloser, error) +} + +// errorFuture is used to return a static error. +type errorFuture struct { + err error +} + +func (e errorFuture) Error() error { + return e.err +} + +func (e errorFuture) Response() interface{} { + return nil +} + +func (e errorFuture) Index() uint64 { + return 0 +} + +// deferError can be embedded to allow a future +// to provide an error in the future. +type deferError struct { + err error + errCh chan error + responded bool +} + +func (d *deferError) init() { + d.errCh = make(chan error, 1) +} + +func (d *deferError) Error() error { + if d.err != nil { + // Note that when we've received a nil error, this + // won't trigger, but the channel is closed after + // send so we'll still return nil below. 
+ return d.err + } + if d.errCh == nil { + panic("waiting for response on nil channel") + } + d.err = <-d.errCh + return d.err +} + +func (d *deferError) respond(err error) { + if d.errCh == nil { + return + } + if d.responded { + return + } + d.errCh <- err + close(d.errCh) + d.responded = true +} + +// There are several types of requests that cause a configuration entry to +// be appended to the log. These are encoded here for leaderLoop() to process. +// This is internal to a single server. +type configurationChangeFuture struct { + logFuture + req configurationChangeRequest +} + +// bootstrapFuture is used to attempt a live bootstrap of the cluster. See the +// Raft object's BootstrapCluster member function for more details. +type bootstrapFuture struct { + deferError + + // configuration is the proposed bootstrap configuration to apply. + configuration Configuration +} + +// logFuture is used to apply a log entry and waits until +// the log is considered committed. +type logFuture struct { + deferError + log Log + response interface{} + dispatch time.Time +} + +func (l *logFuture) Response() interface{} { + return l.response +} + +func (l *logFuture) Index() uint64 { + return l.log.Index +} + +type shutdownFuture struct { + raft *Raft +} + +func (s *shutdownFuture) Error() error { + if s.raft == nil { + return nil + } + s.raft.waitShutdown() + if closeable, ok := s.raft.trans.(WithClose); ok { + closeable.Close() + } + return nil +} + +// userSnapshotFuture is used for waiting on a user-triggered snapshot to +// complete. +type userSnapshotFuture struct { + deferError + + // opener is a function used to open the snapshot. This is filled in + // once the future returns with no error. + opener func() (*SnapshotMeta, io.ReadCloser, error) +} + +// Open is a function you can call to access the underlying snapshot and its +// metadata. +func (u *userSnapshotFuture) Open() (*SnapshotMeta, io.ReadCloser, error) { + if u.opener == nil { + return nil, nil, fmt.Errorf("no snapshot available") + } else { + // Invalidate the opener so it can't get called multiple times, + // which isn't generally safe. + defer func() { + u.opener = nil + }() + return u.opener() + } +} + +// userRestoreFuture is used for waiting on a user-triggered restore of an +// external snapshot to complete. +type userRestoreFuture struct { + deferError + + // meta is the metadata that belongs with the snapshot. + meta *SnapshotMeta + + // reader is the interface to read the snapshot contents from. + reader io.Reader +} + +// reqSnapshotFuture is used for requesting a snapshot start. +// It is only used internally. +type reqSnapshotFuture struct { + deferError + + // snapshot details provided by the FSM runner before responding + index uint64 + term uint64 + snapshot FSMSnapshot +} + +// restoreFuture is used for requesting an FSM to perform a +// snapshot restore. Used internally only. +type restoreFuture struct { + deferError + ID string +} + +// verifyFuture is used to verify the current node is still +// the leader. This is to prevent a stale read. +type verifyFuture struct { + deferError + notifyCh chan *verifyFuture + quorumSize int + votes int + voteLock sync.Mutex +} + +// configurationsFuture is used to retrieve the current configurations. This is +// used to allow safe access to this information outside of the main thread. +type configurationsFuture struct { + deferError + configurations configurations +} + +// Configuration returns the latest configuration in use by Raft. 
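From the caller's side these futures all follow the same block-on-Error pattern; a short editor's sketch (it assumes the Apply and GetConfiguration methods the library defines on *Raft in api.go, which is not part of this excerpt):

func exampleFutures(r *Raft) error {
	// Apply returns an ApplyFuture; Error blocks until the entry is committed
	// and applied (or fails), and Response then exposes the FSM's return value.
	f := r.Apply([]byte("set x=1"), 5*time.Second)
	if err := f.Error(); err != nil {
		return err
	}
	_ = f.Response()

	// ConfigurationFuture works the same way for membership reads.
	cf := r.GetConfiguration()
	if err := cf.Error(); err != nil {
		return err
	}
	for _, srv := range cf.Configuration().Servers {
		fmt.Println(srv.ID, srv.Address, srv.Suffrage)
	}
	return nil
}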
+func (c *configurationsFuture) Configuration() Configuration { + return c.configurations.latest +} + +// Index returns the index of the latest configuration in use by Raft. +func (c *configurationsFuture) Index() uint64 { + return c.configurations.latestIndex +} + +// vote is used to respond to a verifyFuture. +// This may block when responding on the notifyCh. +func (v *verifyFuture) vote(leader bool) { + v.voteLock.Lock() + defer v.voteLock.Unlock() + + // Guard against having notified already + if v.notifyCh == nil { + return + } + + if leader { + v.votes++ + if v.votes >= v.quorumSize { + v.notifyCh <- v + v.notifyCh = nil + } + } else { + v.notifyCh <- v + v.notifyCh = nil + } +} + +// appendFuture is used for waiting on a pipelined append +// entries RPC. +type appendFuture struct { + deferError + start time.Time + args *AppendEntriesRequest + resp *AppendEntriesResponse +} + +func (a *appendFuture) Start() time.Time { + return a.start +} + +func (a *appendFuture) Request() *AppendEntriesRequest { + return a.args +} + +func (a *appendFuture) Response() *AppendEntriesResponse { + return a.resp +} diff --git a/vendor/github.com/hashicorp/raft/go.mod b/vendor/github.com/hashicorp/raft/go.mod new file mode 100644 index 0000000000..09803b688f --- /dev/null +++ b/vendor/github.com/hashicorp/raft/go.mod @@ -0,0 +1,10 @@ +module github.com/hashicorp/raft + +go 1.12 + +require ( + github.com/armon/go-metrics v0.0.0-20190430140413-ec5e00d3c878 + github.com/hashicorp/go-hclog v0.9.1 + github.com/hashicorp/go-msgpack v0.5.5 + github.com/stretchr/testify v1.3.0 +) diff --git a/vendor/github.com/hashicorp/raft/go.sum b/vendor/github.com/hashicorp/raft/go.sum new file mode 100644 index 0000000000..b06b6a7a4f --- /dev/null +++ b/vendor/github.com/hashicorp/raft/go.sum @@ -0,0 +1,37 @@ +github.com/DataDog/datadog-go v2.2.0+incompatible/go.mod h1:LButxg5PwREeZtORoXG3tL4fMGNddJ+vMq1mwgfaqoQ= +github.com/armon/go-metrics v0.0.0-20190430140413-ec5e00d3c878 h1:EFSB7Zo9Eg91v7MJPVsifUysc/wPdN+NOnVe6bWbdBM= +github.com/armon/go-metrics v0.0.0-20190430140413-ec5e00d3c878/go.mod h1:3AMJUQhVx52RsWOnlkpikZr01T/yAVN2gn0861vByNg= +github.com/beorn7/perks v0.0.0-20180321164747-3a771d992973/go.mod h1:Dwedo/Wpr24TaqPxmxbtue+5NUziq4I4S80YR8gNf3Q= +github.com/circonus-labs/circonus-gometrics v2.3.1+incompatible/go.mod h1:nmEj6Dob7S7YxXgwXpfOuvO54S+tGdZdw9fuRZt25Ag= +github.com/circonus-labs/circonusllhist v0.1.3/go.mod h1:kMXHVDlOchFAehlya5ePtbp5jckzBHf4XRpQvBOLI+I= +github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= +github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= +github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= +github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= +github.com/hashicorp/go-cleanhttp v0.5.0/go.mod h1:JpRdi6/HCYpAwUzNwuwqhbovhLtngrth3wmdIIUrZ80= +github.com/hashicorp/go-hclog v0.9.1 h1:9PZfAcVEvez4yhLH2TBU64/h/z4xlFI80cWXRrxuKuM= +github.com/hashicorp/go-hclog v0.9.1/go.mod h1:5CU+agLiy3J7N7QjHK5d05KxGsuXiQLrjA0H7acj2lQ= +github.com/hashicorp/go-immutable-radix v1.0.0 h1:AKDB1HM5PWEA7i4nhcpwOrO2byshxBjXVn/J/3+z5/0= +github.com/hashicorp/go-immutable-radix v1.0.0/go.mod h1:0y9vanUI8NX6FsYoO3zeMjhV/C5i9g4Q3DwcSNZ4P60= +github.com/hashicorp/go-msgpack v0.5.5 h1:i9R9JSrqIz0QVLz3sz+i3YJdT7TTSLcfLLzJi9aZTuI= +github.com/hashicorp/go-msgpack v0.5.5/go.mod h1:ahLV/dePpqEmjfWmKiqvPkv/twdG7iPBM1vqhUKIvfM= +github.com/hashicorp/go-retryablehttp v0.5.3/go.mod 
h1:9B5zBasrRhHXnJnui7y6sL7es7NDiJgTc6Er0maI1Xs= +github.com/hashicorp/go-uuid v1.0.0/go.mod h1:6SBZvOh/SIDV7/2o3Jml5SYk/TvGqwFJ/bN7x4byOro= +github.com/hashicorp/golang-lru v0.5.0 h1:CL2msUPvZTLb5O648aiLNJw3hnBxN2+1Jq8rCOH9wdo= +github.com/hashicorp/golang-lru v0.5.0/go.mod h1:/m3WP610KZHVQ1SGc6re/UDhFvYD7pJ4Ao+sR/qLZy8= +github.com/matttproud/golang_protobuf_extensions v1.0.1/go.mod h1:D8He9yQNgCq6Z5Ld7szi9bcBfOoFv/3dc6xSMkL2PC0= +github.com/pascaldekloe/goe v0.1.0/go.mod h1:lzWF7FIEvWOWxwDKqyGYQf6ZUaNfKdP144TG7ZOy1lc= +github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= +github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= +github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= +github.com/prometheus/client_golang v0.9.2/go.mod h1:OsXs2jCmiKlQ1lTBmv21f2mNfw4xf/QclQDMrYNZzcM= +github.com/prometheus/client_model v0.0.0-20180712105110-5c3871d89910/go.mod h1:MbSGuTsp3dbXC40dX6PRTWyKYBIrTGTE9sqQNg2J8bo= +github.com/prometheus/common v0.0.0-20181126121408-4724e9255275/go.mod h1:daVV7qP5qjZbuso7PdcryaAu0sAZbrN9i7WWcTMWvro= +github.com/prometheus/procfs v0.0.0-20181204211112-1dc9a6cbc91a/go.mod h1:c3At6R/oaqEKCNdg8wHV1ftS6bRYblBhIjjI8uT2IGk= +github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= +github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs= +github.com/stretchr/testify v1.3.0 h1:TivCn/peBQ7UY8ooIcPgZFpTNSz0Q2U6UrFlUfqbe0Q= +github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= +github.com/tv42/httpunix v0.0.0-20150427012821-b75d8614f926/go.mod h1:9ESjWnEqriFuLhtthL60Sar/7RFoluCcXsuvEwTV5KM= +golang.org/x/net v0.0.0-20181201002055-351d144fa1fc/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= +golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= diff --git a/vendor/github.com/hashicorp/raft/inmem_snapshot.go b/vendor/github.com/hashicorp/raft/inmem_snapshot.go new file mode 100644 index 0000000000..ad52f93aef --- /dev/null +++ b/vendor/github.com/hashicorp/raft/inmem_snapshot.go @@ -0,0 +1,109 @@ +package raft + +import ( + "bytes" + "fmt" + "io" + "io/ioutil" + "sync" +) + +// InmemSnapshotStore implements the SnapshotStore interface and +// retains only the most recent snapshot +type InmemSnapshotStore struct { + latest *InmemSnapshotSink + hasSnapshot bool + sync.RWMutex +} + +// InmemSnapshotSink implements SnapshotSink in memory +type InmemSnapshotSink struct { + meta SnapshotMeta + contents *bytes.Buffer +} + +// NewInmemSnapshotStore creates a blank new InmemSnapshotStore +func NewInmemSnapshotStore() *InmemSnapshotStore { + return &InmemSnapshotStore{ + latest: &InmemSnapshotSink{ + contents: &bytes.Buffer{}, + }, + } +} + +// Create replaces the stored snapshot with a new one using the given args +func (m *InmemSnapshotStore) Create(version SnapshotVersion, index, term uint64, + configuration Configuration, configurationIndex uint64, trans Transport) (SnapshotSink, error) { + // We only support version 1 snapshots at this time. 
+ if version != 1 { + return nil, fmt.Errorf("unsupported snapshot version %d", version) + } + + name := snapshotName(term, index) + + m.Lock() + defer m.Unlock() + + sink := &InmemSnapshotSink{ + meta: SnapshotMeta{ + Version: version, + ID: name, + Index: index, + Term: term, + Peers: encodePeers(configuration, trans), + Configuration: configuration, + ConfigurationIndex: configurationIndex, + }, + contents: &bytes.Buffer{}, + } + m.hasSnapshot = true + m.latest = sink + + return sink, nil +} + +// List returns the latest snapshot taken +func (m *InmemSnapshotStore) List() ([]*SnapshotMeta, error) { + m.RLock() + defer m.RUnlock() + + if !m.hasSnapshot { + return []*SnapshotMeta{}, nil + } + return []*SnapshotMeta{&m.latest.meta}, nil +} + +// Open wraps an io.ReadCloser around the snapshot contents +func (m *InmemSnapshotStore) Open(id string) (*SnapshotMeta, io.ReadCloser, error) { + m.RLock() + defer m.RUnlock() + + if m.latest.meta.ID != id { + return nil, nil, fmt.Errorf("[ERR] snapshot: failed to open snapshot id: %s", id) + } + + // Make a copy of the contents, since a bytes.Buffer can only be read + // once. + contents := bytes.NewBuffer(m.latest.contents.Bytes()) + return &m.latest.meta, ioutil.NopCloser(contents), nil +} + +// Write appends the given bytes to the snapshot contents +func (s *InmemSnapshotSink) Write(p []byte) (n int, err error) { + written, err := io.Copy(s.contents, bytes.NewReader(p)) + s.meta.Size += written + return int(written), err +} + +// Close updates the Size and is otherwise a no-op +func (s *InmemSnapshotSink) Close() error { + return nil +} + +func (s *InmemSnapshotSink) ID() string { + return s.meta.ID +} + +func (s *InmemSnapshotSink) Cancel() error { + return nil +} diff --git a/vendor/github.com/hashicorp/raft/inmem_store.go b/vendor/github.com/hashicorp/raft/inmem_store.go new file mode 100644 index 0000000000..6285610f9a --- /dev/null +++ b/vendor/github.com/hashicorp/raft/inmem_store.go @@ -0,0 +1,130 @@ +package raft + +import ( + "errors" + "sync" +) + +// InmemStore implements the LogStore and StableStore interface. +// It should NOT EVER be used for production. It is used only for +// unit tests. Use the MDBStore implementation instead. +type InmemStore struct { + l sync.RWMutex + lowIndex uint64 + highIndex uint64 + logs map[uint64]*Log + kv map[string][]byte + kvInt map[string]uint64 +} + +// NewInmemStore returns a new in-memory backend. Do not ever +// use for production. Only for testing. +func NewInmemStore() *InmemStore { + i := &InmemStore{ + logs: make(map[uint64]*Log), + kv: make(map[string][]byte), + kvInt: make(map[string]uint64), + } + return i +} + +// FirstIndex implements the LogStore interface. +func (i *InmemStore) FirstIndex() (uint64, error) { + i.l.RLock() + defer i.l.RUnlock() + return i.lowIndex, nil +} + +// LastIndex implements the LogStore interface. +func (i *InmemStore) LastIndex() (uint64, error) { + i.l.RLock() + defer i.l.RUnlock() + return i.highIndex, nil +} + +// GetLog implements the LogStore interface. +func (i *InmemStore) GetLog(index uint64, log *Log) error { + i.l.RLock() + defer i.l.RUnlock() + l, ok := i.logs[index] + if !ok { + return ErrLogNotFound + } + *log = *l + return nil +} + +// StoreLog implements the LogStore interface. +func (i *InmemStore) StoreLog(log *Log) error { + return i.StoreLogs([]*Log{log}) +} + +// StoreLogs implements the LogStore interface. 
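A quick test-style sketch of the store above (key names and log values are illustrative; the Log fields mirror their use in runFSM earlier in this diff):

func exampleInmemStore() error {
	store := NewInmemStore()
	// Log indexes drive the FirstIndex/LastIndex bookkeeping.
	if err := store.StoreLogs([]*Log{
		{Index: 1, Term: 1, Type: LogCommand},
		{Index: 2, Term: 1, Type: LogCommand},
	}); err != nil {
		return err
	}
	var out Log
	if err := store.GetLog(2, &out); err != nil {
		return err
	}
	// The StableStore half holds small durable values such as the current term.
	if err := store.SetUint64([]byte("CurrentTerm"), 1); err != nil {
		return err
	}
	return store.Set([]byte("LastVoteCand"), []byte("node-a"))
}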
+func (i *InmemStore) StoreLogs(logs []*Log) error {
+	i.l.Lock()
+	defer i.l.Unlock()
+	for _, l := range logs {
+		i.logs[l.Index] = l
+		if i.lowIndex == 0 {
+			i.lowIndex = l.Index
+		}
+		if l.Index > i.highIndex {
+			i.highIndex = l.Index
+		}
+	}
+	return nil
+}
+
+// DeleteRange implements the LogStore interface.
+func (i *InmemStore) DeleteRange(min, max uint64) error {
+	i.l.Lock()
+	defer i.l.Unlock()
+	for j := min; j <= max; j++ {
+		delete(i.logs, j)
+	}
+	if min <= i.lowIndex {
+		i.lowIndex = max + 1
+	}
+	if max >= i.highIndex {
+		i.highIndex = min - 1
+	}
+	if i.lowIndex > i.highIndex {
+		i.lowIndex = 0
+		i.highIndex = 0
+	}
+	return nil
+}
+
+// Set implements the StableStore interface.
+func (i *InmemStore) Set(key []byte, val []byte) error {
+	i.l.Lock()
+	defer i.l.Unlock()
+	i.kv[string(key)] = val
+	return nil
+}
+
+// Get implements the StableStore interface.
+func (i *InmemStore) Get(key []byte) ([]byte, error) {
+	i.l.RLock()
+	defer i.l.RUnlock()
+	val := i.kv[string(key)]
+	if val == nil {
+		return nil, errors.New("not found")
+	}
+	return val, nil
+}
+
+// SetUint64 implements the StableStore interface.
+func (i *InmemStore) SetUint64(key []byte, val uint64) error {
+	i.l.Lock()
+	defer i.l.Unlock()
+	i.kvInt[string(key)] = val
+	return nil
+}
+
+// GetUint64 implements the StableStore interface.
+func (i *InmemStore) GetUint64(key []byte) (uint64, error) {
+	i.l.RLock()
+	defer i.l.RUnlock()
+	return i.kvInt[string(key)], nil
+}
diff --git a/vendor/github.com/hashicorp/raft/inmem_transport.go b/vendor/github.com/hashicorp/raft/inmem_transport.go
new file mode 100644
index 0000000000..bb42eeb68b
--- /dev/null
+++ b/vendor/github.com/hashicorp/raft/inmem_transport.go
@@ -0,0 +1,335 @@
+package raft
+
+import (
+	"fmt"
+	"io"
+	"sync"
+	"time"
+)
+
+// NewInmemAddr returns a new in-memory addr with
+// a randomly generated UUID as the ID.
+func NewInmemAddr() ServerAddress {
+	return ServerAddress(generateUUID())
+}
+
+// inmemPipeline is used to pipeline requests for the in-mem transport.
+type inmemPipeline struct {
+	trans    *InmemTransport
+	peer     *InmemTransport
+	peerAddr ServerAddress
+
+	doneCh       chan AppendFuture
+	inprogressCh chan *inmemPipelineInflight
+
+	shutdown     bool
+	shutdownCh   chan struct{}
+	shutdownLock sync.Mutex
+}
+
+type inmemPipelineInflight struct {
+	future *appendFuture
+	respCh <-chan RPCResponse
+}
+
+// InmemTransport implements the Transport interface, to allow Raft to be
+// tested in-memory without going over a network.
+type InmemTransport struct {
+	sync.RWMutex
+	consumerCh chan RPC
+	localAddr  ServerAddress
+	peers      map[ServerAddress]*InmemTransport
+	pipelines  []*inmemPipeline
+	timeout    time.Duration
+}
+
+// NewInmemTransportWithTimeout is used to initialize a new transport and
+// generates a random local address if none is specified. The given timeout
+// will be used to decide how long to wait for a connected peer to process the
+// RPCs that we're sending it. See also Connect() and Consumer().
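Routing between in-memory transports is explicit and one-directional, so a test harness has to connect both sides; a minimal sketch:

func exampleInmemPair() (*InmemTransport, *InmemTransport) {
	// Passing an empty address makes each transport pick a random UUID address.
	addr1, t1 := NewInmemTransport("")
	addr2, t2 := NewInmemTransport("")
	t1.Connect(addr2, t2)
	t2.Connect(addr1, t1)
	return t1, t2
}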
+func NewInmemTransportWithTimeout(addr ServerAddress, timeout time.Duration) (ServerAddress, *InmemTransport) { + if string(addr) == "" { + addr = NewInmemAddr() + } + trans := &InmemTransport{ + consumerCh: make(chan RPC, 16), + localAddr: addr, + peers: make(map[ServerAddress]*InmemTransport), + timeout: timeout, + } + return addr, trans +} + +// NewInmemTransport is used to initialize a new transport +// and generates a random local address if none is specified +func NewInmemTransport(addr ServerAddress) (ServerAddress, *InmemTransport) { + return NewInmemTransportWithTimeout(addr, 50*time.Millisecond) +} + +// SetHeartbeatHandler is used to set optional fast-path for +// heartbeats, not supported for this transport. +func (i *InmemTransport) SetHeartbeatHandler(cb func(RPC)) { +} + +// Consumer implements the Transport interface. +func (i *InmemTransport) Consumer() <-chan RPC { + return i.consumerCh +} + +// LocalAddr implements the Transport interface. +func (i *InmemTransport) LocalAddr() ServerAddress { + return i.localAddr +} + +// AppendEntriesPipeline returns an interface that can be used to pipeline +// AppendEntries requests. +func (i *InmemTransport) AppendEntriesPipeline(id ServerID, target ServerAddress) (AppendPipeline, error) { + i.Lock() + defer i.Unlock() + + peer, ok := i.peers[target] + if !ok { + return nil, fmt.Errorf("failed to connect to peer: %v", target) + } + pipeline := newInmemPipeline(i, peer, target) + i.pipelines = append(i.pipelines, pipeline) + return pipeline, nil +} + +// AppendEntries implements the Transport interface. +func (i *InmemTransport) AppendEntries(id ServerID, target ServerAddress, args *AppendEntriesRequest, resp *AppendEntriesResponse) error { + rpcResp, err := i.makeRPC(target, args, nil, i.timeout) + if err != nil { + return err + } + + // Copy the result back + out := rpcResp.Response.(*AppendEntriesResponse) + *resp = *out + return nil +} + +// RequestVote implements the Transport interface. +func (i *InmemTransport) RequestVote(id ServerID, target ServerAddress, args *RequestVoteRequest, resp *RequestVoteResponse) error { + rpcResp, err := i.makeRPC(target, args, nil, i.timeout) + if err != nil { + return err + } + + // Copy the result back + out := rpcResp.Response.(*RequestVoteResponse) + *resp = *out + return nil +} + +// InstallSnapshot implements the Transport interface. +func (i *InmemTransport) InstallSnapshot(id ServerID, target ServerAddress, args *InstallSnapshotRequest, resp *InstallSnapshotResponse, data io.Reader) error { + rpcResp, err := i.makeRPC(target, args, data, 10*i.timeout) + if err != nil { + return err + } + + // Copy the result back + out := rpcResp.Response.(*InstallSnapshotResponse) + *resp = *out + return nil +} + +func (i *InmemTransport) makeRPC(target ServerAddress, args interface{}, r io.Reader, timeout time.Duration) (rpcResp RPCResponse, err error) { + i.RLock() + peer, ok := i.peers[target] + i.RUnlock() + + if !ok { + err = fmt.Errorf("failed to connect to peer: %v", target) + return + } + + // Send the RPC over + respCh := make(chan RPCResponse) + req := RPC{ + Command: args, + Reader: r, + RespChan: respCh, + } + select { + case peer.consumerCh <- req: + case <-time.After(timeout): + err = fmt.Errorf("send timed out") + return + } + + // Wait for a response + select { + case rpcResp = <-respCh: + if rpcResp.Error != nil { + err = rpcResp.Error + } + case <-time.After(timeout): + err = fmt.Errorf("command timed out") + } + return +} + +// EncodePeer implements the Transport interface. 
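On the receiving side, a peer drains Consumer() and answers on the RPC's response channel, mirroring what makeRPC above expects; an editor's sketch (the Success field is assumed from the library's AppendEntriesResponse, which is defined elsewhere, not in this excerpt):

func serveOneRPC(t *InmemTransport) {
	rpc := <-t.Consumer()
	switch cmd := rpc.Command.(type) {
	case *AppendEntriesRequest:
		_ = cmd // a real handler would inspect the request here
		rpc.RespChan <- RPCResponse{Response: &AppendEntriesResponse{Success: true}}
	default:
		rpc.RespChan <- RPCResponse{Error: fmt.Errorf("unhandled command type %T", cmd)}
	}
}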
+func (i *InmemTransport) EncodePeer(id ServerID, p ServerAddress) []byte { + return []byte(p) +} + +// DecodePeer implements the Transport interface. +func (i *InmemTransport) DecodePeer(buf []byte) ServerAddress { + return ServerAddress(buf) +} + +// Connect is used to connect this transport to another transport for +// a given peer name. This allows for local routing. +func (i *InmemTransport) Connect(peer ServerAddress, t Transport) { + trans := t.(*InmemTransport) + i.Lock() + defer i.Unlock() + i.peers[peer] = trans +} + +// Disconnect is used to remove the ability to route to a given peer. +func (i *InmemTransport) Disconnect(peer ServerAddress) { + i.Lock() + defer i.Unlock() + delete(i.peers, peer) + + // Disconnect any pipelines + n := len(i.pipelines) + for idx := 0; idx < n; idx++ { + if i.pipelines[idx].peerAddr == peer { + i.pipelines[idx].Close() + i.pipelines[idx], i.pipelines[n-1] = i.pipelines[n-1], nil + idx-- + n-- + } + } + i.pipelines = i.pipelines[:n] +} + +// DisconnectAll is used to remove all routes to peers. +func (i *InmemTransport) DisconnectAll() { + i.Lock() + defer i.Unlock() + i.peers = make(map[ServerAddress]*InmemTransport) + + // Handle pipelines + for _, pipeline := range i.pipelines { + pipeline.Close() + } + i.pipelines = nil +} + +// Close is used to permanently disable the transport +func (i *InmemTransport) Close() error { + i.DisconnectAll() + return nil +} + +func newInmemPipeline(trans *InmemTransport, peer *InmemTransport, addr ServerAddress) *inmemPipeline { + i := &inmemPipeline{ + trans: trans, + peer: peer, + peerAddr: addr, + doneCh: make(chan AppendFuture, 16), + inprogressCh: make(chan *inmemPipelineInflight, 16), + shutdownCh: make(chan struct{}), + } + go i.decodeResponses() + return i +} + +func (i *inmemPipeline) decodeResponses() { + timeout := i.trans.timeout + for { + select { + case inp := <-i.inprogressCh: + var timeoutCh <-chan time.Time + if timeout > 0 { + timeoutCh = time.After(timeout) + } + + select { + case rpcResp := <-inp.respCh: + // Copy the result back + *inp.future.resp = *rpcResp.Response.(*AppendEntriesResponse) + inp.future.respond(rpcResp.Error) + + select { + case i.doneCh <- inp.future: + case <-i.shutdownCh: + return + } + + case <-timeoutCh: + inp.future.respond(fmt.Errorf("command timed out")) + select { + case i.doneCh <- inp.future: + case <-i.shutdownCh: + return + } + + case <-i.shutdownCh: + return + } + case <-i.shutdownCh: + return + } + } +} + +func (i *inmemPipeline) AppendEntries(args *AppendEntriesRequest, resp *AppendEntriesResponse) (AppendFuture, error) { + // Create a new future + future := &appendFuture{ + start: time.Now(), + args: args, + resp: resp, + } + future.init() + + // Handle a timeout + var timeout <-chan time.Time + if i.trans.timeout > 0 { + timeout = time.After(i.trans.timeout) + } + + // Send the RPC over + respCh := make(chan RPCResponse, 1) + rpc := RPC{ + Command: args, + RespChan: respCh, + } + select { + case i.peer.consumerCh <- rpc: + case <-timeout: + return nil, fmt.Errorf("command enqueue timeout") + case <-i.shutdownCh: + return nil, ErrPipelineShutdown + } + + // Send to be decoded + select { + case i.inprogressCh <- &inmemPipelineInflight{future, respCh}: + return future, nil + case <-i.shutdownCh: + return nil, ErrPipelineShutdown + } +} + +func (i *inmemPipeline) Consumer() <-chan AppendFuture { + return i.doneCh +} + +func (i *inmemPipeline) Close() error { + i.shutdownLock.Lock() + defer i.shutdownLock.Unlock() + if i.shutdown { + return nil + } + + i.shutdown 
= true
+	close(i.shutdownCh)
+	return nil
+}
diff --git a/vendor/github.com/hashicorp/raft/log.go b/vendor/github.com/hashicorp/raft/log.go
new file mode 100644
index 0000000000..4ade38ecc1
--- /dev/null
+++ b/vendor/github.com/hashicorp/raft/log.go
@@ -0,0 +1,72 @@
+package raft
+
+// LogType describes various types of log entries.
+type LogType uint8
+
+const (
+	// LogCommand is applied to a user FSM.
+	LogCommand LogType = iota
+
+	// LogNoop is used to assert leadership.
+	LogNoop
+
+	// LogAddPeerDeprecated is used to add a new peer. This should only be
+	// used with older protocol versions designed to be compatible with
+	// unversioned Raft servers. See comments in config.go for details.
+	LogAddPeerDeprecated
+
+	// LogRemovePeerDeprecated is used to remove an existing peer. This
+	// should only be used with older protocol versions designed to be
+	// compatible with unversioned Raft servers. See comments in config.go
+	// for details.
+	LogRemovePeerDeprecated
+
+	// LogBarrier is used to ensure all preceding operations have been
+	// applied to the FSM. It is similar to LogNoop, but instead of returning
+	// once committed, it only returns once the FSM manager acks it. Otherwise
+	// it is possible there are operations committed but not yet applied to
+	// the FSM.
+	LogBarrier
+
+	// LogConfiguration establishes a membership change configuration. It is
+	// created when a server is added, removed, promoted, etc. Only used
+	// when protocol version 1 or greater is in use.
+	LogConfiguration
+)
+
+// Log entries are replicated to all members of the Raft cluster
+// and form the heart of the replicated state machine.
+type Log struct {
+	// Index holds the index of the log entry.
+	Index uint64
+
+	// Term holds the election term of the log entry.
+	Term uint64
+
+	// Type holds the type of the log entry.
+	Type LogType
+
+	// Data holds the log entry's type-specific data.
+	Data []byte
+}
+
+// LogStore is used to provide an interface for storing
+// and retrieving logs in a durable fashion.
+type LogStore interface {
+	// FirstIndex returns the first index written. 0 for no entries.
+	FirstIndex() (uint64, error)
+
+	// LastIndex returns the last index written. 0 for no entries.
+	LastIndex() (uint64, error)
+
+	// GetLog gets a log entry at a given index.
+	GetLog(index uint64, log *Log) error
+
+	// StoreLog stores a log entry.
+	StoreLog(log *Log) error
+
+	// StoreLogs stores multiple log entries.
+	StoreLogs(logs []*Log) error
+
+	// DeleteRange deletes a range of log entries. The range is inclusive.
+	DeleteRange(min, max uint64) error
+}
diff --git a/vendor/github.com/hashicorp/raft/log_cache.go b/vendor/github.com/hashicorp/raft/log_cache.go
new file mode 100644
index 0000000000..952e98c228
--- /dev/null
+++ b/vendor/github.com/hashicorp/raft/log_cache.go
@@ -0,0 +1,79 @@
+package raft
+
+import (
+	"fmt"
+	"sync"
+)
+
+// LogCache wraps any LogStore implementation to provide an
+// in-memory ring buffer. This is used to cache access to
+// the recently written entries. For implementations that do not
+// cache themselves, this can provide a substantial boost by
+// avoiding disk I/O on recent entries.
+type LogCache struct {
+	store LogStore
+
+	cache []*Log
+	l     sync.RWMutex
+}
+
+// NewLogCache is used to create a new LogCache with the
+// given capacity and backend store.
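As a usage sketch before the constructor below (hedged; any LogStore implementation can serve as the backend, the in-memory store is used only for brevity):

```go
package main

import (
	"fmt"

	"github.com/hashicorp/raft"
)

func main() {
	base := raft.NewInmemStore()

	// Wrap the store; capacity must be positive.
	cached, err := raft.NewLogCache(512, base)
	if err != nil {
		panic(err)
	}

	cached.StoreLog(&raft.Log{Index: 1, Term: 1, Data: []byte("x")})

	var l raft.Log
	cached.GetLog(1, &l) // served from the ring buffer, no hit on base
	fmt.Println(string(l.Data))
}
```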
+func NewLogCache(capacity int, store LogStore) (*LogCache, error) {
+	if capacity <= 0 {
+		return nil, fmt.Errorf("capacity must be positive")
+	}
+	c := &LogCache{
+		store: store,
+		cache: make([]*Log, capacity),
+	}
+	return c, nil
+}
+
+func (c *LogCache) GetLog(idx uint64, log *Log) error {
+	// Check the buffer for an entry
+	c.l.RLock()
+	cached := c.cache[idx%uint64(len(c.cache))]
+	c.l.RUnlock()
+
+	// Check if entry is valid
+	if cached != nil && cached.Index == idx {
+		*log = *cached
+		return nil
+	}
+
+	// Forward request on cache miss
+	return c.store.GetLog(idx, log)
+}
+
+func (c *LogCache) StoreLog(log *Log) error {
+	return c.StoreLogs([]*Log{log})
+}
+
+func (c *LogCache) StoreLogs(logs []*Log) error {
+	// Insert the logs into the ring buffer
+	c.l.Lock()
+	for _, l := range logs {
+		c.cache[l.Index%uint64(len(c.cache))] = l
+	}
+	c.l.Unlock()
+
+	return c.store.StoreLogs(logs)
+}
+
+func (c *LogCache) FirstIndex() (uint64, error) {
+	return c.store.FirstIndex()
+}
+
+func (c *LogCache) LastIndex() (uint64, error) {
+	return c.store.LastIndex()
+}
+
+func (c *LogCache) DeleteRange(min, max uint64) error {
+	// Invalidate the cache on deletes
+	c.l.Lock()
+	c.cache = make([]*Log, len(c.cache))
+	c.l.Unlock()
+
+	return c.store.DeleteRange(min, max)
+}
diff --git a/vendor/github.com/hashicorp/raft/membership.md b/vendor/github.com/hashicorp/raft/membership.md
new file mode 100644
index 0000000000..df1f83e27f
--- /dev/null
+++ b/vendor/github.com/hashicorp/raft/membership.md
@@ -0,0 +1,83 @@
+Simon (@superfell) and I (@ongardie) talked through reworking this library's cluster membership changes last Friday. We don't see a way to split this into independent patches, so we're taking the next best approach: submitting the plan here for review, then working on an enormous PR. Your feedback would be appreciated. (@superfell is out this week, however, so don't expect him to respond quickly.)
+
+These are the main goals:
+ - Bringing things in line with the description in my PhD dissertation;
+ - Catching up new servers prior to granting them a vote, as well as allowing permanent non-voting members; and
+ - Eliminating the `peers.json` file, to avoid issues of consistency between that and the log/snapshot.
+
+## Data-centric view
+
+We propose to re-define a *configuration* as a set of servers, where each server includes an address (as it does today) and a mode that is either:
+ - *Voter*: a server whose vote is counted in elections and whose match index is used in advancing the leader's commit index.
+ - *Nonvoter*: a server that receives log entries but is not considered for elections or commitment purposes.
+ - *Staging*: a server that acts like a nonvoter with one exception: once a staging server receives enough log entries to catch up sufficiently to the leader's log, the leader will invoke a membership change to change the staging server to a voter.
+
+All changes to the configuration will be done by writing a new configuration to the log. The new configuration will be in effect as soon as it is appended to the log (not when it is committed like a normal state machine command). Note that, per my dissertation, there can be at most one uncommitted configuration at a time (the next configuration may not be created until the prior one has been committed). It's not strictly necessary to follow these same rules for the nonvoter/staging servers, but we think it's best to treat all changes uniformly.
+
+Each server will track two configurations:
+ 1.
its *committed configuration*: the latest configuration in the log/snapshot that has been committed, along with its index. + 2. its *latest configuration*: the latest configuration in the log/snapshot (may be committed or uncommitted), along with its index. + +When there's no membership change happening, these two will be the same. The latest configuration is almost always the one used, except: + - When followers truncate the suffix of their logs, they may need to fall back to the committed configuration. + - When snapshotting, the committed configuration is written, to correspond with the committed log prefix that is being snapshotted. + + +## Application API + +We propose the following operations for clients to manipulate the cluster configuration: + - AddVoter: server becomes staging unless voter, + - AddNonvoter: server becomes nonvoter unless staging or voter, + - DemoteVoter: server becomes nonvoter unless absent, + - RemovePeer: server removed from configuration, + - GetConfiguration: waits for latest config to commit, returns committed config. + +This diagram, of which I'm quite proud, shows the possible transitions: +``` ++-----------------------------------------------------------------------------+ +| | +| Start -> +--------+ | +| ,------<------------| | | +| / | absent | | +| / RemovePeer--> | | <---RemovePeer | +| / | +--------+ \ | +| / | | \ | +| AddNonvoter | AddVoter \ | +| | ,->---' `--<-. | \ | +| v / \ v \ | +| +----------+ +----------+ +----------+ | +| | | ---AddVoter--> | | -log caught up --> | | | +| | nonvoter | | staging | | voter | | +| | | <-DemoteVoter- | | ,- | | | +| +----------+ \ +----------+ / +----------+ | +| \ / | +| `--------------<---------------' | +| | ++-----------------------------------------------------------------------------+ +``` + +While these operations aren't quite symmetric, we think they're a good set to capture +the possible intent of the user. For example, if I want to make sure a server doesn't have a vote, but the server isn't part of the configuration at all, it probably shouldn't be added as a nonvoting server. + +Each of these application-level operations will be interpreted by the leader and, if it has an effect, will cause the leader to write a new configuration entry to its log. Which particular application-level operation caused the log entry to be written need not be part of the log entry. + +## Code implications + +This is a non-exhaustive list, but we came up with a few things: +- Remove the PeerStore: the `peers.json` file introduces the possibility of getting out of sync with the log and snapshot, and it's hard to maintain this atomically as the log changes. It's not clear whether it's meant to track the committed or latest configuration, either. +- Servers will have to search their snapshot and log to find the committed configuration and the latest configuration on startup. +- Bootstrap will no longer use `peers.json` but should initialize the log or snapshot with an application-provided configuration entry. +- Snapshots should store the index of their configuration along with the configuration itself. In my experience with LogCabin, the original log index of the configuration is very useful to include in debug log messages. +- As noted in hashicorp/raft#84, configuration change requests should come in via a separate channel, and one may not proceed until the last has been committed. 
+- As to deciding when a log is sufficiently caught up, implementing a sophisticated algorithm *is* something that can be done in a separate PR. An easy and decent placeholder is: once the staging server has reached 95% of the leader's commit index, promote it. + +## Feedback + +Again, we're looking for feedback here before we start working on this. Here are some questions to think about: + - Does this seem like where we want things to go? + - Is there anything here that should be left out? + - Is there anything else we're forgetting about? + - Is there a good way to break this up? + - What do we need to worry about in terms of backwards compatibility? + - What implication will this have on current tests? + - What's the best way to test this code, in particular the small changes that will be sprinkled all over the library? diff --git a/vendor/github.com/hashicorp/raft/net_transport.go b/vendor/github.com/hashicorp/raft/net_transport.go new file mode 100644 index 0000000000..4f1f101e00 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/net_transport.go @@ -0,0 +1,757 @@ +package raft + +import ( + "bufio" + "context" + "errors" + "fmt" + "io" + "log" + "net" + "os" + "sync" + "time" + + "github.com/hashicorp/go-msgpack/codec" +) + +const ( + rpcAppendEntries uint8 = iota + rpcRequestVote + rpcInstallSnapshot + + // DefaultTimeoutScale is the default TimeoutScale in a NetworkTransport. + DefaultTimeoutScale = 256 * 1024 // 256KB + + // rpcMaxPipeline controls the maximum number of outstanding + // AppendEntries RPC calls. + rpcMaxPipeline = 128 +) + +var ( + // ErrTransportShutdown is returned when operations on a transport are + // invoked after it's been terminated. + ErrTransportShutdown = errors.New("transport shutdown") + + // ErrPipelineShutdown is returned when the pipeline is closed. + ErrPipelineShutdown = errors.New("append pipeline closed") +) + +/* + +NetworkTransport provides a network based transport that can be +used to communicate with Raft on remote machines. It requires +an underlying stream layer to provide a stream abstraction, which can +be simple TCP, TLS, etc. + +This transport is very simple and lightweight. Each RPC request is +framed by sending a byte that indicates the message type, followed +by the MsgPack encoded request. + +The response is an error string followed by the response object, +both are encoded using MsgPack. + +InstallSnapshot is special, in that after the RPC request we stream +the entire state. That socket is not re-used as the connection state +is not known if there is an error. + +*/ +type NetworkTransport struct { + connPool map[ServerAddress][]*netConn + connPoolLock sync.Mutex + + consumeCh chan RPC + + heartbeatFn func(RPC) + heartbeatFnLock sync.Mutex + + logger *log.Logger + + maxPool int + + serverAddressProvider ServerAddressProvider + + shutdown bool + shutdownCh chan struct{} + shutdownLock sync.Mutex + + stream StreamLayer + + // streamCtx is used to cancel existing connection handlers. + streamCtx context.Context + streamCancel context.CancelFunc + streamCtxLock sync.RWMutex + + timeout time.Duration + TimeoutScale int +} + +// NetworkTransportConfig encapsulates configuration for the network transport layer. 
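To make the application API proposed in membership.md above concrete before returning to the transport code, here is a hedged sketch of a client-side call, modeled on the library's future-returning style (AddVoter's exact signature is an assumption here):

```go
package raftmember

import (
	"log"

	"github.com/hashicorp/raft"
)

// addVoter sketches the proposed membership-change flow: the leader
// interprets the request, appends a configuration entry, and the returned
// future resolves once that entry is committed.
func addVoter(r *raft.Raft, id raft.ServerID, addr raft.ServerAddress) {
	// 0, 0: no prevIndex constraint and no timeout (assumed parameters).
	f := r.AddVoter(id, addr, 0, 0)
	if err := f.Error(); err != nil {
		log.Printf("membership change failed: %v", err)
	}
}
```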
+type NetworkTransportConfig struct { + // ServerAddressProvider is used to override the target address when establishing a connection to invoke an RPC + ServerAddressProvider ServerAddressProvider + + Logger *log.Logger + + // Dialer + Stream StreamLayer + + // MaxPool controls how many connections we will pool + MaxPool int + + // Timeout is used to apply I/O deadlines. For InstallSnapshot, we multiply + // the timeout by (SnapshotSize / TimeoutScale). + Timeout time.Duration +} + +type ServerAddressProvider interface { + ServerAddr(id ServerID) (ServerAddress, error) +} + +// StreamLayer is used with the NetworkTransport to provide +// the low level stream abstraction. +type StreamLayer interface { + net.Listener + + // Dial is used to create a new outgoing connection + Dial(address ServerAddress, timeout time.Duration) (net.Conn, error) +} + +type netConn struct { + target ServerAddress + conn net.Conn + r *bufio.Reader + w *bufio.Writer + dec *codec.Decoder + enc *codec.Encoder +} + +func (n *netConn) Release() error { + return n.conn.Close() +} + +type netPipeline struct { + conn *netConn + trans *NetworkTransport + + doneCh chan AppendFuture + inprogressCh chan *appendFuture + + shutdown bool + shutdownCh chan struct{} + shutdownLock sync.Mutex +} + +// NewNetworkTransportWithConfig creates a new network transport with the given config struct +func NewNetworkTransportWithConfig( + config *NetworkTransportConfig, +) *NetworkTransport { + if config.Logger == nil { + config.Logger = log.New(os.Stderr, "", log.LstdFlags) + } + trans := &NetworkTransport{ + connPool: make(map[ServerAddress][]*netConn), + consumeCh: make(chan RPC), + logger: config.Logger, + maxPool: config.MaxPool, + shutdownCh: make(chan struct{}), + stream: config.Stream, + timeout: config.Timeout, + TimeoutScale: DefaultTimeoutScale, + serverAddressProvider: config.ServerAddressProvider, + } + + // Create the connection context and then start our listener. + trans.setupStreamContext() + go trans.listen() + + return trans +} + +// NewNetworkTransport creates a new network transport with the given dialer +// and listener. The maxPool controls how many connections we will pool. The +// timeout is used to apply I/O deadlines. For InstallSnapshot, we multiply +// the timeout by (SnapshotSize / TimeoutScale). +func NewNetworkTransport( + stream StreamLayer, + maxPool int, + timeout time.Duration, + logOutput io.Writer, +) *NetworkTransport { + if logOutput == nil { + logOutput = os.Stderr + } + logger := log.New(logOutput, "", log.LstdFlags) + config := &NetworkTransportConfig{Stream: stream, MaxPool: maxPool, Timeout: timeout, Logger: logger} + return NewNetworkTransportWithConfig(config) +} + +// NewNetworkTransportWithLogger creates a new network transport with the given logger, dialer +// and listener. The maxPool controls how many connections we will pool. The +// timeout is used to apply I/O deadlines. For InstallSnapshot, we multiply +// the timeout by (SnapshotSize / TimeoutScale). +func NewNetworkTransportWithLogger( + stream StreamLayer, + maxPool int, + timeout time.Duration, + logger *log.Logger, +) *NetworkTransport { + config := &NetworkTransportConfig{Stream: stream, MaxPool: maxPool, Timeout: timeout, Logger: logger} + return NewNetworkTransportWithConfig(config) +} + +// setupStreamContext is used to create a new stream context. This should be +// called with the stream lock held. 
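A hedged construction example (NewTCPTransport lives in tcp_transport.go, which is not part of this hunk; it supplies the StreamLayer and delegates to the constructors above):

```go
package main

import (
	"log"
	"os"
	"time"

	"github.com/hashicorp/raft"
)

func main() {
	// Bind a TCP stream layer and wrap it in a NetworkTransport with a
	// pool of 3 connections and a 10s I/O deadline.
	trans, err := raft.NewTCPTransport("127.0.0.1:7000", nil, 3, 10*time.Second, os.Stderr)
	if err != nil {
		log.Fatal(err)
	}
	defer trans.Close()
	log.Println("transport bound to", trans.LocalAddr())
}
```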
+func (n *NetworkTransport) setupStreamContext() {
+	ctx, cancel := context.WithCancel(context.Background())
+	n.streamCtx = ctx
+	n.streamCancel = cancel
+}
+
+// getStreamContext is used to retrieve the current stream context.
+func (n *NetworkTransport) getStreamContext() context.Context {
+	n.streamCtxLock.RLock()
+	defer n.streamCtxLock.RUnlock()
+	return n.streamCtx
+}
+
+// SetHeartbeatHandler is used to setup a heartbeat handler
+// as a fast path. This is to avoid head-of-line blocking from
+// disk IO.
+func (n *NetworkTransport) SetHeartbeatHandler(cb func(rpc RPC)) {
+	n.heartbeatFnLock.Lock()
+	defer n.heartbeatFnLock.Unlock()
+	n.heartbeatFn = cb
+}
+
+// CloseStreams closes the current streams.
+func (n *NetworkTransport) CloseStreams() {
+	n.connPoolLock.Lock()
+	defer n.connPoolLock.Unlock()
+
+	// Close all the connections in the connection pool and then remove their
+	// entry.
+	for k, e := range n.connPool {
+		for _, conn := range e {
+			conn.Release()
+		}
+
+		delete(n.connPool, k)
+	}
+
+	// Cancel the existing connections and create a new context. Both these
+	// operations must always be done with the lock held otherwise we can create
+	// connection handlers that are holding a context that will never be
+	// cancelable.
+	n.streamCtxLock.Lock()
+	n.streamCancel()
+	n.setupStreamContext()
+	n.streamCtxLock.Unlock()
+}
+
+// Close is used to stop the network transport.
+func (n *NetworkTransport) Close() error {
+	n.shutdownLock.Lock()
+	defer n.shutdownLock.Unlock()
+
+	if !n.shutdown {
+		close(n.shutdownCh)
+		n.stream.Close()
+		n.shutdown = true
+	}
+	return nil
+}
+
+// Consumer implements the Transport interface.
+func (n *NetworkTransport) Consumer() <-chan RPC {
+	return n.consumeCh
+}
+
+// LocalAddr implements the Transport interface.
+func (n *NetworkTransport) LocalAddr() ServerAddress {
+	return ServerAddress(n.stream.Addr().String())
+}
+
+// IsShutdown is used to check if the transport is shutdown.
+func (n *NetworkTransport) IsShutdown() bool {
+	select {
+	case <-n.shutdownCh:
+		return true
+	default:
+		return false
+	}
+}
+
+// getPooledConn is used to grab a pooled connection.
+func (n *NetworkTransport) getPooledConn(target ServerAddress) *netConn {
+	n.connPoolLock.Lock()
+	defer n.connPoolLock.Unlock()
+
+	conns, ok := n.connPool[target]
+	if !ok || len(conns) == 0 {
+		return nil
+	}
+
+	var conn *netConn
+	num := len(conns)
+	conn, conns[num-1] = conns[num-1], nil
+	n.connPool[target] = conns[:num-1]
+	return conn
+}
+
+// getConnFromAddressProvider returns a connection from the server address provider if available, or defaults to a connection using the target server address
+func (n *NetworkTransport) getConnFromAddressProvider(id ServerID, target ServerAddress) (*netConn, error) {
+	address := n.getProviderAddressOrFallback(id, target)
+	return n.getConn(address)
+}
+
+func (n *NetworkTransport) getProviderAddressOrFallback(id ServerID, target ServerAddress) ServerAddress {
+	if n.serverAddressProvider != nil {
+		serverAddressOverride, err := n.serverAddressProvider.ServerAddr(id)
+		if err != nil {
+			n.logger.Printf("[WARN] raft: Unable to get address for server id %v, using fallback address %v: %v", id, target, err)
+		} else {
+			return serverAddressOverride
+		}
+	}
+	return target
+}
+
+// getConn is used to get a connection from the pool.
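For illustration, a minimal ServerAddressProvider backed by a static table (a sketch; any resolution scheme works, and it is wired in via NetworkTransportConfig.ServerAddressProvider above):

```go
package raftexample

import (
	"fmt"

	"github.com/hashicorp/raft"
)

// staticAddressProvider resolves server IDs from a fixed table. The
// transport consults it before dialing and falls back to the target
// address on error (see getProviderAddressOrFallback above).
type staticAddressProvider struct {
	addrs map[raft.ServerID]raft.ServerAddress
}

func (p *staticAddressProvider) ServerAddr(id raft.ServerID) (raft.ServerAddress, error) {
	addr, ok := p.addrs[id]
	if !ok {
		return "", fmt.Errorf("no address known for server %q", id)
	}
	return addr, nil
}
```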
+func (n *NetworkTransport) getConn(target ServerAddress) (*netConn, error) { + // Check for a pooled conn + if conn := n.getPooledConn(target); conn != nil { + return conn, nil + } + + // Dial a new connection + conn, err := n.stream.Dial(target, n.timeout) + if err != nil { + return nil, err + } + + // Wrap the conn + netConn := &netConn{ + target: target, + conn: conn, + r: bufio.NewReader(conn), + w: bufio.NewWriter(conn), + } + + // Setup encoder/decoders + netConn.dec = codec.NewDecoder(netConn.r, &codec.MsgpackHandle{}) + netConn.enc = codec.NewEncoder(netConn.w, &codec.MsgpackHandle{}) + + // Done + return netConn, nil +} + +// returnConn returns a connection back to the pool. +func (n *NetworkTransport) returnConn(conn *netConn) { + n.connPoolLock.Lock() + defer n.connPoolLock.Unlock() + + key := conn.target + conns, _ := n.connPool[key] + + if !n.IsShutdown() && len(conns) < n.maxPool { + n.connPool[key] = append(conns, conn) + } else { + conn.Release() + } +} + +// AppendEntriesPipeline returns an interface that can be used to pipeline +// AppendEntries requests. +func (n *NetworkTransport) AppendEntriesPipeline(id ServerID, target ServerAddress) (AppendPipeline, error) { + // Get a connection + conn, err := n.getConnFromAddressProvider(id, target) + if err != nil { + return nil, err + } + + // Create the pipeline + return newNetPipeline(n, conn), nil +} + +// AppendEntries implements the Transport interface. +func (n *NetworkTransport) AppendEntries(id ServerID, target ServerAddress, args *AppendEntriesRequest, resp *AppendEntriesResponse) error { + return n.genericRPC(id, target, rpcAppendEntries, args, resp) +} + +// RequestVote implements the Transport interface. +func (n *NetworkTransport) RequestVote(id ServerID, target ServerAddress, args *RequestVoteRequest, resp *RequestVoteResponse) error { + return n.genericRPC(id, target, rpcRequestVote, args, resp) +} + +// genericRPC handles a simple request/response RPC. +func (n *NetworkTransport) genericRPC(id ServerID, target ServerAddress, rpcType uint8, args interface{}, resp interface{}) error { + // Get a conn + conn, err := n.getConnFromAddressProvider(id, target) + if err != nil { + return err + } + + // Set a deadline + if n.timeout > 0 { + conn.conn.SetDeadline(time.Now().Add(n.timeout)) + } + + // Send the RPC + if err = sendRPC(conn, rpcType, args); err != nil { + return err + } + + // Decode the response + canReturn, err := decodeResponse(conn, resp) + if canReturn { + n.returnConn(conn) + } + return err +} + +// InstallSnapshot implements the Transport interface. 
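A sketch of how the heartbeat fast path is hooked up (hedged; Raft's own setup registers a handler like this so empty AppendEntries never queue on consumeCh behind an RPC that is waiting on disk I/O, and the real handler validates the term before responding):

```go
package raftexample

import "github.com/hashicorp/raft"

// installHeartbeatFastPath registers a heartbeat callback on the transport.
// Responding with Success unconditionally is only a sketch.
func installHeartbeatFastPath(trans *raft.NetworkTransport) {
	trans.SetHeartbeatHandler(func(rpc raft.RPC) {
		rpc.Respond(&raft.AppendEntriesResponse{Success: true}, nil)
	})
}
```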
+func (n *NetworkTransport) InstallSnapshot(id ServerID, target ServerAddress, args *InstallSnapshotRequest, resp *InstallSnapshotResponse, data io.Reader) error {
+	// Get a conn, always close for InstallSnapshot
+	conn, err := n.getConnFromAddressProvider(id, target)
+	if err != nil {
+		return err
+	}
+	defer conn.Release()
+
+	// Set a deadline, scaled by request size
+	if n.timeout > 0 {
+		timeout := n.timeout * time.Duration(args.Size/int64(n.TimeoutScale))
+		if timeout < n.timeout {
+			timeout = n.timeout
+		}
+		conn.conn.SetDeadline(time.Now().Add(timeout))
+	}
+
+	// Send the RPC
+	if err = sendRPC(conn, rpcInstallSnapshot, args); err != nil {
+		return err
+	}
+
+	// Stream the state
+	if _, err = io.Copy(conn.w, data); err != nil {
+		return err
+	}
+
+	// Flush
+	if err = conn.w.Flush(); err != nil {
+		return err
+	}
+
+	// Decode the response, do not return conn
+	_, err = decodeResponse(conn, resp)
+	return err
+}
+
+// EncodePeer implements the Transport interface.
+func (n *NetworkTransport) EncodePeer(id ServerID, p ServerAddress) []byte {
+	address := n.getProviderAddressOrFallback(id, p)
+	return []byte(address)
+}
+
+// DecodePeer implements the Transport interface.
+func (n *NetworkTransport) DecodePeer(buf []byte) ServerAddress {
+	return ServerAddress(buf)
+}
+
+// listen is used to handle incoming connections.
+func (n *NetworkTransport) listen() {
+	const baseDelay = 5 * time.Millisecond
+	const maxDelay = 1 * time.Second
+
+	var loopDelay time.Duration
+	for {
+		// Accept incoming connections
+		conn, err := n.stream.Accept()
+		if err != nil {
+			if loopDelay == 0 {
+				loopDelay = baseDelay
+			} else {
+				loopDelay *= 2
+			}
+
+			if loopDelay > maxDelay {
+				loopDelay = maxDelay
+			}
+
+			if !n.IsShutdown() {
+				n.logger.Printf("[ERR] raft-net: Failed to accept connection: %v", err)
+			}
+
+			select {
+			case <-n.shutdownCh:
+				return
+			case <-time.After(loopDelay):
+				continue
+			}
+		}
+		// No error, reset loop delay
+		loopDelay = 0
+
+		n.logger.Printf("[DEBUG] raft-net: %v accepted connection from: %v", n.LocalAddr(), conn.RemoteAddr())
+
+		// Handle the connection in dedicated routine
+		go n.handleConn(n.getStreamContext(), conn)
+	}
+}
+
+// handleConn is used to handle an inbound connection for its lifespan. The
+// handler will exit when the passed context is cancelled or the connection is
+// closed.
+func (n *NetworkTransport) handleConn(connCtx context.Context, conn net.Conn) {
+	defer conn.Close()
+	r := bufio.NewReader(conn)
+	w := bufio.NewWriter(conn)
+	dec := codec.NewDecoder(r, &codec.MsgpackHandle{})
+	enc := codec.NewEncoder(w, &codec.MsgpackHandle{})
+
+	for {
+		select {
+		case <-connCtx.Done():
+			n.logger.Println("[DEBUG] raft-net: stream layer is closed")
+			return
+		default:
+		}
+
+		if err := n.handleCommand(r, dec, enc); err != nil {
+			if err != io.EOF {
+				n.logger.Printf("[ERR] raft-net: Failed to decode incoming command: %v", err)
+			}
+			return
+		}
+		if err := w.Flush(); err != nil {
+			n.logger.Printf("[ERR] raft-net: Failed to flush response: %v", err)
+			return
+		}
+	}
+}
+
+// handleCommand is used to decode and dispatch a single command.
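A worked example of the InstallSnapshot deadline scaling used above (numbers are illustrative):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// timeout * (snapshot size / TimeoutScale), floored at the base timeout.
	base := 10 * time.Second
	scale := int64(256 * 1024) // DefaultTimeoutScale
	size := int64(1 << 30)     // a 1 GiB snapshot

	timeout := base * time.Duration(size/scale) // 10s * 4096
	if timeout < base {
		timeout = base
	}
	fmt.Println(timeout) // 11h22m40s: big snapshots get proportionally more time
}
```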
+func (n *NetworkTransport) handleCommand(r *bufio.Reader, dec *codec.Decoder, enc *codec.Encoder) error { + // Get the rpc type + rpcType, err := r.ReadByte() + if err != nil { + return err + } + + // Create the RPC object + respCh := make(chan RPCResponse, 1) + rpc := RPC{ + RespChan: respCh, + } + + // Decode the command + isHeartbeat := false + switch rpcType { + case rpcAppendEntries: + var req AppendEntriesRequest + if err := dec.Decode(&req); err != nil { + return err + } + rpc.Command = &req + + // Check if this is a heartbeat + if req.Term != 0 && req.Leader != nil && + req.PrevLogEntry == 0 && req.PrevLogTerm == 0 && + len(req.Entries) == 0 && req.LeaderCommitIndex == 0 { + isHeartbeat = true + } + + case rpcRequestVote: + var req RequestVoteRequest + if err := dec.Decode(&req); err != nil { + return err + } + rpc.Command = &req + + case rpcInstallSnapshot: + var req InstallSnapshotRequest + if err := dec.Decode(&req); err != nil { + return err + } + rpc.Command = &req + rpc.Reader = io.LimitReader(r, req.Size) + + default: + return fmt.Errorf("unknown rpc type %d", rpcType) + } + + // Check for heartbeat fast-path + if isHeartbeat { + n.heartbeatFnLock.Lock() + fn := n.heartbeatFn + n.heartbeatFnLock.Unlock() + if fn != nil { + fn(rpc) + goto RESP + } + } + + // Dispatch the RPC + select { + case n.consumeCh <- rpc: + case <-n.shutdownCh: + return ErrTransportShutdown + } + + // Wait for response +RESP: + select { + case resp := <-respCh: + // Send the error first + respErr := "" + if resp.Error != nil { + respErr = resp.Error.Error() + } + if err := enc.Encode(respErr); err != nil { + return err + } + + // Send the response + if err := enc.Encode(resp.Response); err != nil { + return err + } + case <-n.shutdownCh: + return ErrTransportShutdown + } + return nil +} + +// decodeResponse is used to decode an RPC response and reports whether +// the connection can be reused. +func decodeResponse(conn *netConn, resp interface{}) (bool, error) { + // Decode the error if any + var rpcError string + if err := conn.dec.Decode(&rpcError); err != nil { + conn.Release() + return false, err + } + + // Decode the response + if err := conn.dec.Decode(resp); err != nil { + conn.Release() + return false, err + } + + // Format an error if any + if rpcError != "" { + return true, fmt.Errorf(rpcError) + } + return true, nil +} + +// sendRPC is used to encode and send the RPC. +func sendRPC(conn *netConn, rpcType uint8, args interface{}) error { + // Write the request type + if err := conn.w.WriteByte(rpcType); err != nil { + conn.Release() + return err + } + + // Send the request + if err := conn.enc.Encode(args); err != nil { + conn.Release() + return err + } + + // Flush + if err := conn.w.Flush(); err != nil { + conn.Release() + return err + } + return nil +} + +// newNetPipeline is used to construct a netPipeline from a given +// transport and connection. +func newNetPipeline(trans *NetworkTransport, conn *netConn) *netPipeline { + n := &netPipeline{ + conn: conn, + trans: trans, + doneCh: make(chan AppendFuture, rpcMaxPipeline), + inprogressCh: make(chan *appendFuture, rpcMaxPipeline), + shutdownCh: make(chan struct{}), + } + go n.decodeResponses() + return n +} + +// decodeResponses is a long running routine that decodes the responses +// sent on the connection. 
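For reference, a small sketch of the wire framing that sendRPC and handleCommand agree on: one type byte, then the MsgPack-encoded request (the constant value mirrors the unexported rpcRequestVote above; the map payload is a stand-in for a real request struct):

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"

	"github.com/hashicorp/go-msgpack/codec"
)

func main() {
	var buf bytes.Buffer
	w := bufio.NewWriter(&buf)

	const rpcRequestVote = 1 // second value of the iota block above
	w.WriteByte(rpcRequestVote)

	enc := codec.NewEncoder(w, &codec.MsgpackHandle{})
	if err := enc.Encode(map[string]uint64{"Term": 5}); err != nil {
		panic(err)
	}
	w.Flush()

	fmt.Printf("% x\n", buf.Bytes()) // 01 followed by the msgpack body
}
```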
+func (n *netPipeline) decodeResponses() { + timeout := n.trans.timeout + for { + select { + case future := <-n.inprogressCh: + if timeout > 0 { + n.conn.conn.SetReadDeadline(time.Now().Add(timeout)) + } + + _, err := decodeResponse(n.conn, future.resp) + future.respond(err) + select { + case n.doneCh <- future: + case <-n.shutdownCh: + return + } + case <-n.shutdownCh: + return + } + } +} + +// AppendEntries is used to pipeline a new append entries request. +func (n *netPipeline) AppendEntries(args *AppendEntriesRequest, resp *AppendEntriesResponse) (AppendFuture, error) { + // Create a new future + future := &appendFuture{ + start: time.Now(), + args: args, + resp: resp, + } + future.init() + + // Add a send timeout + if timeout := n.trans.timeout; timeout > 0 { + n.conn.conn.SetWriteDeadline(time.Now().Add(timeout)) + } + + // Send the RPC + if err := sendRPC(n.conn, rpcAppendEntries, future.args); err != nil { + return nil, err + } + + // Hand-off for decoding, this can also cause back-pressure + // to prevent too many inflight requests + select { + case n.inprogressCh <- future: + return future, nil + case <-n.shutdownCh: + return nil, ErrPipelineShutdown + } +} + +// Consumer returns a channel that can be used to consume complete futures. +func (n *netPipeline) Consumer() <-chan AppendFuture { + return n.doneCh +} + +// Closed is used to shutdown the pipeline connection. +func (n *netPipeline) Close() error { + n.shutdownLock.Lock() + defer n.shutdownLock.Unlock() + if n.shutdown { + return nil + } + + // Release the connection + n.conn.Release() + + n.shutdown = true + close(n.shutdownCh) + return nil +} diff --git a/vendor/github.com/hashicorp/raft/observer.go b/vendor/github.com/hashicorp/raft/observer.go new file mode 100644 index 0000000000..2d4f37db12 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/observer.go @@ -0,0 +1,131 @@ +package raft + +import ( + "sync/atomic" +) + +// Observation is sent along the given channel to observers when an event occurs. +type Observation struct { + // Raft holds the Raft instance generating the observation. + Raft *Raft + // Data holds observation-specific data. Possible types are + // *RequestVoteRequest + // RaftState + // PeerObservation + // LeaderObservation + Data interface{} +} + +// LeaderObservation is used for the data when leadership changes. +type LeaderObservation struct { + leader ServerAddress +} + +// PeerObservation is sent to observers when peers change. +type PeerObservation struct { + Removed bool + Peer Server +} + +// nextObserverId is used to provide a unique ID for each observer to aid in +// deregistration. +var nextObserverID uint64 + +// FilterFn is a function that can be registered in order to filter observations. +// The function reports whether the observation should be included - if +// it returns false, the observation will be filtered out. +type FilterFn func(o *Observation) bool + +// Observer describes what to do with a given observation. +type Observer struct { + // numObserved and numDropped are performance counters for this observer. + // 64 bit types must be 64 bit aligned to use with atomic operations on + // 32 bit platforms, so keep them at the top of the struct. + numObserved uint64 + numDropped uint64 + + // channel receives observations. + channel chan Observation + + // blocking, if true, will cause Raft to block when sending an observation + // to this observer. This should generally be set to false. 
+ blocking bool + + // filter will be called to determine if an observation should be sent to + // the channel. + filter FilterFn + + // id is the ID of this observer in the Raft map. + id uint64 +} + +// NewObserver creates a new observer that can be registered +// to make observations on a Raft instance. Observations +// will be sent on the given channel if they satisfy the +// given filter. +// +// If blocking is true, the observer will block when it can't +// send on the channel, otherwise it may discard events. +func NewObserver(channel chan Observation, blocking bool, filter FilterFn) *Observer { + return &Observer{ + channel: channel, + blocking: blocking, + filter: filter, + id: atomic.AddUint64(&nextObserverID, 1), + } +} + +// GetNumObserved returns the number of observations. +func (or *Observer) GetNumObserved() uint64 { + return atomic.LoadUint64(&or.numObserved) +} + +// GetNumDropped returns the number of dropped observations due to blocking. +func (or *Observer) GetNumDropped() uint64 { + return atomic.LoadUint64(&or.numDropped) +} + +// RegisterObserver registers a new observer. +func (r *Raft) RegisterObserver(or *Observer) { + r.observersLock.Lock() + defer r.observersLock.Unlock() + r.observers[or.id] = or +} + +// DeregisterObserver deregisters an observer. +func (r *Raft) DeregisterObserver(or *Observer) { + r.observersLock.Lock() + defer r.observersLock.Unlock() + delete(r.observers, or.id) +} + +// observe sends an observation to every observer. +func (r *Raft) observe(o interface{}) { + // In general observers should not block. But in any case this isn't + // disastrous as we only hold a read lock, which merely prevents + // registration / deregistration of observers. + r.observersLock.RLock() + defer r.observersLock.RUnlock() + for _, or := range r.observers { + // It's wasteful to do this in the loop, but for the common case + // where there are no observers we won't create any objects. + ob := Observation{Raft: r, Data: o} + if or.filter != nil && !or.filter(&ob) { + continue + } + if or.channel == nil { + continue + } + if or.blocking { + or.channel <- ob + atomic.AddUint64(&or.numObserved, 1) + } else { + select { + case or.channel <- ob: + atomic.AddUint64(&or.numObserved, 1) + default: + atomic.AddUint64(&or.numDropped, 1) + } + } + } +} diff --git a/vendor/github.com/hashicorp/raft/peersjson.go b/vendor/github.com/hashicorp/raft/peersjson.go new file mode 100644 index 0000000000..38ca2a8b84 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/peersjson.go @@ -0,0 +1,98 @@ +package raft + +import ( + "bytes" + "encoding/json" + "io/ioutil" +) + +// ReadPeersJSON consumes a legacy peers.json file in the format of the old JSON +// peer store and creates a new-style configuration structure. This can be used +// to migrate this data or perform manual recovery when running protocol versions +// that can interoperate with older, unversioned Raft servers. This should not be +// used once server IDs are in use, because the old peers.json file didn't have +// support for these, nor non-voter suffrage types. +func ReadPeersJSON(path string) (Configuration, error) { + // Read in the file. + buf, err := ioutil.ReadFile(path) + if err != nil { + return Configuration{}, err + } + + // Parse it as JSON. + var peers []string + dec := json.NewDecoder(bytes.NewReader(buf)) + if err := dec.Decode(&peers); err != nil { + return Configuration{}, err + } + + // Map it into the new-style configuration structure. 
We can only specify + // voter roles here, and the ID has to be the same as the address. + var configuration Configuration + for _, peer := range peers { + server := Server{ + Suffrage: Voter, + ID: ServerID(peer), + Address: ServerAddress(peer), + } + configuration.Servers = append(configuration.Servers, server) + } + + // We should only ingest valid configurations. + if err := checkConfiguration(configuration); err != nil { + return Configuration{}, err + } + return configuration, nil +} + +// configEntry is used when decoding a new-style peers.json. +type configEntry struct { + // ID is the ID of the server (a UUID, usually). + ID ServerID `json:"id"` + + // Address is the host:port of the server. + Address ServerAddress `json:"address"` + + // NonVoter controls the suffrage. We choose this sense so people + // can leave this out and get a Voter by default. + NonVoter bool `json:"non_voter"` +} + +// ReadConfigJSON reads a new-style peers.json and returns a configuration +// structure. This can be used to perform manual recovery when running protocol +// versions that use server IDs. +func ReadConfigJSON(path string) (Configuration, error) { + // Read in the file. + buf, err := ioutil.ReadFile(path) + if err != nil { + return Configuration{}, err + } + + // Parse it as JSON. + var peers []configEntry + dec := json.NewDecoder(bytes.NewReader(buf)) + if err := dec.Decode(&peers); err != nil { + return Configuration{}, err + } + + // Map it into the new-style configuration structure. + var configuration Configuration + for _, peer := range peers { + suffrage := Voter + if peer.NonVoter { + suffrage = Nonvoter + } + server := Server{ + Suffrage: suffrage, + ID: peer.ID, + Address: peer.Address, + } + configuration.Servers = append(configuration.Servers, server) + } + + // We should only ingest valid configurations. + if err := checkConfiguration(configuration); err != nil { + return Configuration{}, err + } + return configuration, nil +} diff --git a/vendor/github.com/hashicorp/raft/raft.go b/vendor/github.com/hashicorp/raft/raft.go new file mode 100644 index 0000000000..a759230bc9 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/raft.go @@ -0,0 +1,1486 @@ +package raft + +import ( + "bytes" + "container/list" + "fmt" + "io" + "io/ioutil" + "time" + + "github.com/armon/go-metrics" +) + +const ( + minCheckInterval = 10 * time.Millisecond +) + +var ( + keyCurrentTerm = []byte("CurrentTerm") + keyLastVoteTerm = []byte("LastVoteTerm") + keyLastVoteCand = []byte("LastVoteCand") +) + +// getRPCHeader returns an initialized RPCHeader struct for the given +// Raft instance. This structure is sent along with RPC requests and +// responses. +func (r *Raft) getRPCHeader() RPCHeader { + return RPCHeader{ + ProtocolVersion: r.conf.ProtocolVersion, + } +} + +// checkRPCHeader houses logic about whether this instance of Raft can process +// the given RPC message. +func (r *Raft) checkRPCHeader(rpc RPC) error { + // Get the header off the RPC message. + wh, ok := rpc.Command.(WithRPCHeader) + if !ok { + return fmt.Errorf("RPC does not have a header") + } + header := wh.GetRPCHeader() + + // First check is to just make sure the code can understand the + // protocol at all. + if header.ProtocolVersion < ProtocolVersionMin || + header.ProtocolVersion > ProtocolVersionMax { + return ErrUnsupportedProtocol + } + + // Second check is whether we should support this message, given the + // current protocol we are configured to run. 
This will drop support + // for protocol version 0 starting at protocol version 2, which is + // currently what we want, and in general support one version back. We + // may need to revisit this policy depending on how future protocol + // changes evolve. + if header.ProtocolVersion < r.conf.ProtocolVersion-1 { + return ErrUnsupportedProtocol + } + + return nil +} + +// getSnapshotVersion returns the snapshot version that should be used when +// creating snapshots, given the protocol version in use. +func getSnapshotVersion(protocolVersion ProtocolVersion) SnapshotVersion { + // Right now we only have two versions and they are backwards compatible + // so we don't need to look at the protocol version. + return 1 +} + +// commitTuple is used to send an index that was committed, +// with an optional associated future that should be invoked. +type commitTuple struct { + log *Log + future *logFuture +} + +// leaderState is state that is used while we are a leader. +type leaderState struct { + commitCh chan struct{} + commitment *commitment + inflight *list.List // list of logFuture in log index order + replState map[ServerID]*followerReplication + notify map[*verifyFuture]struct{} + stepDown chan struct{} +} + +// setLeader is used to modify the current leader of the cluster +func (r *Raft) setLeader(leader ServerAddress) { + r.leaderLock.Lock() + oldLeader := r.leader + r.leader = leader + r.leaderLock.Unlock() + if oldLeader != leader { + r.observe(LeaderObservation{leader: leader}) + } +} + +// requestConfigChange is a helper for the above functions that make +// configuration change requests. 'req' describes the change. For timeout, +// see AddVoter. +func (r *Raft) requestConfigChange(req configurationChangeRequest, timeout time.Duration) IndexFuture { + var timer <-chan time.Time + if timeout > 0 { + timer = time.After(timeout) + } + future := &configurationChangeFuture{ + req: req, + } + future.init() + select { + case <-timer: + return errorFuture{ErrEnqueueTimeout} + case r.configurationChangeCh <- future: + return future + case <-r.shutdownCh: + return errorFuture{ErrRaftShutdown} + } +} + +// run is a long running goroutine that runs the Raft FSM. +func (r *Raft) run() { + for { + // Check if we are doing a shutdown + select { + case <-r.shutdownCh: + // Clear the leader to prevent forwarding + r.setLeader("") + return + default: + } + + // Enter into a sub-FSM + switch r.getState() { + case Follower: + r.runFollower() + case Candidate: + r.runCandidate() + case Leader: + r.runLeader() + } + } +} + +// runFollower runs the FSM for a follower. 
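A hedged sketch of consuming leadership changes through the observer API defined earlier in this diff (r is assumed to be an initialized *raft.Raft):

```go
package raftexample

import (
	"fmt"

	"github.com/hashicorp/raft"
)

// watchLeadership registers a non-blocking observer that only receives
// LeaderObservation events; other observation types are filtered out.
func watchLeadership(r *raft.Raft) {
	ch := make(chan raft.Observation, 16)
	obs := raft.NewObserver(ch, false, func(o *raft.Observation) bool {
		_, ok := o.Data.(raft.LeaderObservation)
		return ok
	})
	r.RegisterObserver(obs)

	go func() {
		for o := range ch {
			fmt.Printf("leadership change: %+v\n", o.Data)
		}
	}()
}
```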
+func (r *Raft) runFollower() { + didWarn := false + r.logger.Info(fmt.Sprintf("%v entering Follower state (Leader: %q)", r, r.Leader())) + metrics.IncrCounter([]string{"raft", "state", "follower"}, 1) + heartbeatTimer := randomTimeout(r.conf.HeartbeatTimeout) + for { + select { + case rpc := <-r.rpcCh: + r.processRPC(rpc) + + case c := <-r.configurationChangeCh: + // Reject any operations since we are not the leader + c.respond(ErrNotLeader) + + case a := <-r.applyCh: + // Reject any operations since we are not the leader + a.respond(ErrNotLeader) + + case v := <-r.verifyCh: + // Reject any operations since we are not the leader + v.respond(ErrNotLeader) + + case r := <-r.userRestoreCh: + // Reject any restores since we are not the leader + r.respond(ErrNotLeader) + + case c := <-r.configurationsCh: + c.configurations = r.configurations.Clone() + c.respond(nil) + + case b := <-r.bootstrapCh: + b.respond(r.liveBootstrap(b.configuration)) + + case <-heartbeatTimer: + // Restart the heartbeat timer + heartbeatTimer = randomTimeout(r.conf.HeartbeatTimeout) + + // Check if we have had a successful contact + lastContact := r.LastContact() + if time.Now().Sub(lastContact) < r.conf.HeartbeatTimeout { + continue + } + + // Heartbeat failed! Transition to the candidate state + lastLeader := r.Leader() + r.setLeader("") + + if r.configurations.latestIndex == 0 { + if !didWarn { + r.logger.Warn("no known peers, aborting election") + didWarn = true + } + } else if r.configurations.latestIndex == r.configurations.committedIndex && + !hasVote(r.configurations.latest, r.localID) { + if !didWarn { + r.logger.Warn("not part of stable configuration, aborting election") + didWarn = true + } + } else { + r.logger.Warn(fmt.Sprintf("Heartbeat timeout from %q reached, starting election", lastLeader)) + metrics.IncrCounter([]string{"raft", "transition", "heartbeat_timeout"}, 1) + r.setState(Candidate) + return + } + + case <-r.shutdownCh: + return + } + } +} + +// liveBootstrap attempts to seed an initial configuration for the cluster. See +// the Raft object's member BootstrapCluster for more details. This must only be +// called on the main thread, and only makes sense in the follower state. +func (r *Raft) liveBootstrap(configuration Configuration) error { + // Use the pre-init API to make the static updates. + err := BootstrapCluster(&r.conf, r.logs, r.stable, r.snapshots, + r.trans, configuration) + if err != nil { + return err + } + + // Make the configuration live. + var entry Log + if err := r.logs.GetLog(1, &entry); err != nil { + panic(err) + } + r.setCurrentTerm(1) + r.setLastLog(entry.Index, entry.Term) + r.processConfigurationLogEntry(&entry) + return nil +} + +// runCandidate runs the FSM for a candidate. 
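The heartbeat and election timers above come from randomTimeout in util.go (not shown in this hunk); it behaves roughly like this sketch, spreading timeouts over [minVal, 2*minVal) so followers rarely time out in lockstep and split the vote:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// randomTimeout returns a timer channel that fires after minVal plus a
// random extra of up to minVal (a hedged re-statement of util.go).
func randomTimeout(minVal time.Duration) <-chan time.Time {
	if minVal == 0 {
		return nil
	}
	extra := time.Duration(rand.Int63()) % minVal
	return time.After(minVal + extra)
}

func main() {
	start := time.Now()
	<-randomTimeout(50 * time.Millisecond)
	fmt.Println("fired after", time.Since(start)) // somewhere in [50ms, 100ms)
}
```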
+func (r *Raft) runCandidate() { + r.logger.Info(fmt.Sprintf("%v entering Candidate state in term %v", r, r.getCurrentTerm()+1)) + metrics.IncrCounter([]string{"raft", "state", "candidate"}, 1) + + // Start vote for us, and set a timeout + voteCh := r.electSelf() + electionTimer := randomTimeout(r.conf.ElectionTimeout) + + // Tally the votes, need a simple majority + grantedVotes := 0 + votesNeeded := r.quorumSize() + r.logger.Debug(fmt.Sprintf("Votes needed: %d", votesNeeded)) + + for r.getState() == Candidate { + select { + case rpc := <-r.rpcCh: + r.processRPC(rpc) + + case vote := <-voteCh: + // Check if the term is greater than ours, bail + if vote.Term > r.getCurrentTerm() { + r.logger.Debug("Newer term discovered, fallback to follower") + r.setState(Follower) + r.setCurrentTerm(vote.Term) + return + } + + // Check if the vote is granted + if vote.Granted { + grantedVotes++ + r.logger.Debug(fmt.Sprintf("Vote granted from %s in term %v. Tally: %d", + vote.voterID, vote.Term, grantedVotes)) + } + + // Check if we've become the leader + if grantedVotes >= votesNeeded { + r.logger.Info(fmt.Sprintf("Election won. Tally: %d", grantedVotes)) + r.setState(Leader) + r.setLeader(r.localAddr) + return + } + + case c := <-r.configurationChangeCh: + // Reject any operations since we are not the leader + c.respond(ErrNotLeader) + + case a := <-r.applyCh: + // Reject any operations since we are not the leader + a.respond(ErrNotLeader) + + case v := <-r.verifyCh: + // Reject any operations since we are not the leader + v.respond(ErrNotLeader) + + case r := <-r.userRestoreCh: + // Reject any restores since we are not the leader + r.respond(ErrNotLeader) + + case c := <-r.configurationsCh: + c.configurations = r.configurations.Clone() + c.respond(nil) + + case b := <-r.bootstrapCh: + b.respond(ErrCantBootstrap) + + case <-electionTimer: + // Election failed! Restart the election. We simply return, + // which will kick us back into runCandidate + r.logger.Warn("Election timeout reached, restarting election") + return + + case <-r.shutdownCh: + return + } + } +} + +// runLeader runs the FSM for a leader. Do the setup here and drop into +// the leaderLoop for the hot loop. +func (r *Raft) runLeader() { + r.logger.Info(fmt.Sprintf("%v entering Leader state", r)) + metrics.IncrCounter([]string{"raft", "state", "leader"}, 1) + + // Notify that we are the leader + asyncNotifyBool(r.leaderCh, true) + + // Push to the notify channel if given + if notify := r.conf.NotifyCh; notify != nil { + select { + case notify <- true: + case <-r.shutdownCh: + } + } + + // Setup leader state + r.leaderState.commitCh = make(chan struct{}, 1) + r.leaderState.commitment = newCommitment(r.leaderState.commitCh, + r.configurations.latest, + r.getLastIndex()+1 /* first index that may be committed in this term */) + r.leaderState.inflight = list.New() + r.leaderState.replState = make(map[ServerID]*followerReplication) + r.leaderState.notify = make(map[*verifyFuture]struct{}) + r.leaderState.stepDown = make(chan struct{}, 1) + + // Cleanup state on step down + defer func() { + // Since we were the leader previously, we update our + // last contact time when we step down, so that we are not + // reporting a last contact time from before we were the + // leader. Otherwise, to a client it would seem our data + // is extremely stale. 
+ r.setLastContact() + + // Stop replication + for _, p := range r.leaderState.replState { + close(p.stopCh) + } + + // Respond to all inflight operations + for e := r.leaderState.inflight.Front(); e != nil; e = e.Next() { + e.Value.(*logFuture).respond(ErrLeadershipLost) + } + + // Respond to any pending verify requests + for future := range r.leaderState.notify { + future.respond(ErrLeadershipLost) + } + + // Clear all the state + r.leaderState.commitCh = nil + r.leaderState.commitment = nil + r.leaderState.inflight = nil + r.leaderState.replState = nil + r.leaderState.notify = nil + r.leaderState.stepDown = nil + + // If we are stepping down for some reason, no known leader. + // We may have stepped down due to an RPC call, which would + // provide the leader, so we cannot always blank this out. + r.leaderLock.Lock() + if r.leader == r.localAddr { + r.leader = "" + } + r.leaderLock.Unlock() + + // Notify that we are not the leader + asyncNotifyBool(r.leaderCh, false) + + // Push to the notify channel if given + if notify := r.conf.NotifyCh; notify != nil { + select { + case notify <- false: + case <-r.shutdownCh: + // On shutdown, make a best effort but do not block + select { + case notify <- false: + default: + } + } + } + }() + + // Start a replication routine for each peer + r.startStopReplication() + + // Dispatch a no-op log entry first. This gets this leader up to the latest + // possible commit index, even in the absence of client commands. This used + // to append a configuration entry instead of a noop. However, that permits + // an unbounded number of uncommitted configurations in the log. We now + // maintain that there exists at most one uncommitted configuration entry in + // any log, so we have to do proper no-ops here. + noop := &logFuture{ + log: Log{ + Type: LogNoop, + }, + } + r.dispatchLogs([]*logFuture{noop}) + + // Sit in the leader loop until we step down + r.leaderLoop() +} + +// startStopReplication will set up state and start asynchronous replication to +// new peers, and stop replication to removed peers. Before removing a peer, +// it'll instruct the replication routines to try to replicate to the current +// index. This must only be called from the main thread. 
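A hedged sketch of the NotifyCh hook used at the top of runLeader and in the teardown above (DefaultConfig and Config.NotifyCh are from config.go, outside this hunk): Raft pushes true on gaining leadership and false on losing it.

```go
package raftexample

import (
	"log"

	"github.com/hashicorp/raft"
)

// newConfigWithNotify wires a leadership watcher into the Raft config.
func newConfigWithNotify() *raft.Config {
	conf := raft.DefaultConfig()
	notifyCh := make(chan bool, 1)
	conf.NotifyCh = notifyCh

	go func() {
		for isLeader := range notifyCh {
			log.Printf("leadership changed: leader=%v", isLeader)
		}
	}()
	return conf
}
```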
+func (r *Raft) startStopReplication() { + inConfig := make(map[ServerID]bool, len(r.configurations.latest.Servers)) + lastIdx := r.getLastIndex() + + // Start replication goroutines that need starting + for _, server := range r.configurations.latest.Servers { + if server.ID == r.localID { + continue + } + inConfig[server.ID] = true + if _, ok := r.leaderState.replState[server.ID]; !ok { + r.logger.Info(fmt.Sprintf("Added peer %v, starting replication", server.ID)) + s := &followerReplication{ + peer: server, + commitment: r.leaderState.commitment, + stopCh: make(chan uint64, 1), + triggerCh: make(chan struct{}, 1), + currentTerm: r.getCurrentTerm(), + nextIndex: lastIdx + 1, + lastContact: time.Now(), + notify: make(map[*verifyFuture]struct{}), + notifyCh: make(chan struct{}, 1), + stepDown: r.leaderState.stepDown, + } + r.leaderState.replState[server.ID] = s + r.goFunc(func() { r.replicate(s) }) + asyncNotifyCh(s.triggerCh) + r.observe(PeerObservation{Peer: server, Removed: false}) + } + } + + // Stop replication goroutines that need stopping + for serverID, repl := range r.leaderState.replState { + if inConfig[serverID] { + continue + } + // Replicate up to lastIdx and stop + r.logger.Info(fmt.Sprintf("Removed peer %v, stopping replication after %v", serverID, lastIdx)) + repl.stopCh <- lastIdx + close(repl.stopCh) + delete(r.leaderState.replState, serverID) + r.observe(PeerObservation{Peer: repl.peer, Removed: true}) + } +} + +// configurationChangeChIfStable returns r.configurationChangeCh if it's safe +// to process requests from it, or nil otherwise. This must only be called +// from the main thread. +// +// Note that if the conditions here were to change outside of leaderLoop to take +// this from nil to non-nil, we would need leaderLoop to be kicked. +func (r *Raft) configurationChangeChIfStable() chan *configurationChangeFuture { + // Have to wait until: + // 1. The latest configuration is committed, and + // 2. This leader has committed some entry (the noop) in this term + // https://groups.google.com/forum/#!msg/raft-dev/t4xj6dJTP6E/d2D9LrWRza8J + if r.configurations.latestIndex == r.configurations.committedIndex && + r.getCommitIndex() >= r.leaderState.commitment.startIndex { + return r.configurationChangeCh + } + return nil +} + +// leaderLoop is the hot loop for a leader. It is invoked +// after all the various leader setup is done. +func (r *Raft) leaderLoop() { + // stepDown is used to track if there is an inflight log that + // would cause us to lose leadership (specifically a RemovePeer of + // ourselves). If this is the case, we must not allow any logs to + // be processed in parallel, otherwise we are basing commit on + // only a single peer (ourself) and replicating to an undefined set + // of peers. 
+	stepDown := false
+
+	lease := time.After(r.conf.LeaderLeaseTimeout)
+	for r.getState() == Leader {
+		select {
+		case rpc := <-r.rpcCh:
+			r.processRPC(rpc)
+
+		case <-r.leaderState.stepDown:
+			r.setState(Follower)
+
+		case <-r.leaderState.commitCh:
+			// Process the newly committed entries
+			oldCommitIndex := r.getCommitIndex()
+			commitIndex := r.leaderState.commitment.getCommitIndex()
+			r.setCommitIndex(commitIndex)
+
+			if r.configurations.latestIndex > oldCommitIndex &&
+				r.configurations.latestIndex <= commitIndex {
+				r.configurations.committed = r.configurations.latest
+				r.configurations.committedIndex = r.configurations.latestIndex
+				if !hasVote(r.configurations.committed, r.localID) {
+					stepDown = true
+				}
+			}
+
+			var numProcessed int
+			start := time.Now()
+
+			for {
+				e := r.leaderState.inflight.Front()
+				if e == nil {
+					break
+				}
+				commitLog := e.Value.(*logFuture)
+				idx := commitLog.log.Index
+				if idx > commitIndex {
+					break
+				}
+				// Measure the commit time
+				metrics.MeasureSince([]string{"raft", "commitTime"}, commitLog.dispatch)
+
+				r.processLogs(idx, commitLog)
+
+				r.leaderState.inflight.Remove(e)
+				numProcessed++
+			}
+
+			// Measure the time to enqueue batch of logs for FSM to apply
+			metrics.MeasureSince([]string{"raft", "fsm", "enqueue"}, start)
+
+			// Count the number of logs enqueued
+			metrics.SetGauge([]string{"raft", "commitNumLogs"}, float32(numProcessed))
+
+			if stepDown {
+				if r.conf.ShutdownOnRemove {
+					r.logger.Info("Removed ourself, shutting down")
+					r.Shutdown()
+				} else {
+					r.logger.Info("Removed ourself, transitioning to follower")
+					r.setState(Follower)
+				}
+			}
+
+		case v := <-r.verifyCh:
+			if v.quorumSize == 0 {
+				// Just dispatched, start the verification
+				r.verifyLeader(v)
+
+			} else if v.votes < v.quorumSize {
+				// Early return, means there must be a new leader
+				r.logger.Warn("New leader elected, stepping down")
+				r.setState(Follower)
+				delete(r.leaderState.notify, v)
+				for _, repl := range r.leaderState.replState {
+					repl.cleanNotify(v)
+				}
+				v.respond(ErrNotLeader)
+
+			} else {
+				// Quorum of members agree, we are still leader
+				delete(r.leaderState.notify, v)
+				for _, repl := range r.leaderState.replState {
+					repl.cleanNotify(v)
+				}
+				v.respond(nil)
+			}
+
+		case future := <-r.userRestoreCh:
+			err := r.restoreUserSnapshot(future.meta, future.reader)
+			future.respond(err)
+
+		case c := <-r.configurationsCh:
+			c.configurations = r.configurations.Clone()
+			c.respond(nil)
+
+		case future := <-r.configurationChangeChIfStable():
+			r.appendConfigurationEntry(future)
+
+		case b := <-r.bootstrapCh:
+			b.respond(ErrCantBootstrap)
+
+		case newLog := <-r.applyCh:
+			// Group commit, gather all the ready commits
+			ready := []*logFuture{newLog}
+		GROUP_COMMIT:
+			for i := 0; i < r.conf.MaxAppendEntries; i++ {
+				select {
+				case newLog := <-r.applyCh:
+					ready = append(ready, newLog)
+				default:
+					// A bare "break" here would only exit the select,
+					// not the loop, so break out via the loop label.
+					break GROUP_COMMIT
+				}
+			}
+
+			// Dispatch the logs
+			if stepDown {
+				// we're in the process of stepping down as leader, don't process anything new
+				for i := range ready {
+					ready[i].respond(ErrNotLeader)
+				}
+			} else {
+				r.dispatchLogs(ready)
+			}
+
+		case <-lease:
+			// Check if we've exceeded the lease, potentially stepping down
+			maxDiff := r.checkLeaderLease()
+
+			// Next check interval should adjust for the last node we've
+			// contacted, without going negative
+			checkInterval := r.conf.LeaderLeaseTimeout - maxDiff
+			if checkInterval < minCheckInterval {
+				checkInterval = minCheckInterval
+			}
+
+			// Renew the lease timer
+			lease = time.After(checkInterval)
+
+		case <-r.shutdownCh:
+			return
+		}
+	}
+}
+
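Note the labeled break in the applyCh case above: inside a select statement a bare "break" only terminates the select, not the surrounding loop, so bounded channel draining needs a loop label. The batching idiom on its own, as a runnable sketch with hypothetical names:

    // drainUpTo gathers at most max additional items already buffered on ch,
    // without blocking once the channel is empty.
    func drainUpTo(ch <-chan int, first int, max int) []int {
    	batch := []int{first}
    BATCH:
    	for i := 0; i < max; i++ {
    		select {
    		case v := <-ch:
    			batch = append(batch, v)
    		default:
    			break BATCH // a bare break would only leave the select
    		}
    	}
    	return batch
    }

Group commit trades a little latency for fewer disk writes and AppendEntries RPCs: up to MaxAppendEntries futures are flushed through dispatchLogs in one batch.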
+// verifyLeader must be called from the main thread for safety. +// Causes the followers to attempt an immediate heartbeat. +func (r *Raft) verifyLeader(v *verifyFuture) { + // Current leader always votes for self + v.votes = 1 + + // Set the quorum size, hot-path for single node + v.quorumSize = r.quorumSize() + if v.quorumSize == 1 { + v.respond(nil) + return + } + + // Track this request + v.notifyCh = r.verifyCh + r.leaderState.notify[v] = struct{}{} + + // Trigger immediate heartbeats + for _, repl := range r.leaderState.replState { + repl.notifyLock.Lock() + repl.notify[v] = struct{}{} + repl.notifyLock.Unlock() + asyncNotifyCh(repl.notifyCh) + } +} + +// checkLeaderLease is used to check if we can contact a quorum of nodes +// within the last leader lease interval. If not, we need to step down, +// as we may have lost connectivity. Returns the maximum duration without +// contact. This must only be called from the main thread. +func (r *Raft) checkLeaderLease() time.Duration { + // Track contacted nodes, we can always contact ourself + contacted := 1 + + // Check each follower + var maxDiff time.Duration + now := time.Now() + for peer, f := range r.leaderState.replState { + diff := now.Sub(f.LastContact()) + if diff <= r.conf.LeaderLeaseTimeout { + contacted++ + if diff > maxDiff { + maxDiff = diff + } + } else { + // Log at least once at high value, then debug. Otherwise it gets very verbose. + if diff <= 3*r.conf.LeaderLeaseTimeout { + r.logger.Warn(fmt.Sprintf("Failed to contact %v in %v", peer, diff)) + } else { + r.logger.Debug(fmt.Sprintf("Failed to contact %v in %v", peer, diff)) + } + } + metrics.AddSample([]string{"raft", "leader", "lastContact"}, float32(diff/time.Millisecond)) + } + + // Verify we can contact a quorum + quorum := r.quorumSize() + if contacted < quorum { + r.logger.Warn("Failed to contact quorum of nodes, stepping down") + r.setState(Follower) + metrics.IncrCounter([]string{"raft", "transition", "leader_lease_timeout"}, 1) + } + return maxDiff +} + +// quorumSize is used to return the quorum size. This must only be called on +// the main thread. +// TODO: revisit usage +func (r *Raft) quorumSize() int { + voters := 0 + for _, server := range r.configurations.latest.Servers { + if server.Suffrage == Voter { + voters++ + } + } + return voters/2 + 1 +} + +// restoreUserSnapshot is used to manually consume an external snapshot, such +// as if restoring from a backup. We will use the current Raft configuration, +// not the one from the snapshot, so that we can restore into a new cluster. We +// will also use the higher of the index of the snapshot, or the current index, +// and then add 1 to that, so we force a new state with a hole in the Raft log, +// so that the snapshot will be sent to followers and used for any new joiners. +// This can only be run on the leader, and returns a future that can be used to +// block until complete. +func (r *Raft) restoreUserSnapshot(meta *SnapshotMeta, reader io.Reader) error { + defer metrics.MeasureSince([]string{"raft", "restoreUserSnapshot"}, time.Now()) + + // Sanity check the version. + version := meta.Version + if version < SnapshotVersionMin || version > SnapshotVersionMax { + return fmt.Errorf("unsupported snapshot version %d", version) + } + + // We don't support snapshots while there's a config change + // outstanding since the snapshot doesn't have a means to + // represent this state. 
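The check that follows enforces that restriction, and the index arithmetic further down creates the deliberate gap in the log. Condensed from the code below, with a current last index of 80 and a snapshot taken at index 50:

    lastIndex := r.getLastIndex() // 80
    if meta.Index > lastIndex {   // 50 > 80 is false, keep 80
    	lastIndex = meta.Index
    }
    lastIndex++ // 81: the snapshot is written at an index with no log entry

Replication to any follower then faults at index 81 and ships the snapshot instead of replaying the log.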
+ committedIndex := r.configurations.committedIndex + latestIndex := r.configurations.latestIndex + if committedIndex != latestIndex { + return fmt.Errorf("cannot restore snapshot now, wait until the configuration entry at %v has been applied (have applied %v)", + latestIndex, committedIndex) + } + + // Cancel any inflight requests. + for { + e := r.leaderState.inflight.Front() + if e == nil { + break + } + e.Value.(*logFuture).respond(ErrAbortedByRestore) + r.leaderState.inflight.Remove(e) + } + + // We will overwrite the snapshot metadata with the current term, + // an index that's greater than the current index, or the last + // index in the snapshot. It's important that we leave a hole in + // the index so we know there's nothing in the Raft log there and + // replication will fault and send the snapshot. + term := r.getCurrentTerm() + lastIndex := r.getLastIndex() + if meta.Index > lastIndex { + lastIndex = meta.Index + } + lastIndex++ + + // Dump the snapshot. Note that we use the latest configuration, + // not the one that came with the snapshot. + sink, err := r.snapshots.Create(version, lastIndex, term, + r.configurations.latest, r.configurations.latestIndex, r.trans) + if err != nil { + return fmt.Errorf("failed to create snapshot: %v", err) + } + n, err := io.Copy(sink, reader) + if err != nil { + sink.Cancel() + return fmt.Errorf("failed to write snapshot: %v", err) + } + if n != meta.Size { + sink.Cancel() + return fmt.Errorf("failed to write snapshot, size didn't match (%d != %d)", n, meta.Size) + } + if err := sink.Close(); err != nil { + return fmt.Errorf("failed to close snapshot: %v", err) + } + r.logger.Info(fmt.Sprintf("Copied %d bytes to local snapshot", n)) + + // Restore the snapshot into the FSM. If this fails we are in a + // bad state so we panic to take ourselves out. + fsm := &restoreFuture{ID: sink.ID()} + fsm.init() + select { + case r.fsmMutateCh <- fsm: + case <-r.shutdownCh: + return ErrRaftShutdown + } + if err := fsm.Error(); err != nil { + panic(fmt.Errorf("failed to restore snapshot: %v", err)) + } + + // We set the last log so it looks like we've stored the empty + // index we burned. The last applied is set because we made the + // FSM take the snapshot state, and we store the last snapshot + // in the stable store since we created a snapshot as part of + // this process. + r.setLastLog(lastIndex, term) + r.setLastApplied(lastIndex) + r.setLastSnapshot(lastIndex, term) + + r.logger.Info(fmt.Sprintf("Restored user snapshot (index %d)", lastIndex)) + return nil +} + +// appendConfigurationEntry changes the configuration and adds a new +// configuration entry to the log. This must only be called from the +// main thread. +func (r *Raft) appendConfigurationEntry(future *configurationChangeFuture) { + configuration, err := nextConfiguration(r.configurations.latest, r.configurations.latestIndex, future.req) + if err != nil { + future.respond(err) + return + } + + r.logger.Info(fmt.Sprintf("Updating configuration with %s (%v, %v) to %+v", + future.req.command, future.req.serverID, future.req.serverAddress, configuration.Servers)) + + // In pre-ID compatibility mode we translate all configuration changes + // in to an old remove peer message, which can handle all supported + // cases for peer changes in the pre-ID world (adding and removing + // voters). Both add peer and remove peer log entries are handled + // similarly on old Raft servers, but remove peer does extra checks to + // see if a leader needs to step down. 
Since they both assert the full
+	// configuration, we can safely call remove peer for everything.
+	if r.protocolVersion < 2 {
+		future.log = Log{
+			Type: LogRemovePeerDeprecated,
+			Data: encodePeers(configuration, r.trans),
+		}
+	} else {
+		future.log = Log{
+			Type: LogConfiguration,
+			Data: encodeConfiguration(configuration),
+		}
+	}
+
+	r.dispatchLogs([]*logFuture{&future.logFuture})
+	index := future.Index()
+	r.configurations.latest = configuration
+	r.configurations.latestIndex = index
+	r.leaderState.commitment.setConfiguration(configuration)
+	r.startStopReplication()
+}
+
+// dispatchLogs is called on the leader to push a log to disk, mark it
+// as inflight and begin replication of it.
+func (r *Raft) dispatchLogs(applyLogs []*logFuture) {
+	now := time.Now()
+	defer metrics.MeasureSince([]string{"raft", "leader", "dispatchLog"}, now)
+
+	term := r.getCurrentTerm()
+	lastIndex := r.getLastIndex()
+
+	n := len(applyLogs)
+	logs := make([]*Log, n)
+	metrics.SetGauge([]string{"raft", "leader", "dispatchNumLogs"}, float32(n))
+
+	for idx, applyLog := range applyLogs {
+		applyLog.dispatch = now
+		lastIndex++
+		applyLog.log.Index = lastIndex
+		applyLog.log.Term = term
+		logs[idx] = &applyLog.log
+		r.leaderState.inflight.PushBack(applyLog)
+	}
+
+	// Write the log entry locally
+	if err := r.logs.StoreLogs(logs); err != nil {
+		r.logger.Error(fmt.Sprintf("Failed to commit logs: %v", err))
+		for _, applyLog := range applyLogs {
+			applyLog.respond(err)
+		}
+		r.setState(Follower)
+		return
+	}
+	r.leaderState.commitment.match(r.localID, lastIndex)
+
+	// Update the last log since it's on disk now
+	r.setLastLog(lastIndex, term)
+
+	// Notify the replicators of the new log
+	for _, f := range r.leaderState.replState {
+		asyncNotifyCh(f.triggerCh)
+	}
+}
+
+// processLogs is used to apply all the committed entries that haven't been
+// applied up to the given index limit.
+// This can be called from both leaders and followers.
+// Followers call this from AppendEntries, for n entries at a time, and always
+// pass future=nil.
+// Leaders call this once per inflight when entries are committed. They pass
+// the future from inflights.
+func (r *Raft) processLogs(index uint64, future *logFuture) {
+	// Reject logs we've applied already
+	lastApplied := r.getLastApplied()
+	if index <= lastApplied {
+		r.logger.Warn(fmt.Sprintf("Skipping application of old log: %d", index))
+		return
+	}
+
+	// Apply all the preceding logs
+	for idx := r.getLastApplied() + 1; idx <= index; idx++ {
+		// Get the log, either from the future or from our log store
+		if future != nil && future.log.Index == idx {
+			r.processLog(&future.log, future)
+		} else {
+			l := new(Log)
+			if err := r.logs.GetLog(idx, l); err != nil {
+				r.logger.Error(fmt.Sprintf("Failed to get log at %d: %v", idx, err))
+				panic(err)
+			}
+			r.processLog(l, nil)
+		}
+
+		// Update the lastApplied index and term
+		r.setLastApplied(idx)
+	}
+}
+
+// processLog is invoked to process the application of a single committed log entry.
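processLog below hands each committed command to the FSM goroutine through fsmMutateCh as a commitTuple. That type is defined elsewhere (fsm.go, not part of this hunk); judging from the call sites it is presumably just a log/future pair along these lines:

    // Presumed shape of the FSM handoff value; the actual definition lives
    // in fsm.go and is not shown in this diff.
    type commitTuple struct {
    	log    *Log
    	future *logFuture
    }

The future rides along so that the FSM goroutine, not this one, responds once the command has actually been applied.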
+func (r *Raft) processLog(l *Log, future *logFuture) { + switch l.Type { + case LogBarrier: + // Barrier is handled by the FSM + fallthrough + + case LogCommand: + // Forward to the fsm handler + select { + case r.fsmMutateCh <- &commitTuple{l, future}: + case <-r.shutdownCh: + if future != nil { + future.respond(ErrRaftShutdown) + } + } + + // Return so that the future is only responded to + // by the FSM handler when the application is done + return + + case LogConfiguration: + case LogAddPeerDeprecated: + case LogRemovePeerDeprecated: + case LogNoop: + // Ignore the no-op + + default: + panic(fmt.Errorf("unrecognized log type: %#v", l)) + } + + // Invoke the future if given + if future != nil { + future.respond(nil) + } +} + +// processRPC is called to handle an incoming RPC request. This must only be +// called from the main thread. +func (r *Raft) processRPC(rpc RPC) { + if err := r.checkRPCHeader(rpc); err != nil { + rpc.Respond(nil, err) + return + } + + switch cmd := rpc.Command.(type) { + case *AppendEntriesRequest: + r.appendEntries(rpc, cmd) + case *RequestVoteRequest: + r.requestVote(rpc, cmd) + case *InstallSnapshotRequest: + r.installSnapshot(rpc, cmd) + default: + r.logger.Error(fmt.Sprintf("Got unexpected command: %#v", rpc.Command)) + rpc.Respond(nil, fmt.Errorf("unexpected command")) + } +} + +// processHeartbeat is a special handler used just for heartbeat requests +// so that they can be fast-pathed if a transport supports it. This must only +// be called from the main thread. +func (r *Raft) processHeartbeat(rpc RPC) { + defer metrics.MeasureSince([]string{"raft", "rpc", "processHeartbeat"}, time.Now()) + + // Check if we are shutdown, just ignore the RPC + select { + case <-r.shutdownCh: + return + default: + } + + // Ensure we are only handling a heartbeat + switch cmd := rpc.Command.(type) { + case *AppendEntriesRequest: + r.appendEntries(rpc, cmd) + default: + r.logger.Error(fmt.Sprintf("Expected heartbeat, got command: %#v", rpc.Command)) + rpc.Respond(nil, fmt.Errorf("unexpected command")) + } +} + +// appendEntries is invoked when we get an append entries RPC call. This must +// only be called from the main thread. 
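The handler below implements Raft's log-matching check: the leader sends the index and term of the entry immediately preceding the new batch, and the follower accepts only if its own log agrees. For example, a request carrying PrevLogEntry=7 and PrevLogTerm=3 is rejected unless the follower's entry at index 7 also has term 3. Condensed from the handler below (same names, error handling elided):

    var prev Log
    if err := r.logs.GetLog(a.PrevLogEntry, &prev); err != nil || prev.Term != a.PrevLogTerm {
    	resp.NoRetryBackoff = true // retrying the same index cannot succeed
    	return
    }

On rejection the leader walks the follower's nextIndex backwards (see replication.go further down) until the logs agree.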
+func (r *Raft) appendEntries(rpc RPC, a *AppendEntriesRequest) { + defer metrics.MeasureSince([]string{"raft", "rpc", "appendEntries"}, time.Now()) + // Setup a response + resp := &AppendEntriesResponse{ + RPCHeader: r.getRPCHeader(), + Term: r.getCurrentTerm(), + LastLog: r.getLastIndex(), + Success: false, + NoRetryBackoff: false, + } + var rpcErr error + defer func() { + rpc.Respond(resp, rpcErr) + }() + + // Ignore an older term + if a.Term < r.getCurrentTerm() { + return + } + + // Increase the term if we see a newer one, also transition to follower + // if we ever get an appendEntries call + if a.Term > r.getCurrentTerm() || r.getState() != Follower { + // Ensure transition to follower + r.setState(Follower) + r.setCurrentTerm(a.Term) + resp.Term = a.Term + } + + // Save the current leader + r.setLeader(ServerAddress(r.trans.DecodePeer(a.Leader))) + + // Verify the last log entry + if a.PrevLogEntry > 0 { + lastIdx, lastTerm := r.getLastEntry() + + var prevLogTerm uint64 + if a.PrevLogEntry == lastIdx { + prevLogTerm = lastTerm + + } else { + var prevLog Log + if err := r.logs.GetLog(a.PrevLogEntry, &prevLog); err != nil { + r.logger.Warn(fmt.Sprintf("Failed to get previous log: %d %v (last: %d)", + a.PrevLogEntry, err, lastIdx)) + resp.NoRetryBackoff = true + return + } + prevLogTerm = prevLog.Term + } + + if a.PrevLogTerm != prevLogTerm { + r.logger.Warn(fmt.Sprintf("Previous log term mis-match: ours: %d remote: %d", + prevLogTerm, a.PrevLogTerm)) + resp.NoRetryBackoff = true + return + } + } + + // Process any new entries + if len(a.Entries) > 0 { + start := time.Now() + + // Delete any conflicting entries, skip any duplicates + lastLogIdx, _ := r.getLastLog() + var newEntries []*Log + for i, entry := range a.Entries { + if entry.Index > lastLogIdx { + newEntries = a.Entries[i:] + break + } + var storeEntry Log + if err := r.logs.GetLog(entry.Index, &storeEntry); err != nil { + r.logger.Warn(fmt.Sprintf("Failed to get log entry %d: %v", + entry.Index, err)) + return + } + if entry.Term != storeEntry.Term { + r.logger.Warn(fmt.Sprintf("Clearing log suffix from %d to %d", entry.Index, lastLogIdx)) + if err := r.logs.DeleteRange(entry.Index, lastLogIdx); err != nil { + r.logger.Error(fmt.Sprintf("Failed to clear log suffix: %v", err)) + return + } + if entry.Index <= r.configurations.latestIndex { + r.configurations.latest = r.configurations.committed + r.configurations.latestIndex = r.configurations.committedIndex + } + newEntries = a.Entries[i:] + break + } + } + + if n := len(newEntries); n > 0 { + // Append the new entries + if err := r.logs.StoreLogs(newEntries); err != nil { + r.logger.Error(fmt.Sprintf("Failed to append to logs: %v", err)) + // TODO: leaving r.getLastLog() in the wrong + // state if there was a truncation above + return + } + + // Handle any new configuration changes + for _, newEntry := range newEntries { + r.processConfigurationLogEntry(newEntry) + } + + // Update the lastLog + last := newEntries[n-1] + r.setLastLog(last.Index, last.Term) + } + + metrics.MeasureSince([]string{"raft", "rpc", "appendEntries", "storeLogs"}, start) + } + + // Update the commit index + if a.LeaderCommitIndex > 0 && a.LeaderCommitIndex > r.getCommitIndex() { + start := time.Now() + idx := min(a.LeaderCommitIndex, r.getLastIndex()) + r.setCommitIndex(idx) + if r.configurations.latestIndex <= idx { + r.configurations.committed = r.configurations.latest + r.configurations.committedIndex = r.configurations.latestIndex + } + r.processLogs(idx, nil) + 
metrics.MeasureSince([]string{"raft", "rpc", "appendEntries", "processLogs"}, start)
+	}
+
+	// Everything went well, set success
+	resp.Success = true
+	r.setLastContact()
+	return
+}
+
+// processConfigurationLogEntry takes a log entry and updates the latest
+// configuration if the entry results in a new configuration. This must only be
+// called from the main thread, or from NewRaft() before any threads have begun.
+func (r *Raft) processConfigurationLogEntry(entry *Log) {
+	if entry.Type == LogConfiguration {
+		r.configurations.committed = r.configurations.latest
+		r.configurations.committedIndex = r.configurations.latestIndex
+		r.configurations.latest = decodeConfiguration(entry.Data)
+		r.configurations.latestIndex = entry.Index
+	} else if entry.Type == LogAddPeerDeprecated || entry.Type == LogRemovePeerDeprecated {
+		r.configurations.committed = r.configurations.latest
+		r.configurations.committedIndex = r.configurations.latestIndex
+		r.configurations.latest = decodePeers(entry.Data, r.trans)
+		r.configurations.latestIndex = entry.Index
+	}
+}
+
+// requestVote is invoked when we get a request vote RPC call.
+func (r *Raft) requestVote(rpc RPC, req *RequestVoteRequest) {
+	defer metrics.MeasureSince([]string{"raft", "rpc", "requestVote"}, time.Now())
+	r.observe(*req)
+
+	// Setup a response
+	resp := &RequestVoteResponse{
+		RPCHeader: r.getRPCHeader(),
+		Term:      r.getCurrentTerm(),
+		Granted:   false,
+	}
+	var rpcErr error
+	defer func() {
+		rpc.Respond(resp, rpcErr)
+	}()
+
+	// Version 0 servers will panic unless the peers list is present. It's only
+	// used on them to produce a warning message.
+	if r.protocolVersion < 2 {
+		resp.Peers = encodePeers(r.configurations.latest, r.trans)
+	}
+
+	// Check if we have an existing leader [who's not the candidate]
+	candidate := r.trans.DecodePeer(req.Candidate)
+	if leader := r.Leader(); leader != "" && leader != candidate {
+		r.logger.Warn(fmt.Sprintf("Rejecting vote request from %v since we have a leader: %v",
+			candidate, leader))
+		return
+	}
+
+	// Ignore an older term
+	if req.Term < r.getCurrentTerm() {
+		return
+	}
+
+	// Increase the term if we see a newer one
+	if req.Term > r.getCurrentTerm() {
+		// Ensure transition to follower
+		r.setState(Follower)
+		r.setCurrentTerm(req.Term)
+		resp.Term = req.Term
+	}
+
+	// Check if we have voted yet
+	lastVoteTerm, err := r.stable.GetUint64(keyLastVoteTerm)
+	if err != nil && err.Error() != "not found" {
+		r.logger.Error(fmt.Sprintf("Failed to get last vote term: %v", err))
+		return
+	}
+	lastVoteCandBytes, err := r.stable.Get(keyLastVoteCand)
+	if err != nil && err.Error() != "not found" {
+		r.logger.Error(fmt.Sprintf("Failed to get last vote candidate: %v", err))
+		return
+	}
+
+	// Check if we've voted in this election before
+	if lastVoteTerm == req.Term && lastVoteCandBytes != nil {
+		r.logger.Info(fmt.Sprintf("Duplicate RequestVote for same term: %d", req.Term))
+		if bytes.Equal(lastVoteCandBytes, req.Candidate) {
+			r.logger.Warn(fmt.Sprintf("Duplicate RequestVote from candidate: %s", req.Candidate))
+			resp.Granted = true
+		}
+		return
+	}
+
+	// Reject if their term is older
+	lastIdx, lastTerm := r.getLastEntry()
+	if lastTerm > req.LastLogTerm {
+		r.logger.Warn(fmt.Sprintf("Rejecting vote request from %v since our last term is greater (%d, %d)",
+			candidate, lastTerm, req.LastLogTerm))
+		return
+	}
+
+	if lastTerm == req.LastLogTerm && lastIdx > req.LastLogIndex {
+		r.logger.Warn(fmt.Sprintf("Rejecting vote request from %v since our last index is greater (%d, %d)",
candidate, lastIdx, req.LastLogIndex)) + return + } + + // Persist a vote for safety + if err := r.persistVote(req.Term, req.Candidate); err != nil { + r.logger.Error(fmt.Sprintf("Failed to persist vote: %v", err)) + return + } + + resp.Granted = true + r.setLastContact() + return +} + +// installSnapshot is invoked when we get a InstallSnapshot RPC call. +// We must be in the follower state for this, since it means we are +// too far behind a leader for log replay. This must only be called +// from the main thread. +func (r *Raft) installSnapshot(rpc RPC, req *InstallSnapshotRequest) { + defer metrics.MeasureSince([]string{"raft", "rpc", "installSnapshot"}, time.Now()) + // Setup a response + resp := &InstallSnapshotResponse{ + Term: r.getCurrentTerm(), + Success: false, + } + var rpcErr error + defer func() { + io.Copy(ioutil.Discard, rpc.Reader) // ensure we always consume all the snapshot data from the stream [see issue #212] + rpc.Respond(resp, rpcErr) + }() + + // Sanity check the version + if req.SnapshotVersion < SnapshotVersionMin || + req.SnapshotVersion > SnapshotVersionMax { + rpcErr = fmt.Errorf("unsupported snapshot version %d", req.SnapshotVersion) + return + } + + // Ignore an older term + if req.Term < r.getCurrentTerm() { + r.logger.Info(fmt.Sprintf("Ignoring installSnapshot request with older term of %d vs currentTerm %d", + req.Term, r.getCurrentTerm())) + return + } + + // Increase the term if we see a newer one + if req.Term > r.getCurrentTerm() { + // Ensure transition to follower + r.setState(Follower) + r.setCurrentTerm(req.Term) + resp.Term = req.Term + } + + // Save the current leader + r.setLeader(ServerAddress(r.trans.DecodePeer(req.Leader))) + + // Create a new snapshot + var reqConfiguration Configuration + var reqConfigurationIndex uint64 + if req.SnapshotVersion > 0 { + reqConfiguration = decodeConfiguration(req.Configuration) + reqConfigurationIndex = req.ConfigurationIndex + } else { + reqConfiguration = decodePeers(req.Peers, r.trans) + reqConfigurationIndex = req.LastLogIndex + } + version := getSnapshotVersion(r.protocolVersion) + sink, err := r.snapshots.Create(version, req.LastLogIndex, req.LastLogTerm, + reqConfiguration, reqConfigurationIndex, r.trans) + if err != nil { + r.logger.Error(fmt.Sprintf("Failed to create snapshot to install: %v", err)) + rpcErr = fmt.Errorf("failed to create snapshot: %v", err) + return + } + + // Spill the remote snapshot to disk + n, err := io.Copy(sink, rpc.Reader) + if err != nil { + sink.Cancel() + r.logger.Error(fmt.Sprintf("Failed to copy snapshot: %v", err)) + rpcErr = err + return + } + + // Check that we received it all + if n != req.Size { + sink.Cancel() + r.logger.Error(fmt.Sprintf("Failed to receive whole snapshot: %d / %d", n, req.Size)) + rpcErr = fmt.Errorf("short read") + return + } + + // Finalize the snapshot + if err := sink.Close(); err != nil { + r.logger.Error(fmt.Sprintf("Failed to finalize snapshot: %v", err)) + rpcErr = err + return + } + r.logger.Info(fmt.Sprintf("Copied %d bytes to local snapshot", n)) + + // Restore snapshot + future := &restoreFuture{ID: sink.ID()} + future.init() + select { + case r.fsmMutateCh <- future: + case <-r.shutdownCh: + future.respond(ErrRaftShutdown) + return + } + + // Wait for the restore to happen + if err := future.Error(); err != nil { + r.logger.Error(fmt.Sprintf("Failed to restore snapshot: %v", err)) + rpcErr = err + return + } + + // Update the lastApplied so we don't replay old logs + r.setLastApplied(req.LastLogIndex) + + // Update the last stable 
snapshot info
+	r.setLastSnapshot(req.LastLogIndex, req.LastLogTerm)
+
+	// Restore the peer set
+	r.configurations.latest = reqConfiguration
+	r.configurations.latestIndex = reqConfigurationIndex
+	r.configurations.committed = reqConfiguration
+	r.configurations.committedIndex = reqConfigurationIndex
+
+	// Compact logs, continue even if this fails
+	if err := r.compactLogs(req.LastLogIndex); err != nil {
+		r.logger.Error(fmt.Sprintf("Failed to compact logs: %v", err))
+	}
+
+	r.logger.Info("Installed remote snapshot")
+	resp.Success = true
+	r.setLastContact()
+	return
+}
+
+// setLastContact is used to set the last contact time to now
+func (r *Raft) setLastContact() {
+	r.lastContactLock.Lock()
+	r.lastContact = time.Now()
+	r.lastContactLock.Unlock()
+}
+
+type voteResult struct {
+	RequestVoteResponse
+	voterID ServerID
+}
+
+// electSelf is used to send a RequestVote RPC to all peers, and vote for
+// ourself. This has the side effect of incrementing the current term. The
+// response channel returned is used to wait for all the responses (including a
+// vote for ourself). This must only be called from the main thread.
+func (r *Raft) electSelf() <-chan *voteResult {
+	// Create a response channel
+	respCh := make(chan *voteResult, len(r.configurations.latest.Servers))
+
+	// Increment the term
+	r.setCurrentTerm(r.getCurrentTerm() + 1)
+
+	// Construct the request
+	lastIdx, lastTerm := r.getLastEntry()
+	req := &RequestVoteRequest{
+		RPCHeader:    r.getRPCHeader(),
+		Term:         r.getCurrentTerm(),
+		Candidate:    r.trans.EncodePeer(r.localID, r.localAddr),
+		LastLogIndex: lastIdx,
+		LastLogTerm:  lastTerm,
+	}
+
+	// Construct a function to ask for a vote
+	askPeer := func(peer Server) {
+		r.goFunc(func() {
+			defer metrics.MeasureSince([]string{"raft", "candidate", "electSelf"}, time.Now())
+			resp := &voteResult{voterID: peer.ID}
+			err := r.trans.RequestVote(peer.ID, peer.Address, req, &resp.RequestVoteResponse)
+			if err != nil {
+				r.logger.Error(fmt.Sprintf("Failed to make RequestVote RPC to %v: %v", peer, err))
+				resp.Term = req.Term
+				resp.Granted = false
+			}
+			respCh <- resp
+		})
+	}
+
+	// For each peer, request a vote
+	for _, server := range r.configurations.latest.Servers {
+		if server.Suffrage == Voter {
+			if server.ID == r.localID {
+				// Persist a vote for ourselves
+				if err := r.persistVote(req.Term, req.Candidate); err != nil {
+					r.logger.Error(fmt.Sprintf("Failed to persist vote: %v", err))
+					return nil
+				}
+				// Include our own vote
+				respCh <- &voteResult{
+					RequestVoteResponse: RequestVoteResponse{
+						RPCHeader: r.getRPCHeader(),
+						Term:      req.Term,
+						Granted:   true,
+					},
+					voterID: r.localID,
+				}
+			} else {
+				askPeer(server)
+			}
+		}
+	}
+
+	return respCh
+}
+
+// persistVote is used to persist our vote for safety.
+func (r *Raft) persistVote(term uint64, candidate []byte) error {
+	if err := r.stable.SetUint64(keyLastVoteTerm, term); err != nil {
+		return err
+	}
+	if err := r.stable.Set(keyLastVoteCand, candidate); err != nil {
+		return err
+	}
+	return nil
+}
+
+// setCurrentTerm is used to set the current term in a durable manner.
+func (r *Raft) setCurrentTerm(t uint64) {
+	// Persist to disk first
+	if err := r.stable.SetUint64(keyCurrentTerm, t); err != nil {
+		panic(fmt.Errorf("failed to save current term: %v", err))
+	}
+	r.raftState.setCurrentTerm(t)
+}
+
+// setState is used to update the current state. Any state
+// transition causes the known leader to be cleared. This means
+// that leader should be set only after updating the state.
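For context on electSelf above: it returns a channel buffered for every server and pre-loaded with our own vote, so the consuming side never blocks producers. The candidate loop that drains it (runCandidate, not part of this hunk) tallies grants against quorumSize() roughly like this (a sketch; the real loop also multiplexes RPCs and an election timeout):

    grantedVotes := 0
    votesNeeded := r.quorumSize() // e.g. 3 of 5 voters
    for vote := range respCh {
    	if vote.Granted {
    		grantedVotes++
    	}
    	if grantedVotes >= votesNeeded {
    		break // won the election for this term
    	}
    }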
+func (r *Raft) setState(state RaftState) { + r.setLeader("") + oldState := r.raftState.getState() + r.raftState.setState(state) + if oldState != state { + r.observe(state) + } +} diff --git a/vendor/github.com/hashicorp/raft/replication.go b/vendor/github.com/hashicorp/raft/replication.go new file mode 100644 index 0000000000..1f5f1007f5 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/replication.go @@ -0,0 +1,572 @@ +package raft + +import ( + "errors" + "fmt" + "sync" + "time" + + "github.com/armon/go-metrics" +) + +const ( + maxFailureScale = 12 + failureWait = 10 * time.Millisecond +) + +var ( + // ErrLogNotFound indicates a given log entry is not available. + ErrLogNotFound = errors.New("log not found") + + // ErrPipelineReplicationNotSupported can be returned by the transport to + // signal that pipeline replication is not supported in general, and that + // no error message should be produced. + ErrPipelineReplicationNotSupported = errors.New("pipeline replication not supported") +) + +// followerReplication is in charge of sending snapshots and log entries from +// this leader during this particular term to a remote follower. +type followerReplication struct { + // peer contains the network address and ID of the remote follower. + peer Server + + // commitment tracks the entries acknowledged by followers so that the + // leader's commit index can advance. It is updated on successful + // AppendEntries responses. + commitment *commitment + + // stopCh is notified/closed when this leader steps down or the follower is + // removed from the cluster. In the follower removed case, it carries a log + // index; replication should be attempted with a best effort up through that + // index, before exiting. + stopCh chan uint64 + // triggerCh is notified every time new entries are appended to the log. + triggerCh chan struct{} + + // currentTerm is the term of this leader, to be included in AppendEntries + // requests. + currentTerm uint64 + // nextIndex is the index of the next log entry to send to the follower, + // which may fall past the end of the log. + nextIndex uint64 + + // lastContact is updated to the current time whenever any response is + // received from the follower (successful or not). This is used to check + // whether the leader should step down (Raft.checkLeaderLease()). + lastContact time.Time + // lastContactLock protects 'lastContact'. + lastContactLock sync.RWMutex + + // failures counts the number of failed RPCs since the last success, which is + // used to apply backoff. + failures uint64 + + // notifyCh is notified to send out a heartbeat, which is used to check that + // this server is still leader. + notifyCh chan struct{} + // notify is a map of futures to be resolved upon receipt of an + // acknowledgement, then cleared from this map. + notify map[*verifyFuture]struct{} + // notifyLock protects 'notify'. + notifyLock sync.Mutex + + // stepDown is used to indicate to the leader that we + // should step down based on information from a follower. + stepDown chan struct{} + + // allowPipeline is used to determine when to pipeline the AppendEntries RPCs. + // It is private to this replication goroutine. + allowPipeline bool +} + +// notifyAll is used to notify all the waiting verify futures +// if the follower believes we are still the leader. 
+func (s *followerReplication) notifyAll(leader bool) {
+	// Clear the waiting notifies minimizing lock time
+	s.notifyLock.Lock()
+	n := s.notify
+	s.notify = make(map[*verifyFuture]struct{})
+	s.notifyLock.Unlock()
+
+	// Submit our votes
+	for v := range n {
+		v.vote(leader)
+	}
+}
+
+// cleanNotify is used to delete a pending verify future from the notify map.
+func (s *followerReplication) cleanNotify(v *verifyFuture) {
+	s.notifyLock.Lock()
+	delete(s.notify, v)
+	s.notifyLock.Unlock()
+}
+
+// LastContact returns the time of last contact.
+func (s *followerReplication) LastContact() time.Time {
+	s.lastContactLock.RLock()
+	last := s.lastContact
+	s.lastContactLock.RUnlock()
+	return last
+}
+
+// setLastContact sets the last contact to the current time.
+func (s *followerReplication) setLastContact() {
+	s.lastContactLock.Lock()
+	s.lastContact = time.Now()
+	s.lastContactLock.Unlock()
+}
+
+// replicate is a long running routine that replicates log entries to a single
+// follower.
+func (r *Raft) replicate(s *followerReplication) {
+	// Start an async heartbeating routine
+	stopHeartbeat := make(chan struct{})
+	defer close(stopHeartbeat)
+	r.goFunc(func() { r.heartbeat(s, stopHeartbeat) })
+
+RPC:
+	shouldStop := false
+	for !shouldStop {
+		select {
+		case maxIndex := <-s.stopCh:
+			// Make a best effort to replicate up to this index
+			if maxIndex > 0 {
+				r.replicateTo(s, maxIndex)
+			}
+			return
+		case <-s.triggerCh:
+			lastLogIdx, _ := r.getLastLog()
+			shouldStop = r.replicateTo(s, lastLogIdx)
+		// This is _not_ our heartbeat mechanism but is to ensure
+		// followers quickly learn the leader's commit index when
+		// raft commits stop flowing naturally. The actual heartbeats
+		// can't do this to keep them unblocked by disk IO on the
+		// follower. See https://github.com/hashicorp/raft/issues/282.
+		case <-randomTimeout(r.conf.CommitTimeout):
+			lastLogIdx, _ := r.getLastLog()
+			shouldStop = r.replicateTo(s, lastLogIdx)
+		}
+
+		// If things look healthy, switch to pipeline mode
+		if !shouldStop && s.allowPipeline {
+			goto PIPELINE
+		}
+	}
+	return
+
+PIPELINE:
+	// Disable until re-enabled
+	s.allowPipeline = false
+
+	// Replicates using a pipeline for high performance. This method
+	// is not able to gracefully recover from errors, and so we fall back
+	// to standard mode on failure.
+	if err := r.pipelineReplicate(s); err != nil {
+		if err != ErrPipelineReplicationNotSupported {
+			r.logger.Error(fmt.Sprintf("Failed to start pipeline replication to %s: %s", s.peer, err))
+		}
+	}
+	goto RPC
+}
+
+// replicateTo is a helper to replicate(), used to replicate the logs up to a
+// given last index.
+// If the follower log is behind, we take care to bring them up to date.
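replicateTo below throttles retries with backoff(failureWait, s.failures, maxFailureScale). That helper is defined in util.go, outside this hunk; from the constants declared above it is presumably a capped exponential backoff, something like:

    // Presumed shape of the util.go helper: double the base wait per failure,
    // capped at limit doublings (10ms, 20ms, 40ms, ... with failureWait=10ms).
    func backoff(base time.Duration, round, limit uint64) time.Duration {
    	if round > limit {
    		round = limit
    	}
    	for ; round > 2; round-- {
    		base *= 2
    	}
    	return base
    }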
+func (r *Raft) replicateTo(s *followerReplication, lastIndex uint64) (shouldStop bool) { + // Create the base request + var req AppendEntriesRequest + var resp AppendEntriesResponse + var start time.Time +START: + // Prevent an excessive retry rate on errors + if s.failures > 0 { + select { + case <-time.After(backoff(failureWait, s.failures, maxFailureScale)): + case <-r.shutdownCh: + } + } + + // Setup the request + if err := r.setupAppendEntries(s, &req, s.nextIndex, lastIndex); err == ErrLogNotFound { + goto SEND_SNAP + } else if err != nil { + return + } + + // Make the RPC call + start = time.Now() + if err := r.trans.AppendEntries(s.peer.ID, s.peer.Address, &req, &resp); err != nil { + r.logger.Error(fmt.Sprintf("Failed to AppendEntries to %v: %v", s.peer, err)) + s.failures++ + return + } + appendStats(string(s.peer.ID), start, float32(len(req.Entries))) + + // Check for a newer term, stop running + if resp.Term > req.Term { + r.handleStaleTerm(s) + return true + } + + // Update the last contact + s.setLastContact() + + // Update s based on success + if resp.Success { + // Update our replication state + updateLastAppended(s, &req) + + // Clear any failures, allow pipelining + s.failures = 0 + s.allowPipeline = true + } else { + s.nextIndex = max(min(s.nextIndex-1, resp.LastLog+1), 1) + if resp.NoRetryBackoff { + s.failures = 0 + } else { + s.failures++ + } + r.logger.Warn(fmt.Sprintf("AppendEntries to %v rejected, sending older logs (next: %d)", s.peer, s.nextIndex)) + } + +CHECK_MORE: + // Poll the stop channel here in case we are looping and have been asked + // to stop, or have stepped down as leader. Even for the best effort case + // where we are asked to replicate to a given index and then shutdown, + // it's better to not loop in here to send lots of entries to a straggler + // that's leaving the cluster anyways. + select { + case <-s.stopCh: + return true + default: + } + + // Check if there are more logs to replicate + if s.nextIndex <= lastIndex { + goto START + } + return + + // SEND_SNAP is used when we fail to get a log, usually because the follower + // is too far behind, and we must ship a snapshot down instead +SEND_SNAP: + if stop, err := r.sendLatestSnapshot(s); stop { + return true + } else if err != nil { + r.logger.Error(fmt.Sprintf("Failed to send snapshot to %v: %v", s.peer, err)) + return + } + + // Check if there is more to replicate + goto CHECK_MORE +} + +// sendLatestSnapshot is used to send the latest snapshot we have +// down to our follower. 
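Context for the snapshot fallback below: on a rejected AppendEntries, replicateTo above pulls the follower's nextIndex back with max(min(s.nextIndex-1, resp.LastLog+1), 1) (using the package's uint64 min/max helpers), so a follower reporting LastLog=90 drags a leader probing index 150 straight down to 91 rather than stepping back one index per round trip:

    next := uint64(150)        // leader's current guess
    followerLast := uint64(90) // follower's reported LastLog
    next = max(min(next-1, followerLast+1), 1) // 91: skips 149..92 in one hop

Once the entry the leader needs has already been compacted out of the log, GetLog returns ErrLogNotFound and the code below ships a whole snapshot instead.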
+func (r *Raft) sendLatestSnapshot(s *followerReplication) (bool, error) { + // Get the snapshots + snapshots, err := r.snapshots.List() + if err != nil { + r.logger.Error(fmt.Sprintf("Failed to list snapshots: %v", err)) + return false, err + } + + // Check we have at least a single snapshot + if len(snapshots) == 0 { + return false, fmt.Errorf("no snapshots found") + } + + // Open the most recent snapshot + snapID := snapshots[0].ID + meta, snapshot, err := r.snapshots.Open(snapID) + if err != nil { + r.logger.Error(fmt.Sprintf("Failed to open snapshot %v: %v", snapID, err)) + return false, err + } + defer snapshot.Close() + + // Setup the request + req := InstallSnapshotRequest{ + RPCHeader: r.getRPCHeader(), + SnapshotVersion: meta.Version, + Term: s.currentTerm, + Leader: r.trans.EncodePeer(r.localID, r.localAddr), + LastLogIndex: meta.Index, + LastLogTerm: meta.Term, + Peers: meta.Peers, + Size: meta.Size, + Configuration: encodeConfiguration(meta.Configuration), + ConfigurationIndex: meta.ConfigurationIndex, + } + + // Make the call + start := time.Now() + var resp InstallSnapshotResponse + if err := r.trans.InstallSnapshot(s.peer.ID, s.peer.Address, &req, &resp, snapshot); err != nil { + r.logger.Error(fmt.Sprintf("Failed to install snapshot %v: %v", snapID, err)) + s.failures++ + return false, err + } + metrics.MeasureSince([]string{"raft", "replication", "installSnapshot", string(s.peer.ID)}, start) + + // Check for a newer term, stop running + if resp.Term > req.Term { + r.handleStaleTerm(s) + return true, nil + } + + // Update the last contact + s.setLastContact() + + // Check for success + if resp.Success { + // Update the indexes + s.nextIndex = meta.Index + 1 + s.commitment.match(s.peer.ID, meta.Index) + + // Clear any failures + s.failures = 0 + + // Notify we are still leader + s.notifyAll(true) + } else { + s.failures++ + r.logger.Warn(fmt.Sprintf("InstallSnapshot to %v rejected", s.peer)) + } + return false, nil +} + +// heartbeat is used to periodically invoke AppendEntries on a peer +// to ensure they don't time out. This is done async of replicate(), +// since that routine could potentially be blocked on disk IO. +func (r *Raft) heartbeat(s *followerReplication, stopCh chan struct{}) { + var failures uint64 + req := AppendEntriesRequest{ + RPCHeader: r.getRPCHeader(), + Term: s.currentTerm, + Leader: r.trans.EncodePeer(r.localID, r.localAddr), + } + var resp AppendEntriesResponse + for { + // Wait for the next heartbeat interval or forced notify + select { + case <-s.notifyCh: + case <-randomTimeout(r.conf.HeartbeatTimeout / 10): + case <-stopCh: + return + } + + start := time.Now() + if err := r.trans.AppendEntries(s.peer.ID, s.peer.Address, &req, &resp); err != nil { + r.logger.Error(fmt.Sprintf("Failed to heartbeat to %v: %v", s.peer.Address, err)) + failures++ + select { + case <-time.After(backoff(failureWait, failures, maxFailureScale)): + case <-stopCh: + } + } else { + s.setLastContact() + failures = 0 + metrics.MeasureSince([]string{"raft", "replication", "heartbeat", string(s.peer.ID)}, start) + s.notifyAll(resp.Success) + } + } +} + +// pipelineReplicate is used when we have synchronized our state with the follower, +// and want to switch to a higher performance pipeline mode of replication. +// We only pipeline AppendEntries commands, and if we ever hit an error, we fall +// back to the standard replication which can handle more complex situations. 
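Pipelining overlaps many in-flight AppendEntries RPCs instead of waiting on each response: the send side advances a local nextIndex optimistically while a separate decoder goroutine consumes responses. The transport's side of this is the AppendPipeline used below; its use condenses to roughly (error handling elided):

    pipeline, _ := r.trans.AppendEntriesPipeline(s.peer.ID, s.peer.Address)
    defer pipeline.Close()
    pipeline.AppendEntries(req, new(AppendEntriesResponse)) // queue the send
    ready := <-pipeline.Consumer()                          // responses arrive asynchronously
    req, resp := ready.Request(), ready.Response()          // matched pair for bookkeeping

Any failure aborts the pipeline and drops back to the lock-step replicateTo path, which can handle truncation and snapshots.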
+func (r *Raft) pipelineReplicate(s *followerReplication) error { + // Create a new pipeline + pipeline, err := r.trans.AppendEntriesPipeline(s.peer.ID, s.peer.Address) + if err != nil { + return err + } + defer pipeline.Close() + + // Log start and stop of pipeline + r.logger.Info(fmt.Sprintf("pipelining replication to peer %v", s.peer)) + defer r.logger.Info(fmt.Sprintf("aborting pipeline replication to peer %v", s.peer)) + + // Create a shutdown and finish channel + stopCh := make(chan struct{}) + finishCh := make(chan struct{}) + + // Start a dedicated decoder + r.goFunc(func() { r.pipelineDecode(s, pipeline, stopCh, finishCh) }) + + // Start pipeline sends at the last good nextIndex + nextIndex := s.nextIndex + + shouldStop := false +SEND: + for !shouldStop { + select { + case <-finishCh: + break SEND + case maxIndex := <-s.stopCh: + // Make a best effort to replicate up to this index + if maxIndex > 0 { + r.pipelineSend(s, pipeline, &nextIndex, maxIndex) + } + break SEND + case <-s.triggerCh: + lastLogIdx, _ := r.getLastLog() + shouldStop = r.pipelineSend(s, pipeline, &nextIndex, lastLogIdx) + case <-randomTimeout(r.conf.CommitTimeout): + lastLogIdx, _ := r.getLastLog() + shouldStop = r.pipelineSend(s, pipeline, &nextIndex, lastLogIdx) + } + } + + // Stop our decoder, and wait for it to finish + close(stopCh) + select { + case <-finishCh: + case <-r.shutdownCh: + } + return nil +} + +// pipelineSend is used to send data over a pipeline. It is a helper to +// pipelineReplicate. +func (r *Raft) pipelineSend(s *followerReplication, p AppendPipeline, nextIdx *uint64, lastIndex uint64) (shouldStop bool) { + // Create a new append request + req := new(AppendEntriesRequest) + if err := r.setupAppendEntries(s, req, *nextIdx, lastIndex); err != nil { + return true + } + + // Pipeline the append entries + if _, err := p.AppendEntries(req, new(AppendEntriesResponse)); err != nil { + r.logger.Error(fmt.Sprintf("Failed to pipeline AppendEntries to %v: %v", s.peer, err)) + return true + } + + // Increase the next send log to avoid re-sending old logs + if n := len(req.Entries); n > 0 { + last := req.Entries[n-1] + *nextIdx = last.Index + 1 + } + return false +} + +// pipelineDecode is used to decode the responses of pipelined requests. +func (r *Raft) pipelineDecode(s *followerReplication, p AppendPipeline, stopCh, finishCh chan struct{}) { + defer close(finishCh) + respCh := p.Consumer() + for { + select { + case ready := <-respCh: + req, resp := ready.Request(), ready.Response() + appendStats(string(s.peer.ID), ready.Start(), float32(len(req.Entries))) + + // Check for a newer term, stop running + if resp.Term > req.Term { + r.handleStaleTerm(s) + return + } + + // Update the last contact + s.setLastContact() + + // Abort pipeline if not successful + if !resp.Success { + return + } + + // Update our replication state + updateLastAppended(s, req) + case <-stopCh: + return + } + } +} + +// setupAppendEntries is used to setup an append entries request. 
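For a concrete picture of what the three helpers below assemble: with a follower at nextIndex=8 and a leader whose log runs through index 10 in term 5, the outgoing request ends up shaped like this (values illustrative; the fields are the ones used in this file):

    req := AppendEntriesRequest{
    	Term:              5,          // s.currentTerm
    	PrevLogEntry:      7,          // nextIndex - 1
    	PrevLogTerm:       5,          // term of the entry at index 7
    	Entries:           logs[8:11], // entries 8..10, capped by MaxAppendEntries
    	LeaderCommitIndex: 9,          // r.getCommitIndex() at send time
    }

("logs" here is a hypothetical slice standing in for the entries fetched from the LogStore.)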
+func (r *Raft) setupAppendEntries(s *followerReplication, req *AppendEntriesRequest, nextIndex, lastIndex uint64) error { + req.RPCHeader = r.getRPCHeader() + req.Term = s.currentTerm + req.Leader = r.trans.EncodePeer(r.localID, r.localAddr) + req.LeaderCommitIndex = r.getCommitIndex() + if err := r.setPreviousLog(req, nextIndex); err != nil { + return err + } + if err := r.setNewLogs(req, nextIndex, lastIndex); err != nil { + return err + } + return nil +} + +// setPreviousLog is used to setup the PrevLogEntry and PrevLogTerm for an +// AppendEntriesRequest given the next index to replicate. +func (r *Raft) setPreviousLog(req *AppendEntriesRequest, nextIndex uint64) error { + // Guard for the first index, since there is no 0 log entry + // Guard against the previous index being a snapshot as well + lastSnapIdx, lastSnapTerm := r.getLastSnapshot() + if nextIndex == 1 { + req.PrevLogEntry = 0 + req.PrevLogTerm = 0 + + } else if (nextIndex - 1) == lastSnapIdx { + req.PrevLogEntry = lastSnapIdx + req.PrevLogTerm = lastSnapTerm + + } else { + var l Log + if err := r.logs.GetLog(nextIndex-1, &l); err != nil { + r.logger.Error(fmt.Sprintf("Failed to get log at index %d: %v", nextIndex-1, err)) + return err + } + + // Set the previous index and term (0 if nextIndex is 1) + req.PrevLogEntry = l.Index + req.PrevLogTerm = l.Term + } + return nil +} + +// setNewLogs is used to setup the logs which should be appended for a request. +func (r *Raft) setNewLogs(req *AppendEntriesRequest, nextIndex, lastIndex uint64) error { + // Append up to MaxAppendEntries or up to the lastIndex + req.Entries = make([]*Log, 0, r.conf.MaxAppendEntries) + maxIndex := min(nextIndex+uint64(r.conf.MaxAppendEntries)-1, lastIndex) + for i := nextIndex; i <= maxIndex; i++ { + oldLog := new(Log) + if err := r.logs.GetLog(i, oldLog); err != nil { + r.logger.Error(fmt.Sprintf("Failed to get log at index %d: %v", i, err)) + return err + } + req.Entries = append(req.Entries, oldLog) + } + return nil +} + +// appendStats is used to emit stats about an AppendEntries invocation. +func appendStats(peer string, start time.Time, logs float32) { + metrics.MeasureSince([]string{"raft", "replication", "appendEntries", "rpc", peer}, start) + metrics.IncrCounter([]string{"raft", "replication", "appendEntries", "logs", peer}, logs) +} + +// handleStaleTerm is used when a follower indicates that we have a stale term. +func (r *Raft) handleStaleTerm(s *followerReplication) { + r.logger.Error(fmt.Sprintf("peer %v has newer term, stopping replication", s.peer)) + s.notifyAll(false) // No longer leader + asyncNotifyCh(s.stepDown) +} + +// updateLastAppended is used to update follower replication state after a +// successful AppendEntries RPC. +// TODO: This isn't used during InstallSnapshot, but the code there is similar. +func updateLastAppended(s *followerReplication, req *AppendEntriesRequest) { + // Mark any inflight logs as committed + if logs := req.Entries; len(logs) > 0 { + last := logs[len(logs)-1] + s.nextIndex = last.Index + 1 + s.commitment.match(s.peer.ID, last.Index) + } + + // Notify still leader + s.notifyAll(true) +} diff --git a/vendor/github.com/hashicorp/raft/snapshot.go b/vendor/github.com/hashicorp/raft/snapshot.go new file mode 100644 index 0000000000..2e0f77a5dd --- /dev/null +++ b/vendor/github.com/hashicorp/raft/snapshot.go @@ -0,0 +1,239 @@ +package raft + +import ( + "fmt" + "io" + "time" + + "github.com/armon/go-metrics" +) + +// SnapshotMeta is for metadata of a snapshot. 
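snapshot.go below declares the snapshot metadata and the store/sink abstractions, plus the long-running snapshot goroutine. As a usage sketch of the store contract ("store" is a hypothetical SnapshotStore value; this mirrors what sendLatestSnapshot above already does):

    // Read side: List() returns newest-first, Open() hands back metadata
    // plus the data stream.
    if snaps, err := store.List(); err == nil && len(snaps) > 0 {
    	if meta, rc, err := store.Open(snaps[0].ID); err == nil {
    		defer rc.Close()
    		_ = meta // Index/Term/Configuration travel alongside the stream
    	}
    }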
+type SnapshotMeta struct {
+	// Version is the version number of the snapshot metadata. This does not cover
+	// the application's data in the snapshot, that should be versioned
+	// separately.
+	Version SnapshotVersion
+
+	// ID is opaque to the store, and is used for opening.
+	ID string
+
+	// Index and Term store when the snapshot was taken.
+	Index uint64
+	Term  uint64
+
+	// Peers is deprecated and used to support version 0 snapshots, but will
+	// be populated in version 1 snapshots as well to help with upgrades.
+	Peers []byte
+
+	// Configuration and ConfigurationIndex are present in version 1
+	// snapshots and later.
+	Configuration      Configuration
+	ConfigurationIndex uint64
+
+	// Size is the size of the snapshot in bytes.
+	Size int64
+}
+
+// SnapshotStore interface is used to allow for flexible implementations
+// of snapshot storage and retrieval. For example, a client could implement
+// a shared state store such as S3, allowing new nodes to restore snapshots
+// without streaming from the leader.
+type SnapshotStore interface {
+	// Create is used to begin a snapshot at a given index and term, and with
+	// the given committed configuration. The version parameter controls
+	// which snapshot version to create.
+	Create(version SnapshotVersion, index, term uint64, configuration Configuration,
+		configurationIndex uint64, trans Transport) (SnapshotSink, error)
+
+	// List is used to list the available snapshots in the store.
+	// It should return them in descending order, with the highest index first.
+	List() ([]*SnapshotMeta, error)
+
+	// Open takes a snapshot ID and provides a ReadCloser. Once close is
+	// called it is assumed the snapshot is no longer needed.
+	Open(id string) (*SnapshotMeta, io.ReadCloser, error)
+}
+
+// SnapshotSink is returned by StartSnapshot. The FSM will Write state
+// to the sink and call Close on completion. On error, Cancel will be invoked.
+type SnapshotSink interface {
+	io.WriteCloser
+	ID() string
+	Cancel() error
+}
+
+// runSnapshots is a long running goroutine used to manage taking
+// new snapshots of the FSM. It runs in parallel to the FSM and
+// main goroutines, so that snapshots do not block normal operation.
+func (r *Raft) runSnapshots() {
+	for {
+		select {
+		case <-randomTimeout(r.conf.SnapshotInterval):
+			// Check if we should snapshot
+			if !r.shouldSnapshot() {
+				continue
+			}
+
+			// Trigger a snapshot
+			if _, err := r.takeSnapshot(); err != nil {
+				r.logger.Error(fmt.Sprintf("Failed to take snapshot: %v", err))
+			}
+
+		case future := <-r.userSnapshotCh:
+			// User-triggered, run immediately
+			id, err := r.takeSnapshot()
+			if err != nil {
+				r.logger.Error(fmt.Sprintf("Failed to take snapshot: %v", err))
+			} else {
+				future.opener = func() (*SnapshotMeta, io.ReadCloser, error) {
+					return r.snapshots.Open(id)
+				}
+			}
+			future.respond(err)
+
+		case <-r.shutdownCh:
+			return
+		}
+	}
+}
+
+// shouldSnapshot checks if we meet the conditions to take
+// a new snapshot.
+func (r *Raft) shouldSnapshot() bool {
+	// Check the last snapshot index
+	lastSnap, _ := r.getLastSnapshot()
+
+	// Check the last log index
+	lastIdx, err := r.logs.LastIndex()
+	if err != nil {
+		r.logger.Error(fmt.Sprintf("Failed to get last log index: %v", err))
+		return false
+	}
+
+	// Compare the delta to the threshold
+	delta := lastIdx - lastSnap
+	return delta >= r.conf.SnapshotThreshold
+}
+
+// takeSnapshot is used to take a new snapshot. This must only be called from
+// the snapshot thread, never the main thread.
This returns the ID of the new +// snapshot, along with an error. +func (r *Raft) takeSnapshot() (string, error) { + defer metrics.MeasureSince([]string{"raft", "snapshot", "takeSnapshot"}, time.Now()) + + // Create a request for the FSM to perform a snapshot. + snapReq := &reqSnapshotFuture{} + snapReq.init() + + // Wait for dispatch or shutdown. + select { + case r.fsmSnapshotCh <- snapReq: + case <-r.shutdownCh: + return "", ErrRaftShutdown + } + + // Wait until we get a response + if err := snapReq.Error(); err != nil { + if err != ErrNothingNewToSnapshot { + err = fmt.Errorf("failed to start snapshot: %v", err) + } + return "", err + } + defer snapReq.snapshot.Release() + + // Make a request for the configurations and extract the committed info. + // We have to use the future here to safely get this information since + // it is owned by the main thread. + configReq := &configurationsFuture{} + configReq.init() + select { + case r.configurationsCh <- configReq: + case <-r.shutdownCh: + return "", ErrRaftShutdown + } + if err := configReq.Error(); err != nil { + return "", err + } + committed := configReq.configurations.committed + committedIndex := configReq.configurations.committedIndex + + // We don't support snapshots while there's a config change outstanding + // since the snapshot doesn't have a means to represent this state. This + // is a little weird because we need the FSM to apply an index that's + // past the configuration change, even though the FSM itself doesn't see + // the configuration changes. It should be ok in practice with normal + // application traffic flowing through the FSM. If there's none of that + // then it's not crucial that we snapshot, since there's not much going + // on Raft-wise. + if snapReq.index < committedIndex { + return "", fmt.Errorf("cannot take snapshot now, wait until the configuration entry at %v has been applied (have applied %v)", + committedIndex, snapReq.index) + } + + // Create a new snapshot. + r.logger.Info(fmt.Sprintf("Starting snapshot up to %d", snapReq.index)) + start := time.Now() + version := getSnapshotVersion(r.protocolVersion) + sink, err := r.snapshots.Create(version, snapReq.index, snapReq.term, committed, committedIndex, r.trans) + if err != nil { + return "", fmt.Errorf("failed to create snapshot: %v", err) + } + metrics.MeasureSince([]string{"raft", "snapshot", "create"}, start) + + // Try to persist the snapshot. + start = time.Now() + if err := snapReq.snapshot.Persist(sink); err != nil { + sink.Cancel() + return "", fmt.Errorf("failed to persist snapshot: %v", err) + } + metrics.MeasureSince([]string{"raft", "snapshot", "persist"}, start) + + // Close and check for error. + if err := sink.Close(); err != nil { + return "", fmt.Errorf("failed to close snapshot: %v", err) + } + + // Update the last stable snapshot info. + r.setLastSnapshot(snapReq.index, snapReq.term) + + // Compact the logs. + if err := r.compactLogs(snapReq.index); err != nil { + return "", err + } + + r.logger.Info(fmt.Sprintf("Snapshot to %d complete", snapReq.index)) + return sink.ID(), nil +} + +// compactLogs takes the last inclusive index of a snapshot +// and trims the logs that are no longer needed. 
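A worked example of the truncation bound computed below: with a snapshot at index 100, a last log index of 105, and TrailingLogs=10,

    maxLog := min(uint64(100), 105-10) // = 95

so DeleteRange removes [first..95] and indexes 96..105 stay in the log. The snapshot already covers everything through 100, but keeping ten trailing entries lets slightly-behind followers catch up from the log instead of forcing a full snapshot install.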
+func (r *Raft) compactLogs(snapIdx uint64) error { + defer metrics.MeasureSince([]string{"raft", "compactLogs"}, time.Now()) + // Determine log ranges to compact + minLog, err := r.logs.FirstIndex() + if err != nil { + return fmt.Errorf("failed to get first log index: %v", err) + } + + // Check if we have enough logs to truncate + lastLogIdx, _ := r.getLastLog() + if lastLogIdx <= r.conf.TrailingLogs { + return nil + } + + // Truncate up to the end of the snapshot, or `TrailingLogs` + // back from the head, which ever is further back. This ensures + // at least `TrailingLogs` entries, but does not allow logs + // after the snapshot to be removed. + maxLog := min(snapIdx, lastLogIdx-r.conf.TrailingLogs) + + // Log this + r.logger.Info(fmt.Sprintf("Compacting logs from %d to %d", minLog, maxLog)) + + // Compact the logs + if err := r.logs.DeleteRange(minLog, maxLog); err != nil { + return fmt.Errorf("log compaction failed: %v", err) + } + return nil +} diff --git a/vendor/github.com/hashicorp/raft/stable.go b/vendor/github.com/hashicorp/raft/stable.go new file mode 100644 index 0000000000..ff59a8c570 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/stable.go @@ -0,0 +1,15 @@ +package raft + +// StableStore is used to provide stable storage +// of key configurations to ensure safety. +type StableStore interface { + Set(key []byte, val []byte) error + + // Get returns the value for key, or an empty byte slice if key was not found. + Get(key []byte) ([]byte, error) + + SetUint64(key []byte, val uint64) error + + // GetUint64 returns the uint64 value for key, or 0 if key was not found. + GetUint64(key []byte) (uint64, error) +} diff --git a/vendor/github.com/hashicorp/raft/state.go b/vendor/github.com/hashicorp/raft/state.go new file mode 100644 index 0000000000..a58cd0d19e --- /dev/null +++ b/vendor/github.com/hashicorp/raft/state.go @@ -0,0 +1,171 @@ +package raft + +import ( + "sync" + "sync/atomic" +) + +// RaftState captures the state of a Raft node: Follower, Candidate, Leader, +// or Shutdown. +type RaftState uint32 + +const ( + // Follower is the initial state of a Raft node. + Follower RaftState = iota + + // Candidate is one of the valid states of a Raft node. + Candidate + + // Leader is one of the valid states of a Raft node. + Leader + + // Shutdown is the terminal state of a Raft node. + Shutdown +) + +func (s RaftState) String() string { + switch s { + case Follower: + return "Follower" + case Candidate: + return "Candidate" + case Leader: + return "Leader" + case Shutdown: + return "Shutdown" + default: + return "Unknown" + } +} + +// raftState is used to maintain various state variables +// and provides an interface to set/get the variables in a +// thread safe manner. +type raftState struct { + // currentTerm commitIndex, lastApplied, must be kept at the top of + // the struct so they're 64 bit aligned which is a requirement for + // atomic ops on 32 bit platforms. 
+ + // The current term, cache of StableStore + currentTerm uint64 + + // Highest committed log entry + commitIndex uint64 + + // Last applied log to the FSM + lastApplied uint64 + + // protects 4 next fields + lastLock sync.Mutex + + // Cache the latest snapshot index/term + lastSnapshotIndex uint64 + lastSnapshotTerm uint64 + + // Cache the latest log from LogStore + lastLogIndex uint64 + lastLogTerm uint64 + + // Tracks running goroutines + routinesGroup sync.WaitGroup + + // The current state + state RaftState +} + +func (r *raftState) getState() RaftState { + stateAddr := (*uint32)(&r.state) + return RaftState(atomic.LoadUint32(stateAddr)) +} + +func (r *raftState) setState(s RaftState) { + stateAddr := (*uint32)(&r.state) + atomic.StoreUint32(stateAddr, uint32(s)) +} + +func (r *raftState) getCurrentTerm() uint64 { + return atomic.LoadUint64(&r.currentTerm) +} + +func (r *raftState) setCurrentTerm(term uint64) { + atomic.StoreUint64(&r.currentTerm, term) +} + +func (r *raftState) getLastLog() (index, term uint64) { + r.lastLock.Lock() + index = r.lastLogIndex + term = r.lastLogTerm + r.lastLock.Unlock() + return +} + +func (r *raftState) setLastLog(index, term uint64) { + r.lastLock.Lock() + r.lastLogIndex = index + r.lastLogTerm = term + r.lastLock.Unlock() +} + +func (r *raftState) getLastSnapshot() (index, term uint64) { + r.lastLock.Lock() + index = r.lastSnapshotIndex + term = r.lastSnapshotTerm + r.lastLock.Unlock() + return +} + +func (r *raftState) setLastSnapshot(index, term uint64) { + r.lastLock.Lock() + r.lastSnapshotIndex = index + r.lastSnapshotTerm = term + r.lastLock.Unlock() +} + +func (r *raftState) getCommitIndex() uint64 { + return atomic.LoadUint64(&r.commitIndex) +} + +func (r *raftState) setCommitIndex(index uint64) { + atomic.StoreUint64(&r.commitIndex, index) +} + +func (r *raftState) getLastApplied() uint64 { + return atomic.LoadUint64(&r.lastApplied) +} + +func (r *raftState) setLastApplied(index uint64) { + atomic.StoreUint64(&r.lastApplied, index) +} + +// Start a goroutine and properly handle the race between a routine +// starting and incrementing, and exiting and decrementing. +func (r *raftState) goFunc(f func()) { + r.routinesGroup.Add(1) + go func() { + defer r.routinesGroup.Done() + f() + }() +} + +func (r *raftState) waitShutdown() { + r.routinesGroup.Wait() +} + +// getLastIndex returns the last index in stable storage. +// Either from the last log or from the last snapshot. +func (r *raftState) getLastIndex() uint64 { + r.lastLock.Lock() + defer r.lastLock.Unlock() + return max(r.lastLogIndex, r.lastSnapshotIndex) +} + +// getLastEntry returns the last index and term in stable storage. +// Either from the last log or from the last snapshot. +func (r *raftState) getLastEntry() (uint64, uint64) { + r.lastLock.Lock() + defer r.lastLock.Unlock() + if r.lastLogIndex >= r.lastSnapshotIndex { + return r.lastLogIndex, r.lastLogTerm + } + return r.lastSnapshotIndex, r.lastSnapshotTerm +} diff --git a/vendor/github.com/hashicorp/raft/tag.sh b/vendor/github.com/hashicorp/raft/tag.sh new file mode 100755 index 0000000000..cd16623a70 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/tag.sh @@ -0,0 +1,16 @@ +#!/usr/bin/env bash +set -e + +# The version must be supplied from the environment. Do not include the +# leading "v". +if [ -z $VERSION ]; then + echo "Please specify a version." + exit 1 +fi + +# Generate the tag. +echo "==> Tagging version $VERSION..." 
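# Illustration only (not part of the script): a hypothetical release run,
# with the version supplied through the environment as required above:
#
#   VERSION=1.0.0 ./tag.sh
#
# The commit and tag created below are signed with the hard-coded GPG key
# 348FFC4C, so the invoking keyring must contain that key.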
+git commit --allow-empty -a --gpg-sign=348FFC4C -m "Release v$VERSION" +git tag -a -m "Version $VERSION" -s -u 348FFC4C "v${VERSION}" master + +exit 0 diff --git a/vendor/github.com/hashicorp/raft/tcp_transport.go b/vendor/github.com/hashicorp/raft/tcp_transport.go new file mode 100644 index 0000000000..69c928ed92 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/tcp_transport.go @@ -0,0 +1,116 @@ +package raft + +import ( + "errors" + "io" + "log" + "net" + "time" +) + +var ( + errNotAdvertisable = errors.New("local bind address is not advertisable") + errNotTCP = errors.New("local address is not a TCP address") +) + +// TCPStreamLayer implements StreamLayer interface for plain TCP. +type TCPStreamLayer struct { + advertise net.Addr + listener *net.TCPListener +} + +// NewTCPTransport returns a NetworkTransport that is built on top of +// a TCP streaming transport layer. +func NewTCPTransport( + bindAddr string, + advertise net.Addr, + maxPool int, + timeout time.Duration, + logOutput io.Writer, +) (*NetworkTransport, error) { + return newTCPTransport(bindAddr, advertise, func(stream StreamLayer) *NetworkTransport { + return NewNetworkTransport(stream, maxPool, timeout, logOutput) + }) +} + +// NewTCPTransportWithLogger returns a NetworkTransport that is built on top of +// a TCP streaming transport layer, with log output going to the supplied Logger +func NewTCPTransportWithLogger( + bindAddr string, + advertise net.Addr, + maxPool int, + timeout time.Duration, + logger *log.Logger, +) (*NetworkTransport, error) { + return newTCPTransport(bindAddr, advertise, func(stream StreamLayer) *NetworkTransport { + return NewNetworkTransportWithLogger(stream, maxPool, timeout, logger) + }) +} + +// NewTCPTransportWithConfig returns a NetworkTransport that is built on top of +// a TCP streaming transport layer, using the given config struct. +func NewTCPTransportWithConfig( + bindAddr string, + advertise net.Addr, + config *NetworkTransportConfig, +) (*NetworkTransport, error) { + return newTCPTransport(bindAddr, advertise, func(stream StreamLayer) *NetworkTransport { + config.Stream = stream + return NewNetworkTransportWithConfig(config) + }) +} + +func newTCPTransport(bindAddr string, + advertise net.Addr, + transportCreator func(stream StreamLayer) *NetworkTransport) (*NetworkTransport, error) { + // Try to bind + list, err := net.Listen("tcp", bindAddr) + if err != nil { + return nil, err + } + + // Create stream + stream := &TCPStreamLayer{ + advertise: advertise, + listener: list.(*net.TCPListener), + } + + // Verify that we have a usable advertise address + addr, ok := stream.Addr().(*net.TCPAddr) + if !ok { + list.Close() + return nil, errNotTCP + } + if addr.IP.IsUnspecified() { + list.Close() + return nil, errNotAdvertisable + } + + // Create the network transport + trans := transportCreator(stream) + return trans, nil +} + +// Dial implements the StreamLayer interface. +func (t *TCPStreamLayer) Dial(address ServerAddress, timeout time.Duration) (net.Conn, error) { + return net.DialTimeout("tcp", string(address), timeout) +} + +// Accept implements the net.Listener interface. +func (t *TCPStreamLayer) Accept() (c net.Conn, err error) { + return t.listener.Accept() +} + +// Close implements the net.Listener interface. +func (t *TCPStreamLayer) Close() (err error) { + return t.listener.Close() +} + +// Addr implements the net.Listener interface. 
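// Illustration only (not part of the vendored file): a hypothetical caller
// binding to all interfaces while advertising a concrete address; passing a
// non-nil advertise address is what lets the errNotAdvertisable check in
// newTCPTransport pass despite the wildcard bind:
//
//	advertise, _ := net.ResolveTCPAddr("tcp", "10.0.0.1:8300") // error ignored for brevity
//	trans, err := raft.NewTCPTransport("0.0.0.0:8300", advertise, 3, 10*time.Second, os.Stderr)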
+func (t *TCPStreamLayer) Addr() net.Addr { + // Use an advertise addr if provided + if t.advertise != nil { + return t.advertise + } + return t.listener.Addr() +} diff --git a/vendor/github.com/hashicorp/raft/transport.go b/vendor/github.com/hashicorp/raft/transport.go new file mode 100644 index 0000000000..85459b221d --- /dev/null +++ b/vendor/github.com/hashicorp/raft/transport.go @@ -0,0 +1,124 @@ +package raft + +import ( + "io" + "time" +) + +// RPCResponse captures both a response and a potential error. +type RPCResponse struct { + Response interface{} + Error error +} + +// RPC has a command, and provides a response mechanism. +type RPC struct { + Command interface{} + Reader io.Reader // Set only for InstallSnapshot + RespChan chan<- RPCResponse +} + +// Respond is used to respond with a response, error or both +func (r *RPC) Respond(resp interface{}, err error) { + r.RespChan <- RPCResponse{resp, err} +} + +// Transport provides an interface for network transports +// to allow Raft to communicate with other nodes. +type Transport interface { + // Consumer returns a channel that can be used to + // consume and respond to RPC requests. + Consumer() <-chan RPC + + // LocalAddr is used to return our local address to distinguish from our peers. + LocalAddr() ServerAddress + + // AppendEntriesPipeline returns an interface that can be used to pipeline + // AppendEntries requests. + AppendEntriesPipeline(id ServerID, target ServerAddress) (AppendPipeline, error) + + // AppendEntries sends the appropriate RPC to the target node. + AppendEntries(id ServerID, target ServerAddress, args *AppendEntriesRequest, resp *AppendEntriesResponse) error + + // RequestVote sends the appropriate RPC to the target node. + RequestVote(id ServerID, target ServerAddress, args *RequestVoteRequest, resp *RequestVoteResponse) error + + // InstallSnapshot is used to push a snapshot down to a follower. The data is read from + // the ReadCloser and streamed to the client. + InstallSnapshot(id ServerID, target ServerAddress, args *InstallSnapshotRequest, resp *InstallSnapshotResponse, data io.Reader) error + + // EncodePeer is used to serialize a peer's address. + EncodePeer(id ServerID, addr ServerAddress) []byte + + // DecodePeer is used to deserialize a peer's address. + DecodePeer([]byte) ServerAddress + + // SetHeartbeatHandler is used to setup a heartbeat handler + // as a fast-pass. This is to avoid head-of-line blocking from + // disk IO. If a Transport does not support this, it can simply + // ignore the call, and push the heartbeat onto the Consumer channel. + SetHeartbeatHandler(cb func(rpc RPC)) +} + +// WithClose is an interface that a transport may provide which +// allows a transport to be shut down cleanly when a Raft instance +// shuts down. +// +// It is defined separately from Transport as unfortunately it wasn't in the +// original interface specification. +type WithClose interface { + // Close permanently closes a transport, stopping + // any associated goroutines and freeing other resources. + Close() error +} + +// LoopbackTransport is an interface that provides a loopback transport suitable for testing +// e.g. InmemTransport. It's there so we don't have to rewrite tests. +type LoopbackTransport interface { + Transport // Embedded transport reference + WithPeers // Embedded peer management + WithClose // with a close routine +} + +// WithPeers is an interface that a transport may provide which allows for connection and +// disconnection. 
Unless the transport is a loopback transport, the transport specified to +// "Connect" is likely to be nil. +type WithPeers interface { + Connect(peer ServerAddress, t Transport) // Connect a peer + Disconnect(peer ServerAddress) // Disconnect a given peer + DisconnectAll() // Disconnect all peers, possibly to reconnect them later +} + +// AppendPipeline is used for pipelining AppendEntries requests. It is used +// to increase the replication throughput by masking latency and better +// utilizing bandwidth. +type AppendPipeline interface { + // AppendEntries is used to add another request to the pipeline. + // The send may block which is an effective form of back-pressure. + AppendEntries(args *AppendEntriesRequest, resp *AppendEntriesResponse) (AppendFuture, error) + + // Consumer returns a channel that can be used to consume + // response futures when they are ready. + Consumer() <-chan AppendFuture + + // Close closes the pipeline and cancels all inflight RPCs + Close() error +} + +// AppendFuture is used to return information about a pipelined AppendEntries request. +type AppendFuture interface { + Future + + // Start returns the time that the append request was started. + // It is always OK to call this method. + Start() time.Time + + // Request holds the parameters of the AppendEntries call. + // It is always OK to call this method. + Request() *AppendEntriesRequest + + // Response holds the results of the AppendEntries call. + // This method must only be called after the Error + // method returns, and will only be valid on success. + Response() *AppendEntriesResponse +} diff --git a/vendor/github.com/hashicorp/raft/util.go b/vendor/github.com/hashicorp/raft/util.go new file mode 100644 index 0000000000..90428d7437 --- /dev/null +++ b/vendor/github.com/hashicorp/raft/util.go @@ -0,0 +1,133 @@ +package raft + +import ( + "bytes" + crand "crypto/rand" + "fmt" + "math" + "math/big" + "math/rand" + "time" + + "github.com/hashicorp/go-msgpack/codec" +) + +func init() { + // Ensure we use a high-entropy seed for the psuedo-random generator + rand.Seed(newSeed()) +} + +// returns an int64 from a crypto random source +// can be used to seed a source for a math/rand. +func newSeed() int64 { + r, err := crand.Int(crand.Reader, big.NewInt(math.MaxInt64)) + if err != nil { + panic(fmt.Errorf("failed to read random bytes: %v", err)) + } + return r.Int64() +} + +// randomTimeout returns a value that is between the minVal and 2x minVal. +func randomTimeout(minVal time.Duration) <-chan time.Time { + if minVal == 0 { + return nil + } + extra := (time.Duration(rand.Int63()) % minVal) + return time.After(minVal + extra) +} + +// min returns the minimum. +func min(a, b uint64) uint64 { + if a <= b { + return a + } + return b +} + +// max returns the maximum. +func max(a, b uint64) uint64 { + if a >= b { + return a + } + return b +} + +// generateUUID is used to generate a random UUID. +func generateUUID() string { + buf := make([]byte, 16) + if _, err := crand.Read(buf); err != nil { + panic(fmt.Errorf("failed to read random bytes: %v", err)) + } + + return fmt.Sprintf("%08x-%04x-%04x-%04x-%12x", + buf[0:4], + buf[4:6], + buf[6:8], + buf[8:10], + buf[10:16]) +} + +// asyncNotifyCh is used to do an async channel send +// to a single channel without blocking. +func asyncNotifyCh(ch chan struct{}) { + select { + case ch <- struct{}{}: + default: + } +} + +// drainNotifyCh empties out a single-item notification channel without +// blocking, and returns whether it received anything. 
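// Illustration only (not part of the vendored file): randomTimeout above is
// what keeps election timers from firing in lock-step. A hypothetical
// follower loop with a 150ms base waits a uniformly random 150-300ms before
// standing for election (shutdownCh is assumed to exist in the caller):
//
//	select {
//	case <-randomTimeout(150 * time.Millisecond):
//		// heartbeat timeout reached: become a candidate
//	case <-shutdownCh:
//		return
//	}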
+func drainNotifyCh(ch chan struct{}) bool { + select { + case <-ch: + return true + default: + return false + } +} + +// asyncNotifyBool is used to do an async notification +// on a bool channel. +func asyncNotifyBool(ch chan bool, v bool) { + select { + case ch <- v: + default: + } +} + +// Decode reverses the encode operation on a byte slice input. +func decodeMsgPack(buf []byte, out interface{}) error { + r := bytes.NewBuffer(buf) + hd := codec.MsgpackHandle{} + dec := codec.NewDecoder(r, &hd) + return dec.Decode(out) +} + +// Encode writes an encoded object to a new bytes buffer. +func encodeMsgPack(in interface{}) (*bytes.Buffer, error) { + buf := bytes.NewBuffer(nil) + hd := codec.MsgpackHandle{} + enc := codec.NewEncoder(buf, &hd) + err := enc.Encode(in) + return buf, err +} + +// backoff is used to compute an exponential backoff +// duration. Base time is scaled by the current round, +// up to some maximum scale factor. +func backoff(base time.Duration, round, limit uint64) time.Duration { + power := min(round, limit) + for power > 2 { + base *= 2 + power-- + } + return base +} + +// Needed for sorting []uint64, used to determine commitment +type uint64Slice []uint64 + +func (p uint64Slice) Len() int { return len(p) } +func (p uint64Slice) Less(i, j int) bool { return p[i] < p[j] } +func (p uint64Slice) Swap(i, j int) { p[i], p[j] = p[j], p[i] } diff --git a/vendor/vendor.json b/vendor/vendor.json new file mode 100644 index 0000000000..42f68b2267 --- /dev/null +++ b/vendor/vendor.json @@ -0,0 +1,45 @@ +{ + "comment": "", + "ignore": "test", + "package": [ + { + "checksumSHA1": "HF3V9ieTLnqjlDcqyGmHxYojZXE=", + "path": "github.com/CanonicalLtd/go-dqlite", + "revision": "3eab944668d7af5d0fc69ddb387ffda76300541c", + "revisionTime": "2019-03-22T09:57:25Z", + "tree": true + }, + { + "checksumSHA1": "5UAXxv+O1Oxx8kQAUvR94zCVy+Q=", + "path": "github.com/CanonicalLtd/raft-http", + "revision": "4c2dd679d3b46c11b250d63ae43467d4c4ab0962", + "revisionTime": "2018-04-14T15:56:53Z" + }, + { + "checksumSHA1": "nflIYP3tDRTgp2g4I1qoK8fDgmc=", + "path": "github.com/CanonicalLtd/raft-membership", + "revision": "3846634b0164affd0b3dfba1fdd7f9da6387e501", + "revisionTime": "2018-04-13T13:33:40Z" + }, + { + "checksumSHA1": "nbblYWwQstB9B+OhB1zoDFLhYWQ=", + "path": "github.com/CanonicalLtd/raft-test", + "revision": "586f073e84d2c7bbf01340756979db76179c7a7a", + "revisionTime": "2019-04-30T22:51:17Z", + "tree": true + }, + { + "checksumSHA1": "RMI9XuADcv+6w3jS5FpqzjDKuhI=", + "path": "github.com/hashicorp/raft", + "revision": "2c551690b5c0eb05ef7f4ad72ed01f7f6ce3fcb6", + "revisionTime": "2019-05-11T03:54:14Z" + }, + { + "checksumSHA1": "Y2PM65le0fGtiD12RaKknBscFys=", + "path": "github.com/hashicorp/raft-boltdb", + "revision": "6e5ba93211eaf8d9a2ad7e41ffad8c6f160f9fe3", + "revisionTime": "2017-10-10T15:18:10Z" + } + ], + "rootPath": "github.com/lxc/lxd" +} From lxc-bot at linuxcontainers.org Tue May 21 19:18:56 2019 From: lxc-bot at linuxcontainers.org (freeekanayaka on Github) Date: Tue, 21 May 2019 12:18:56 -0700 (PDT) Subject: [lxc-devel] [lxd/master] Trigger the upgrade script if we detect a dqlite client with higher version [WIP] Message-ID: <5ce44f20.1c69fb81.e5b8f.8f13SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 363 bytes Desc: not available URL: -------------- next part -------------- From 4bf0346d36c38295488da59813466d285c7664a7 Mon Sep 17 00:00:00 2001 From: Free Ekanayaka Date: Tue, 21 May 2019 13:48:24 +0200 Subject: [PATCH] Trigger the upgrade script if we detect a dqlite client with higher version Signed-off-by: Free Ekanayaka --- lxd/cluster/gateway.go | 30 ++++++++++++++++++++++++++++++ lxd/cluster/upgrade.go | 11 ++++++++--- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/lxd/cluster/gateway.go b/lxd/cluster/gateway.go index c34fdd1c0a..29d3337752 100644 --- a/lxd/cluster/gateway.go +++ b/lxd/cluster/gateway.go @@ -98,12 +98,19 @@ type Gateway struct { // their version. upgradeCh chan struct{} + // Used to track whether we already triggered an upgrade because we + // detected a peer with a higher version. + upgradeTriggered bool + // ServerStore wrapper. store *dqliteServerStore lock sync.RWMutex } +// Current dqlite protocol version. +const dqliteVersion = 0 + // HandlerFuncs returns the HTTP handlers that should be added to the REST API // endpoint in order to handle database-related requests. // @@ -123,6 +130,29 @@ func (g *Gateway) HandlerFuncs() map[string]http.HandlerFunc { return } + // Compare the dqlite version of the connecting client + // with our own. + versionHeader := r.Header.Get("X-Dqlite-Version") + if versionHeader == "" { + // No version header means an old pre dqlite 1.0 client. + versionHeader = "0" + } + version, err := strconv.Atoi(versionHeader) + if err != nil { + http.Error(w, "400 invalid dqlite version", http.StatusBadRequest) + return + } + if version != dqliteVersion { + if !g.upgradeTriggered && version > dqliteVersion { + err = triggerUpdate() + if err == nil { + g.upgradeTriggered = true + } + } + http.Error(w, "503 dqlite version mismatch", http.StatusServiceUnavailable) + return + } + // Handle heartbeats. if r.Method == "PUT" { var nodes []db.RaftNode diff --git a/lxd/cluster/upgrade.go b/lxd/cluster/upgrade.go index c4165343e4..df649da997 100644 --- a/lxd/cluster/upgrade.go +++ b/lxd/cluster/upgrade.go @@ -110,19 +110,24 @@ func maybeUpdate(state *state.State) { return } + triggerUpdate() +} + +func triggerUpdate() error { logger.Infof("Node is out-of-date with respect to other cluster nodes") updateExecutable := os.Getenv("LXD_CLUSTER_UPDATE") if updateExecutable == "" { logger.Debug("No LXD_CLUSTER_UPDATE variable set, skipping auto-update") - return + return nil } logger.Infof("Triggering cluster update using: %s", updateExecutable) - _, err = shared.RunCommand(updateExecutable) + _, err := shared.RunCommand(updateExecutable) if err != nil { logger.Errorf("Cluster upgrade failed: '%v'", err.Error()) - return + return err } + return nil } From lxc-bot at linuxcontainers.org Wed May 22 16:52:59 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Wed, 22 May 2019 09:52:59 -0700 (PDT) Subject: [lxc-devel] [lxd/master] network: p2p/bridged static route consistency updates Message-ID: <5ce57e6b.1c69fb81.e1e8.92bfSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed...
Name: not available Type: text/x-mailbox Size: 620 bytes Desc: not available URL: -------------- next part -------------- From e3ff176bc99a5c326740af263986820a0bc55295 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Wed, 22 May 2019 17:26:39 +0100 Subject: [PATCH 1/4] test: Adds further p2p nic tests for various scenarios Signed-off-by: Thomas Parrott --- test/suites/container_devices_nic_p2p.sh | 137 +++++++++++++++++++++++ 1 file changed, 137 insertions(+) diff --git a/test/suites/container_devices_nic_p2p.sh b/test/suites/container_devices_nic_p2p.sh index 4c1c8deed0..a3f8b12b7e 100644 --- a/test/suites/container_devices_nic_p2p.sh +++ b/test/suites/container_devices_nic_p2p.sh @@ -190,6 +190,143 @@ test_container_devices_nic_p2p() { lxc launch testimage "${ctName}" lxc config device add "${ctName}" eth0 nic \ nictype=p2p + + # Now add some routes + lxc config device set "${ctName}" eth0 ipv4.routes "192.0.2.2${ipRand}/32" + lxc config device set "${ctName}" eth0 ipv6.routes "2001:db8::2${ipRand}/128" + + # Check routes are applied on update. The host name is dynamic, so just check routes exist. + if ! ip -4 r list | grep "192.0.2.2${ipRand}" ; then + echo "ipv4.routes invalid" + false + fi + if ! ip -6 r list | grep "2001:db8::2${ipRand}" ; then + echo "ipv6.routes invalid" + false + fi + + # Now update routes, check old routes go and new routes added. + lxc config device set "${ctName}" eth0 ipv4.routes "192.0.2.3${ipRand}/32" + lxc config device set "${ctName}" eth0 ipv6.routes "2001:db8::3${ipRand}/128" + + # Check routes are applied on update. The host name is dynamic, so just check routes exist. + if ! ip -4 r list | grep "192.0.2.3${ipRand}" ; then + echo "ipv4.routes invalid" + false + fi + if ! ip -6 r list | grep "2001:db8::3${ipRand}" ; then + echo "ipv6.routes invalid" + false + fi + + # Check old routes removed + if ip -4 r list | grep "192.0.2.2${ipRand}" ; then + echo "ipv4.routes invalid" + false + fi + if ip -6 r list | grep "2001:db8::2${ipRand}" ; then + echo "ipv6.routes invalid" + false + fi + + # Now remove device, check routes go lxc config device remove "${ctName}" eth0 + + if ip -4 r list | grep "192.0.2.3${ipRand}" ; then + echo "ipv4.routes invalid" + false + fi + if ip -6 r list | grep "2001:db8::3${ipRand}" ; then + echo "ipv6.routes invalid" + false + fi + + # Now add a nic to a stopped container with routes. + lxc stop "${ctName}" + lxc config device add "${ctName}" eth0 nic \ + nictype=p2p \ + ipv4.routes="192.0.2.2${ipRand}/32" \ + ipv6.routes="2001:db8::2${ipRand}/128" + + lxc start "${ctName}" + + # Check routes are applied on start. The host name is dynamic, so just check routes exist. + if ! ip -4 r list | grep "192.0.2.2${ipRand}" ; then + echo "ipv4.routes invalid" + false + fi + if ! ip -6 r list | grep "2001:db8::2${ipRand}" ; then + echo "ipv6.routes invalid" + false + fi + + # Now update routes on boot time nic, check old routes go and new routes added. + lxc config device set "${ctName}" eth0 ipv4.routes "192.0.2.3${ipRand}/32" + lxc config device set "${ctName}" eth0 ipv6.routes "2001:db8::3${ipRand}/128" + + # Check routes are applied on update. The host name is dynamic, so just check routes exist. + if ! ip -4 r list | grep "192.0.2.3${ipRand}" ; then + echo "ipv4.routes invalid" + false + fi + if ! 
ip -6 r list | grep "2001:db8::3${ipRand}" ; then + echo "ipv6.routes invalid" + false + fi + + # Check old routes removed + if ip -4 r list | grep "192.0.2.2${ipRand}" ; then + echo "ipv4.routes invalid" + false + fi + if ip -6 r list | grep "2001:db8::2${ipRand}" ; then + echo "ipv6.routes invalid" + false + fi + + # Now remove boot time device + lxc config device remove "${ctName}" eth0 + + # Check old routes removed + if ip -4 r list | grep "192.0.2.3${ipRand}" ; then + echo "ipv4.routes invalid" + false + fi + if ip -6 r list | grep "2001:db8::3${ipRand}" ; then + echo "ipv6.routes invalid" + false + fi + + # Add hot plug device with routes. + lxc config device add "${ctName}" eth0 nic \ + nictype=p2p + + # Now update routes on hotplug nic + lxc config device set "${ctName}" eth0 ipv4.routes "192.0.2.2${ipRand}/32" + lxc config device set "${ctName}" eth0 ipv6.routes "2001:db8::2${ipRand}/128" + + # Check routes are applied. The host name is dynamic, so just check routes exist. + if ! ip -4 r list | grep "192.0.2.2${ipRand}" ; then + echo "ipv4.routes invalid" + false + fi + if ! ip -6 r list | grep "2001:db8::2${ipRand}" ; then + echo "ipv6.routes invalid" + false + fi + + # Now remove hotplug device + lxc config device remove "${ctName}" eth0 + + # Check old routes removed + if ip -4 r list | grep "192.0.2.2${ipRand}" ; then + echo "ipv4.routes invalid" + false + fi + if ip -6 r list | grep "2001:db8::2${ipRand}" ; then + echo "ipv6.routes invalid" + false + fi + lxc delete "${ctName}" -f } From 96f32106de0f126f6cc8a77522ed5d51b0c8ba1b Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Wed, 22 May 2019 17:28:55 +0100 Subject: [PATCH 2/4] container/lxc: Runs network up hook for all p2p and bridged nics This is so that host_name info can be recorded consistently on boot. Signed-off-by: Thomas Parrott --- lxd/container_lxc.go | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index f02567325d..8447283230 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -1678,8 +1678,8 @@ func (c *containerLXC) initLXC(config bool) error { } } - // Check if the container has network specific keys set to avoid unnecessarily running the network up hook. - if shared.StringMapHasStringKey(m, containerNetworkKeys...) && shared.StringInSlice(m["nictype"], []string{"bridged", "p2p"}) { + // Run network up hook for bridged and p2p nics. + if shared.StringInSlice(m["nictype"], []string{"bridged", "p2p"}) { err = lxcSetConfigItem(cc, fmt.Sprintf("%s.%d.script.up", networkKeyPrefix, networkidx), fmt.Sprintf("%s callhook %s %d network-up %s", c.state.OS.ExecPath, shared.VarPath(""), c.id, k)) if err != nil { return err From eb7c7e26370dd25c9a33597a32fd38fdd495a536 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Wed, 22 May 2019 17:30:29 +0100 Subject: [PATCH 3/4] container/lxc: Records host_name from LXC on p2p/bridged nic start Records host_name for p2p/bridged nic start in volatile data and updates routes and limits settings to use them. This allows consistent boot/add/remove/update fof p2p/bridged settings even on older kernels. 
Signed-off-by: Thomas Parrott --- lxd/container_lxc.go | 40 +++++++++++++++++++++++++++++++--------- 1 file changed, 31 insertions(+), 9 deletions(-) diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 8447283230..b78669841f 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -3173,7 +3173,7 @@ func (c *containerLXC) cleanupNetworkRoutes() error { // Remove any static veth routes if shared.StringInSlice(m["nictype"], []string{"bridged", "p2p"}) { - c.removeNetworkRoutes(m) + c.removeNetworkRoutes(k, m) } } @@ -3184,7 +3184,23 @@ func (c *containerLXC) cleanupNetworkRoutes() error { // OnNetworkUp is called by the LXD callhook when the LXC network up script is run. func (c *containerLXC) OnNetworkUp(deviceName string, hostName string) error { device := c.expandedDevices[deviceName] - device["host_name"] = hostName + + // This hook is only for bridged and p2p nics currently. + if !shared.StringInSlice(device["nictype"], []string{"bridged", "p2p"}) { + return nil + } + + // Record boot time host name of nic into volatile for use with routes/limits updates later. + // Only need to do this if host_name is not specified in nic config. + if device["host_name"] == "" { + device["host_name"] = hostName + hostNameKey := fmt.Sprintf("volatile.%s.host_name", deviceName) + err := c.VolatileSet(map[string]string{hostNameKey: hostName}) + if err != nil { + return err + } + } + return c.setupHostVethDevice(deviceName, device, types.Device{}) } @@ -3192,8 +3208,8 @@ func (c *containerLXC) OnNetworkUp(deviceName string, hostName string) error { func (c *containerLXC) setupHostVethDevice(deviceName string, device types.Device, oldDevice types.Device) error { // If not populated already, check if volatile data contains the most recently added host_name. if device["host_name"] == "" { - configKey := fmt.Sprintf("volatile.%s.host_name", deviceName) - device["host_name"] = c.localConfig[configKey] + hostNameKey := fmt.Sprintf("volatile.%s.host_name", deviceName) + device["host_name"] = c.localConfig[hostNameKey] } // Check whether host device resolution succeeded. @@ -3208,7 +3224,7 @@ func (c *containerLXC) setupHostVethDevice(deviceName string, device types.Devic } // Setup static routes to container - err = c.setNetworkRoutes(device, oldDevice) + err = c.setNetworkRoutes(deviceName, device, oldDevice) if err != nil { return err } @@ -8299,7 +8315,7 @@ func (c *containerLXC) removeNetworkDevice(name string, m types.Device) error { // Remove any static veth routes if shared.StringInSlice(m["nictype"], []string{"bridged", "p2p"}) { - c.removeNetworkRoutes(m) + c.removeNetworkRoutes(name, m) } // If a veth, destroy it @@ -8852,13 +8868,13 @@ func (c *containerLXC) getHostInterface(name string) string { } // setNetworkRoutes applies any static routes configured from the host to the container nic. -func (c *containerLXC) setNetworkRoutes(m types.Device, oldDevice types.Device) error { +func (c *containerLXC) setNetworkRoutes(deviceName string, m types.Device, oldDevice types.Device) error { if !shared.PathExists(fmt.Sprintf("/sys/class/net/%s", m["host_name"])) { return fmt.Errorf("Unknown or missing host side veth: %s", m["host_name"]) } // Remove any old routes that were setup for this nic device. 
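// Illustration only (not part of the patch): in the code that follows, the
// route device is the bridge parent for bridged nics and the veth host_name
// for p2p nics, so a hypothetical bridged nic with parent "lxdbr0" and
// ipv4.routes "192.0.2.200/32" leaves the host with the equivalent of:
//
//	ip -4 route add 192.0.2.200/32 dev lxdbr0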
- c.removeNetworkRoutes(oldDevice) + c.removeNetworkRoutes(deviceName, oldDevice) // Decide whether the route should point to the veth parent or the bridge parent routeDev := m["host_name"] @@ -8893,7 +8909,13 @@ func (c *containerLXC) setNetworkRoutes(m types.Device, oldDevice types.Device) // removeNetworkRoutes removes any routes created for this device on the host that were first added // with setNetworkRoutes(). Expects to be passed the device config from the oldExpandedDevices. -func (c *containerLXC) removeNetworkRoutes(m types.Device) { +func (c *containerLXC) removeNetworkRoutes(deviceName string, m types.Device) { + // If not populated already, check if volatile data contains the most recently added host_name. + if m["host_name"] == "" { + hostNameKey := fmt.Sprintf("volatile.%s.host_name", deviceName) + m["host_name"] = c.localConfig[hostNameKey] + } + // Decide whether the route should point to the veth parent or the bridge parent routeDev := m["host_name"] if m["nictype"] == "bridged" { From 7ab1162fb8cfef3a7d3241696ee0f806f20ba656 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Wed, 22 May 2019 17:32:05 +0100 Subject: [PATCH 4/4] lxc/container: Removes unused getHostInterface() Signed-off-by: Thomas Parrott --- lxd/container_lxc.go | 50 -------------------------------------------- 1 file changed, 50 deletions(-) diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index b78669841f..93e255d162 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -8817,56 +8817,6 @@ func (c *containerLXC) setNetworkPriority() error { return nil } -func (c *containerLXC) getHostInterface(name string) string { - // Pull directly from kernel - networks := c.networkState() - if networks[name].HostName != "" { - return networks[name].HostName - } - - // Fallback to poking LXC - if c.IsRunning() { - networkKeyPrefix := "lxc.net" - if !util.RuntimeLiblxcVersionAtLeast(2, 1, 0) { - networkKeyPrefix = "lxc.network" - } - - for i := 0; i < len(c.c.ConfigItem(networkKeyPrefix)); i++ { - nicName := c.c.RunningConfigItem(fmt.Sprintf("%s.%d.name", networkKeyPrefix, i))[0] - if nicName != name { - continue - } - - veth := c.c.RunningConfigItem(fmt.Sprintf("%s.%d.veth.pair", networkKeyPrefix, i))[0] - if veth != "" { - return veth - } - } - } - - // Fallback to parsing LXD config - for _, k := range c.expandedDevices.DeviceNames() { - dev := c.expandedDevices[k] - if dev["type"] != "nic" && dev["type"] != "infiniband" { - continue - } - - m, err := c.fillNetworkDevice(k, dev) - if err != nil { - m = dev - } - - if m["name"] != name { - continue - } - - return m["host_name"] - } - - // Fail - return "" -} - // setNetworkRoutes applies any static routes configured from the host to the container nic. func (c *containerLXC) setNetworkRoutes(deviceName string, m types.Device, oldDevice types.Device) error { if !shared.PathExists(fmt.Sprintf("/sys/class/net/%s", m["host_name"])) { From lxc-bot at linuxcontainers.org Wed May 22 19:07:25 2019 From: lxc-bot at linuxcontainers.org (tomponline on Github) Date: Wed, 22 May 2019 12:07:25 -0700 (PDT) Subject: [lxc-devel] [lxd/master] container/lxc: Moves volatile host_name enrichment to fillNetworkDevice Message-ID: <5ce59ded.1c69fb81.7c451.97cfSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 422 bytes Desc: not available URL: -------------- next part -------------- From b27eeadb7df294928441afaf00a41f14ef7aed78 Mon Sep 17 00:00:00 2001 From: Thomas Parrott Date: Wed, 22 May 2019 19:02:45 +0100 Subject: [PATCH] container/lxc: Moves volatile host_name enrichment into fillNetworkDevice Also clears volatile host_name keys when container stops. Signed-off-by: Thomas Parrott --- lxd/container_lxc.go | 51 +++++++++++++++++++++++++++----------------- 1 file changed, 31 insertions(+), 20 deletions(-) diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index 93e255d162..305500f62d 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -3102,10 +3102,10 @@ func (c *containerLXC) OnStop(target string) error { logger.Error("Failed to set container state", log.Ctx{"container": c.Name(), "err": err}) } - // Clean up networking routes - err = c.cleanupNetworkRoutes() + // Clean up networking veth devices + err = c.cleanupHostVethDevices() if err != nil { - logger.Error("Failed to cleanup network routes: ", log.Ctx{"container": c.Name(), "err": err}) + logger.Error("Failed to cleanup veth devices: ", log.Ctx{"container": c.Name(), "err": err}) } go func(c *containerLXC, target string, op *lxcContainerOperation) { @@ -3163,19 +3163,38 @@ func (c *containerLXC) OnStop(target string) error { return nil } -// cleanupNetworkRoutes removes any static routes added on the host for nic devices. -func (c *containerLXC) cleanupNetworkRoutes() error { +// cleanupHostVethDevices removes host side configuration for veth devices. +func (c *containerLXC) cleanupHostVethDevices() error { + volatileNics := make([]string, 0) + for _, k := range c.expandedDevices.DeviceNames() { m := c.expandedDevices[k] if m["type"] != "nic" { continue } - // Remove any static veth routes + m, err := c.fillNetworkDevice(k, m) + if err != nil { + continue + } + + // Remove any static host side veth routes if shared.StringInSlice(m["nictype"], []string{"bridged", "p2p"}) { c.removeNetworkRoutes(k, m) + volatileNics = append(volatileNics, k) // Record for volatile removal } + } + + // Clear host side config from volatile nics + volatile := make(map[string]string) + for _, deviceName := range volatileNics { + hostNameKey := fmt.Sprintf("volatile.%s.host_name", deviceName) + volatile[hostNameKey] = "" // Remove volatile host_name for device + } + err := c.VolatileSet(volatile) + if err != nil { + return err } return nil @@ -3206,12 +3225,6 @@ func (c *containerLXC) OnNetworkUp(deviceName string, hostName string) error { // setupHostVethDevice configures a nic device's host side veth settings. func (c *containerLXC) setupHostVethDevice(deviceName string, device types.Device, oldDevice types.Device) error { - // If not populated already, check if volatile data contains the most recently added host_name. - if device["host_name"] == "" { - hostNameKey := fmt.Sprintf("volatile.%s.host_name", deviceName) - device["host_name"] = c.localConfig[hostNameKey] - } - // Check whether host device resolution succeeded. if device["host_name"] == "" { return fmt.Errorf("Failed to find host side veth name for device \"%s\"", deviceName) @@ -5103,7 +5116,11 @@ func (c *containerLXC) Update(args db.ContainerArgs, userRequested bool) error { return err } - err = c.setupHostVethDevice(k, m, oldExpandedDevices[k]) + // We're updating the same device, so copy enriched host_name + // into oldDevice config for veth host device setup. 
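// Illustration only (not part of the patch): the volatile key scheme used
// throughout this change is "volatile.<device>.host_name", so for a
// hypothetical nic named eth0 whose host-side veth is veth1A2B3C:
//
//	hostNameKey := fmt.Sprintf("volatile.%s.host_name", "eth0")
//	// hostNameKey == "volatile.eth0.host_name", stores "veth1A2B3C",
//	// and is cleared again by cleanupHostVethDevices() on container stop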
+ oldDevice := oldExpandedDevices[k] + oldDevice["host_name"] = m["host_name"] + err = c.setupHostVethDevice(k, m, oldDevice) if err != nil { return err } @@ -8135,7 +8152,7 @@ func (c *containerLXC) fillNetworkDevice(name string, m types.Device) (types.Dev } // Fill in the host name (but don't generate a static one ourselves) - if m["host_name"] == "" && shared.StringInSlice(m["nictype"], []string{"sriov"}) { + if m["host_name"] == "" && shared.StringInSlice(m["nictype"], []string{"bridged", "p2p", "sriov"}) { configKey := fmt.Sprintf("volatile.%s.host_name", name) newDevice["host_name"] = c.localConfig[configKey] } @@ -8860,12 +8877,6 @@ func (c *containerLXC) setNetworkRoutes(deviceName string, m types.Device, oldDe // removeNetworkRoutes removes any routes created for this device on the host that were first added // with setNetworkRoutes(). Expects to be passed the device config from the oldExpandedDevices. func (c *containerLXC) removeNetworkRoutes(deviceName string, m types.Device) { - // If not populated already, check if volatile data contains the most recently added host_name. - if m["host_name"] == "" { - hostNameKey := fmt.Sprintf("volatile.%s.host_name", deviceName) - m["host_name"] = c.localConfig[hostNameKey] - } - // Decide whether the route should point to the veth parent or the bridge parent routeDev := m["host_name"] if m["nictype"] == "bridged" { From lxc-bot at linuxcontainers.org Wed May 22 19:21:11 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Wed, 22 May 2019 12:21:11 -0700 (PDT) Subject: [lxc-devel] [lxd/master] scripts: Add script to completely reset LXD Message-ID: <5ce5a127.1c69fb81.acbef.fd5dSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 370 bytes Desc: not available URL: -------------- next part -------------- From 1d8ce974d3da18ac1c2a04eebd5fe7f0ac4cbc7e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Wed, 22 May 2019 14:41:56 -0400 Subject: [PATCH] scripts: Add script to completely reset LXD MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes #5782 Signed-off-by: Stéphane Graber --- scripts/empty-lxd.sh | 52 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) create mode 100755 scripts/empty-lxd.sh diff --git a/scripts/empty-lxd.sh b/scripts/empty-lxd.sh new file mode 100755 index 0000000000..a979db8f52 --- /dev/null +++ b/scripts/empty-lxd.sh @@ -0,0 +1,52 @@ +#!/bin/sh -eu +if ! 
which jq >/dev/null 2>&1; then + echo "This tool requires: jq" + exit 1 +fi + +## Delete anything that's tied to a project +for project in $(lxc query "/1.0/projects?recursion=1" | jq .[].name -r); do + echo "==> Deleting all containers for project: ${project}" + for container in $(lxc query "/1.0/containers?recursion=1&project=${project}" | jq .[].name -r); do + lxc delete --project "${project}" -f "${container}" + done + + echo "==> Deleting all images for project: ${project}" + for image in $(lxc query "/1.0/images?recursion=1&project=${project}" | jq .[].fingerprint -r); do + lxc image delete --project "${project}" "${image}" + done +done + +for project in $(lxc query "/1.0/projects?recursion=1" | jq .[].name -r); do + echo "==> Deleting all profiles for project: ${project}" + for profile in $(lxc query "/1.0/profiles?recursion=1&project=${project}" | jq .[].name -r); do + if [ "${profile}" = "default" ]; then + printf 'config: {}\ndevices: {}' | lxc profile edit --project "${project}" default + continue + fi + lxc profile delete --project "${project}" "${profile}" + done + + if [ "${project}" != "default" ]; then + echo "==> Deleting project: ${project}" + lxc project delete "${project}" + fi +done + +## Delete the networks +echo "==> Deleting all networks" +for network in $(lxc query "/1.0/networks?recursion=1" | jq '.[] | select(.managed) | .name' -r); do + lxc network delete "${network}" +done + +## Delete the storage pools +echo "==> Deleting all storage pools" +for storage_pool in $(lxc query "/1.0/storage-pools?recursion=1" | jq .[].name -r); do + for volume in $(lxc query "/1.0/storage-pools/${storage_pool}/volumes/custom?recursion=1" | jq .[].name -r); do + echo "==> Deleting storage volume ${volume} on ${storage_pool}" + lxc storage volume delete "${storage_pool}" "${volume}" + done + + ## Delete the custom storage volumes + lxc storage delete "${storage_pool}" +done From lxc-bot at linuxcontainers.org Wed May 22 20:25:58 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Wed, 22 May 2019 13:25:58 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/networks: Fix ETag handling on clusters Message-ID: <5ce5b056.1c69fb81.1a2aa.d4b7SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 370 bytes Desc: not available URL: -------------- next part -------------- From 102822dcd405dd9b2e8a32d828dd521c16d15847 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Wed, 22 May 2019 16:15:46 -0400 Subject: [PATCH] lxd/networks: Fix ETag handling on clusters MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes #5764 Signed-off-by: Stéphane Graber --- lxd/networks.go | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/lxd/networks.go b/lxd/networks.go index d8545ce7a4..5ce5bfda11 100644 --- a/lxd/networks.go +++ b/lxd/networks.go @@ -582,6 +582,20 @@ func networkPut(d *Daemon, r *http.Request) Response { return SmartError(err) } + targetNode := queryParam(r, "target") + clustered, err := cluster.Enabled(d.db) + if err != nil { + return SmartError(err) + } + + // If no target node is specified and the daemon is clustered, we omit + // the node-specific fields. 
+ if targetNode == "" && clustered { + for _, key := range db.NetworkNodeConfigKeys { + delete(dbInfo.Config, key) + } + } + // Validate the ETag etag := []interface{}{dbInfo.Name, dbInfo.Managed, dbInfo.Type, dbInfo.Description, dbInfo.Config} @@ -607,6 +621,20 @@ func networkPatch(d *Daemon, r *http.Request) Response { return SmartError(err) } + targetNode := queryParam(r, "target") + clustered, err := cluster.Enabled(d.db) + if err != nil { + return SmartError(err) + } + + // If no target node is specified and the daemon is clustered, we omit + // the node-specific fields. + if targetNode == "" && clustered { + for _, key := range db.NetworkNodeConfigKeys { + delete(dbInfo.Config, key) + } + } + // Validate the ETag etag := []interface{}{dbInfo.Name, dbInfo.Managed, dbInfo.Type, dbInfo.Description, dbInfo.Config} From lxc-bot at linuxcontainers.org Thu May 23 18:02:47 2019 From: lxc-bot at linuxcontainers.org (stgraber on Github) Date: Thu, 23 May 2019 11:02:47 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/containers: Fix bad error Message-ID: <5ce6e047.1c69fb81.174ca.255cSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 354 bytes Desc: not available URL: -------------- next part -------------- From 755d602256a6a4c2634c4624d9adb091930f299c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?St=C3=A9phane=20Graber?= Date: Thu, 23 May 2019 14:02:15 -0400 Subject: [PATCH] lxd/containers: Fix bad error MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Stéphane Graber --- lxd/container_lxc.go | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lxd/container_lxc.go b/lxd/container_lxc.go index f46fc24887..cddd69ad51 100644 --- a/lxd/container_lxc.go +++ b/lxd/container_lxc.go @@ -4179,7 +4179,7 @@ func (c *containerLXC) Update(args db.ContainerArgs, userRequested bool) error { // Validate the new profiles profiles, err := c.state.Cluster.Profiles(args.Project) if err != nil { - return errors.Wrap(err, "Failed to get project profiles") + return errors.Wrap(err, "Failed to get profiles") } checkedProfiles := []string{} From lxc-bot at linuxcontainers.org Fri May 24 03:00:40 2019 From: lxc-bot at linuxcontainers.org (joelhockey on Github) Date: Thu, 23 May 2019 20:00:40 -0700 (PDT) Subject: [lxc-devel] [lxd/master] lxd/images fix compressErr return Message-ID: <5ce75e58.1c69fb81.e1ba2.abffSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 353 bytes Desc: not available URL: -------------- next part -------------- From 1aaf40c9990e273c4444d2695cbc9808485b825b Mon Sep 17 00:00:00 2001 From: Joel Hockey Date: Thu, 23 May 2019 19:54:21 -0700 Subject: [PATCH] lxd/images fix compressErr return Signed-off-by: Joel Hockey --- lxd/images.go | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lxd/images.go b/lxd/images.go index 5753ba5342..8342a9e4d2 100644 --- a/lxd/images.go +++ b/lxd/images.go @@ -274,7 +274,7 @@ func imgPostContInfo(d *Daemon, r *http.Request, req api.ImagesPost, op *operati return nil, err } if compressErr != nil { - return nil, err + return nil, compressErr } imageFile.Close() From lxc-bot at linuxcontainers.org Fri May 24 14:02:13 2019 From: lxc-bot at linuxcontainers.org (brauner on Github) Date: Fri, 24 May 2019 07:02:13 -0700 (PDT) Subject: [lxc-devel] [lxc/master] cgroups: handle offline cpus in v1 hierarchy Message-ID: <5ce7f965.1c69fb81.38c1.eb40SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 577 bytes Desc: not available URL: -------------- next part -------------- From 36f7018103cd66cc16128b04200df15320472a54 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Fri, 24 May 2019 15:59:57 +0200 Subject: [PATCH] cgroups: handle offline cpus in v1 hierarchy Handle offline cpus in v1 hierarchy. In addition to isolated cpus we also need to account for offline cpus when our ancestor cgroup is the root cgroup and we have not been initialized yet. Closes #2953. Signed-off-by: Christian Brauner --- src/lxc/cgroups/cgfsng.c | 135 ++++++++++++++++++++++----------------- 1 file changed, 75 insertions(+), 60 deletions(-) diff --git a/src/lxc/cgroups/cgfsng.c b/src/lxc/cgroups/cgfsng.c index 8c3600bb90..5e5995f86c 100644 --- a/src/lxc/cgroups/cgfsng.c +++ b/src/lxc/cgroups/cgfsng.c @@ -378,16 +378,18 @@ static ssize_t get_max_cpus(char *cpulist) } #define __ISOL_CPUS "/sys/devices/system/cpu/isolated" +#define __OFFLINE_CPUS "/sys/devices/system/cpu/offline" static bool cg_legacy_filter_and_set_cpus(char *path, bool am_initialized) { __do_free char *cpulist = NULL, *fpath = NULL, *isolcpus = NULL, - *posscpus = NULL; - __do_free uint32_t *isolmask = NULL, *possmask = NULL; + *offlinecpus = NULL, *posscpus = NULL; + __do_free uint32_t *isolmask = NULL, *offlinemask = NULL, + *possmask = NULL; int ret; ssize_t i; char oldv; char *lastslash; - ssize_t maxisol = 0, maxposs = 0; + ssize_t maxisol = 0, maxoffline = 0, maxposs = 0; bool bret = false, flipped_bit = false; lastslash = strrchr(path, '/'); @@ -409,54 +411,50 @@ static bool cg_legacy_filter_and_set_cpus(char *path, bool am_initialized) if (maxposs < 0 || maxposs >= INT_MAX - 1) return false; - if (!file_exists(__ISOL_CPUS)) { - /* This system doesn't expose isolated cpus. */ - DEBUG("The path \""__ISOL_CPUS"\" to read isolated cpus from does not exist"); - /* No isolated cpus but we weren't already initialized by - * someone. We should simply copy the parents cpuset.cpus - * values. - */ - if (!am_initialized) { - DEBUG("Copying cpu settings of parent cgroup"); - cpulist = posscpus; - goto copy_parent; + if (file_exists(__ISOL_CPUS)) { + isolcpus = read_file(__ISOL_CPUS); + if (!isolcpus) { + SYSERROR("Failed to read file \"%s\"", __ISOL_CPUS); + return false; } - /* No isolated cpus but we were already initialized by someone. - * Nothing more to do for us. 
- */ - return true; - } - isolcpus = read_file(__ISOL_CPUS); - if (!isolcpus) { - SYSERROR("Failed to read file \""__ISOL_CPUS"\""); - return false; - } - if (!isdigit(isolcpus[0])) { - TRACE("No isolated cpus detected"); - /* No isolated cpus but we weren't already initialized by - * someone. We should simply copy the parents cpuset.cpus - * values. - */ - if (!am_initialized) { - DEBUG("Copying cpu settings of parent cgroup"); - cpulist = posscpus; - goto copy_parent; + if (isdigit(isolcpus[0])) { + /* Get maximum number of cpus found in isolated cpuset. */ + maxisol = get_max_cpus(isolcpus); + if (maxisol < 0 || maxisol >= INT_MAX - 1) + return false; } - /* No isolated cpus but we were already initialized by someone. - * Nothing more to do for us. - */ - return true; + + if (maxposs < maxisol) + maxposs = maxisol; + maxposs++; + } else { + TRACE("The path \""__ISOL_CPUS"\" to read isolated cpus from does not exist"); } - /* Get maximum number of cpus found in isolated cpuset. */ - maxisol = get_max_cpus(isolcpus); - if (maxisol < 0 || maxisol >= INT_MAX - 1) - return false; + if (file_exists(__OFFLINE_CPUS)) { + offlinecpus = read_file(__OFFLINE_CPUS); + if (!offlinecpus) { + SYSERROR("Failed to read file \"%s\"", __OFFLINE_CPUS); + return false; + } - if (maxposs < maxisol) - maxposs = maxisol; - maxposs++; + if (isdigit(offlinecpus[0])) { + /* Get maximum number of cpus found in offline cpuset. */ + maxoffline = get_max_cpus(offlinecpus); + if (maxoffline < 0 || maxoffline >= INT_MAX - 1) + return false; + } + + if (maxposs < maxoffline) + maxposs = maxoffline; + maxposs++; + } else { + TRACE("The path \""__OFFLINE_CPUS"\" to read offline cpus from does not exist"); + } + + if ((maxisol == 0) && (maxoffline == 0)) + goto copy_parent; possmask = lxc_cpumask(posscpus, maxposs); if (!possmask) { @@ -464,14 +462,26 @@ static bool cg_legacy_filter_and_set_cpus(char *path, bool am_initialized) return false; } - isolmask = lxc_cpumask(isolcpus, maxposs); - if (!isolmask) { - ERROR("Failed to create cpumask for isolated cpus"); - return false; + if (maxisol > 0) { + isolmask = lxc_cpumask(isolcpus, maxposs); + if (!isolmask) { + ERROR("Failed to create cpumask for isolated cpus"); + return false; + } + } + + if (maxoffline > 0) { + offlinemask = lxc_cpumask(offlinecpus, maxposs); + if (!offlinemask) { + ERROR("Failed to create cpumask for offline cpus"); + return false; + } } for (i = 0; i <= maxposs; i++) { - if (!is_set(i, isolmask) || !is_set(i, possmask)) + if ((isolmask && !is_set(i, isolmask)) || + (offlinemask && !is_set(i, offlinemask)) || + !is_set(i, possmask)) continue; flipped_bit = true; @@ -479,10 +489,10 @@ static bool cg_legacy_filter_and_set_cpus(char *path, bool am_initialized) } if (!flipped_bit) { - DEBUG("No isolated cpus present in cpuset"); + DEBUG("No isolated or offline cpus present in cpuset"); return true; } - DEBUG("Removed isolated cpus from cpuset"); + DEBUG("Removed isolated or offline cpus from cpuset"); cpulist = lxc_cpumask_to_cpulist(possmask, maxposs); if (!cpulist) { @@ -491,14 +501,19 @@ static bool cg_legacy_filter_and_set_cpus(char *path, bool am_initialized) } copy_parent: - *lastslash = oldv; - fpath = must_make_path(path, "cpuset.cpus", NULL); - ret = lxc_write_to_file(fpath, cpulist, strlen(cpulist), false, 0666); - if (cpulist == posscpus) - cpulist = NULL; - if (ret < 0) { - SYSERROR("Failed to write cpu list to \"%s\"", fpath); - return false; + if (!am_initialized) { + *lastslash = oldv; + fpath = must_make_path(path, "cpuset.cpus", NULL); + 
ret = lxc_write_to_file(fpath, cpulist, strlen(cpulist), false, + 0666); + if (cpulist == posscpus) + cpulist = NULL; + if (ret < 0) { + SYSERROR("Failed to write cpu list to \"%s\"", fpath); + return false; + } + + TRACE("Copied cpu settings of parent cgroup"); } return true; From lxc-bot at linuxcontainers.org Fri May 24 15:40:44 2019 From: lxc-bot at linuxcontainers.org (CajuM on Github) Date: Fri, 24 May 2019 08:40:44 -0700 (PDT) Subject: [lxc-devel] [go-lxc/v2] Added static build support using -tags static_build Message-ID: <5ce8107c.1c69fb81.bda09.f69dSMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 473 bytes Desc: not available URL: -------------- next part -------------- From 44174ee97fd6657e24b4152acc40a923f6b1bbec Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Mihai-Drosi=20C=C3=A2ju?= Date: Fri, 24 May 2019 18:36:36 +0300 Subject: [PATCH] Added static build support using -tags static_build --- linking_dynamic.go | 10 ++++++++++ linking_static.go | 10 ++++++++++ lxc-binding.go | 1 - 3 files changed, 20 insertions(+), 1 deletion(-) create mode 100644 linking_dynamic.go create mode 100644 linking_static.go diff --git a/linking_dynamic.go b/linking_dynamic.go new file mode 100644 index 0000000..1197b81 --- /dev/null +++ b/linking_dynamic.go @@ -0,0 +1,10 @@ +// Copyright © 2013, 2014, The Go-LXC Authors. All rights reserved. +// Use of this source code is governed by a LGPLv2.1 +// license that can be found in the LICENSE file. + +// +build linux,cgo,!static_build + +package lxc + +// #cgo LDFLAGS: -llxc -lutil +import "C" diff --git a/linking_static.go b/linking_static.go new file mode 100644 index 0000000..89f06b4 --- /dev/null +++ b/linking_static.go @@ -0,0 +1,10 @@ +// Copyright © 2013, 2014, The Go-LXC Authors. All rights reserved. +// Use of this source code is governed by a LGPLv2.1 +// license that can be found in the LICENSE file. + +// +build linux,cgo,static_build + +package lxc + +// #cgo LDFLAGS: -static -llxc -lseccomp -lutil -lcap +import "C" diff --git a/lxc-binding.go b/lxc-binding.go index a0e2bdc..61462a1 100644 --- a/lxc-binding.go +++ b/lxc-binding.go @@ -7,7 +7,6 @@ package lxc // #cgo pkg-config: lxc -// #cgo LDFLAGS: -llxc -lutil // #include // #include // #include "lxc-binding.h" From lxc-bot at linuxcontainers.org Fri May 24 21:48:19 2019 From: lxc-bot at linuxcontainers.org (Re4son on Github) Date: Fri, 24 May 2019 14:48:19 -0700 (PDT) Subject: [lxc-devel] [lxc-templates/master] Add kali-linux distro Message-ID: <5ce866a3.1c69fb81.41611.3786SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... 
Name: not available Type: text/x-mailbox Size: 348 bytes Desc: not available URL: -------------- next part -------------- From 1f6af7d5959ca1f63ff27c25e2d6669c4048e481 Mon Sep 17 00:00:00 2001 From: Re4son Date: Thu, 23 May 2019 21:07:11 +1000 Subject: [PATCH] Add kali-linux distro Signed-off-by: Re4son --- config/kali.common.conf.in | 28 ++ config/kali.userns.conf.in | 2 + templates/lxc-kali.in | 765 +++++++++++++++++++++++++++++++++++++ 3 files changed, 795 insertions(+) create mode 100644 config/kali.common.conf.in create mode 100644 config/kali.userns.conf.in create mode 100644 templates/lxc-kali.in diff --git a/config/kali.common.conf.in b/config/kali.common.conf.in new file mode 100644 index 0000000..4e6a6e6 --- /dev/null +++ b/config/kali.common.conf.in @@ -0,0 +1,28 @@ +# This derives from the global common config +lxc.include = @LXCTEMPLATECONFIG@/common.conf + +# Doesn't support consoles in /dev/lxc/ +lxc.tty.dir = + +# When using LXC with apparmor, the container will be confined by default. +# If you wish for it to instead run unconfined, copy the following line +# (uncommented) to the container's configuration file. +#lxc.apparmor.profile = unconfined + +# If you wish to allow mounting block filesystems, then use the following +# line instead, and make sure to grant access to the block device and/or loop +# devices below in lxc.cgroup.devices.allow. +#lxc.apparmor.profile = lxc-container-default-with-mounting + +# Extra cgroup device access +## rtc +lxc.cgroup.devices.allow = c 254:0 rm +## tun +lxc.cgroup.devices.allow = c 10:200 rwm +## hpet +lxc.cgroup.devices.allow = c 10:228 rwm +## kvm +lxc.cgroup.devices.allow = c 10:232 rwm +## To use loop devices, copy the following line to the container's +## configuration file (uncommented). +#lxc.cgroup.devices.allow = b 7:* rwm diff --git a/config/kali.userns.conf.in b/config/kali.userns.conf.in new file mode 100644 index 0000000..707bb30 --- /dev/null +++ b/config/kali.userns.conf.in @@ -0,0 +1,2 @@ +# This derives from the global userns config +lxc.include = @LXCTEMPLATECONFIG@/userns.conf diff --git a/templates/lxc-kali.in b/templates/lxc-kali.in new file mode 100644 index 0000000..8656438 --- /dev/null +++ b/templates/lxc-kali.in @@ -0,0 +1,765 @@ +#!/bin/bash + +# +# lxc: linux Container library + +# Authors: +# Daniel Lezcano +# Re4son + +# This library is free software; you can redistribute it and/or +# modify it under the terms of the GNU Lesser General Public +# License as published by the Free Software Foundation; either +# version 2.1 of the License, or (at your option) any later version. + +# This library is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +# Lesser General Public License for more details. + +# You should have received a copy of the GNU Lesser General Public +# License along with this library; if not, write to the Free Software +# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + +# Detect use under userns (unsupported) +for arg in "$@"; do + [ "$arg" = "--" ] && break + if [ "$arg" = "--mapped-uid" -o "$arg" = "--mapped-gid" ]; then + echo "This template can't be used for unprivileged containers." 1>&2 + echo "You may want to try the \"download\" template instead." 
+        exit 1
+    fi
+done
+
+# Make sure the usual locations are in PATH
+export PATH=$PATH:/usr/sbin:/usr/bin:/sbin:/bin
+export GREP_OPTIONS=""
+
+MIRROR=${MIRROR:-http://http.kali.org/kali}
+## SECURITY_MIRROR=${SECURITY_MIRROR:-http://http.kali.org/}
+LOCALSTATEDIR="@LOCALSTATEDIR@"
+LXC_TEMPLATE_CONFIG="@LXCTEMPLATECONFIG@"
+# Allows the lxc-cache directory to be set by environment variable
+LXC_CACHE_PATH=${LXC_CACHE_PATH:-"$LOCALSTATEDIR/cache/lxc"}
+
+find_interpreter()
+{
+    given_interpreter=$(basename "$1")
+
+    if [ ! -d /proc/sys/fs/binfmt_misc/ ] ; then
+        return 1
+    fi
+    for file in /proc/sys/fs/binfmt_misc/* ; do
+        if [ "$file" = "/proc/sys/fs/binfmt_misc/register" -o \
+             "$file" = "/proc/sys/fs/binfmt_misc/status" ] ; then
+            continue
+        fi
+        interpreter_path=$(sed -n "/^interpreter/s/interpreter \([^[:space:]]*\)/\1/p" "$file")
+        interpreter=$(basename "$interpreter_path")
+
+        if [ "$given_interpreter" = "$interpreter" ] ; then
+            echo "$interpreter_path"
+            return 0
+        fi
+    done
+    return 1
+}
+
+configure_kali()
+{
+    rootfs=$1
+    hostname=$2
+    num_tty=$3
+
+    # squeeze only has /dev/tty and /dev/tty0 by default,
+    # therefore creating missing device nodes for tty1-4.
+    for tty in $(seq 1 "$num_tty"); do
+        if [ ! -e "$rootfs/dev/tty$tty" ]; then
+            mknod "$rootfs/dev/tty$tty" c 4 "$tty"
+        fi
+    done
+
+    # configure the inittab
+    cat <<EOF > $rootfs/etc/inittab
+id:3:initdefault:
+si::sysinit:/etc/init.d/rcS
+l0:0:wait:/etc/init.d/rc 0
+l1:1:wait:/etc/init.d/rc 1
+l2:2:wait:/etc/init.d/rc 2
+l3:3:wait:/etc/init.d/rc 3
+l4:4:wait:/etc/init.d/rc 4
+l5:5:wait:/etc/init.d/rc 5
+l6:6:wait:/etc/init.d/rc 6
+# Normally not reached, but fallthrough in case of emergency.
+z6:6:respawn:/sbin/sulogin
+1:2345:respawn:/sbin/getty 38400 console
+$(for tty in $(seq 1 "$num_tty"); do echo "c${tty}:12345:respawn:/sbin/getty 38400 tty${tty} linux" ; done;)
+p6::ctrlaltdel:/sbin/init 6
+p0::powerfail:/sbin/init 0
+EOF
+
+    # symlink mtab
+    [ -e "$rootfs/etc/mtab" ] && rm "$rootfs/etc/mtab"
+    ln -s /proc/self/mounts "$rootfs/etc/mtab"
+
+    # disable selinux in kali
+    mkdir -p "$rootfs/selinux"
+    echo 0 > "$rootfs/selinux/enforce"
+
+    # configure the network using the dhcp
+    cat <<EOF > $rootfs/etc/network/interfaces
+auto lo
+iface lo inet loopback
+
+auto eth0
+iface eth0 inet dhcp
+EOF
+
+    # set the hostname
+    cat <<EOF > $rootfs/etc/hostname
+$hostname
+EOF
+
+    # reconfigure some services
+
+    # but first reconfigure locales - so we get no noisy perl-warnings
+    if [ -z "$LANG" ] || echo $LANG | grep -E -q "^C(\..+)*$"; then
+        cat >> "$rootfs/etc/locale.gen" << EOF
+en_US.UTF-8 UTF-8
+EOF
+        chroot "$rootfs" locale-gen en_US.UTF-8 UTF-8
+        chroot "$rootfs" update-locale LANG=en_US.UTF-8
+    else
+        encoding=$(echo "$LANG" | cut -d. -f2)
+        chroot "$rootfs" sed -e "s/^# \(${LANG} ${encoding}\)/\1/" \
+            -i /etc/locale.gen 2> /dev/null
+        cat >> "$rootfs/etc/locale.gen" << EOF
+$LANG $encoding
+EOF
+        chroot "$rootfs" locale-gen "$LANG" "$encoding"
+        chroot "$rootfs" update-locale LANG="$LANG"
+    fi
+
+    # generate new SSH keys
+    if [ -x "$rootfs/var/lib/dpkg/info/openssh-server.postinst" ]; then
+        cat > "$rootfs/usr/sbin/policy-rc.d" << EOF
+#!/bin/sh
+exit 101
+EOF
+        chmod +x "$rootfs/usr/sbin/policy-rc.d"
+
+        if [ -f "$rootfs/etc/init/ssh.conf" ]; then
+            mv "$rootfs/etc/init/ssh.conf" "$rootfs/etc/init/ssh.conf.disabled"
+        fi
+
+        rm -f "$rootfs/etc/ssh/"ssh_host_*key*
+
+        DPKG_MAINTSCRIPT_PACKAGE=openssh DPKG_MAINTSCRIPT_NAME=postinst chroot "$rootfs" /var/lib/dpkg/info/openssh-server.postinst configure
+        sed -i "s/root@$(hostname)/root@$hostname/g" "$rootfs/etc/ssh/"ssh_host_*.pub
+
+        if [ -f "$rootfs/etc/init/ssh.conf.disabled" ]; then
+            mv "$rootfs/etc/init/ssh.conf.disabled" "$rootfs/etc/init/ssh.conf"
+        fi
+
+        rm -f "$rootfs/usr/sbin/policy-rc.d"
+    fi
+
+    # set initial timezone as on host
+    if [ -f /etc/timezone ]; then
+        cat /etc/timezone > "$rootfs/etc/timezone"
+        chroot "$rootfs" dpkg-reconfigure -f noninteractive tzdata
+    elif [ -f /etc/sysconfig/clock ]; then
+        . /etc/sysconfig/clock
+        echo "$ZONE" > "$rootfs/etc/timezone"
+        chroot "$rootfs" dpkg-reconfigure -f noninteractive tzdata
+    else
+        echo "Timezone in container is not configured. Adjust it manually."
+    fi
+
+    if [ -n "$authkey" ]; then
+        local ssh_dir_path="${rootfs}/root/.ssh"
+        mkdir -p "$ssh_dir_path"
+        cp "$authkey" "${ssh_dir_path}/authorized_keys"
+        chmod 700 "$ssh_dir_path"
+        echo "Inserted SSH public key from '$authkey' into /root/.ssh/authorized_keys"
+    fi
+
+    return 0
+}
+
+write_sourceslist()
+{
+    local rootfs="$1"; shift
+    local release="$1"; shift
+    local arch="$1"; shift
+
+    local prefix="deb"
+    if [ -n "${arch}" ]; then
+        prefix="deb [arch=${arch}]"
+    fi
+
+    if [ "$mainonly" = 1 ]; then
+        non_main=''
+    else
+        non_main=' contrib non-free'
+    fi
+
+    cat >> "${rootfs}/etc/apt/sources.list" << EOF
+${prefix} $MIRROR ${release} main${non_main}
+EOF
+
+}
+
+install_packages()
+{
+    local rootfs="$1"; shift
+    local packages="$*"
+
+    chroot "${rootfs}" apt-get update
+    if [ -n "${packages}" ]; then
+        chroot "${rootfs}" apt-get install --force-yes -y --no-install-recommends ${packages}
+    fi
+}
+
+configure_kali_systemd()
+{
+    path=$1
+    rootfs=$2
+    config=$3
+    num_tty=$4
+
+    # just in case systemd is not installed
+    mkdir -p "${rootfs}/lib/systemd/system"
+    mkdir -p "${rootfs}/etc/systemd/system/getty.target.wants"
+
+    # Fix getty-static-service as debootstrap does not install dbus
+    if [ -e "$rootfs//lib/systemd/system/getty-static.service" ] ; then
+        local tty_services
+        tty_services=$(for i in $(seq 2 "$num_tty"); do echo -n "getty@tty${i}.service "; done; )
+        sed 's/ getty@tty.*/'" $tty_services "'/g' \
+            "$rootfs/lib/systemd/system/getty-static.service" | \
+            sed 's/\(tty2-tty\)[5-9]/\1'"${num_tty}"'/g' > "$rootfs/etc/systemd/system/getty-static.service"
+    fi
+
+    # This function has been copied and adapted from lxc-fedora
+    rm -f "${rootfs}/etc/systemd/system/default.target"
+    chroot "${rootfs}" ln -s /dev/null /etc/systemd/system/udev.service
+    chroot "${rootfs}" ln -s /dev/null /etc/systemd/system/systemd-udevd.service
+    chroot "${rootfs}" ln -s /lib/systemd/system/multi-user.target /etc/systemd/system/default.target
+    # Setup getty service on the ttys we are going to allow in the
+    # default config.  Number should match lxc.tty.max
+    ( cd "${rootfs}/etc/systemd/system/getty.target.wants"
+      for i in $(seq 1 "$num_tty") ; do ln -sf ../getty\@.service getty@tty"${i}".service; done )
+
+    # Since we use static-getty.target; we need to mask container-getty@.service generated by
+    # container-getty-generator, so we don't get multiple instances of agetty running.
+    # See https://github.com/lxc/lxc/issues/520 and https://github.com/lxc/lxc/issues/484
+    ( cd "${rootfs}/etc/systemd/system/getty.target.wants"
+      for i in $(seq 0 "$num_tty"); do ln -sf /dev/null container-getty\@"${i}".service; done )
+
+    return 0
+}
+
+# Check if given path is in a btrfs partition
+is_btrfs()
+{
+    [ -e "$1" -a "$(stat -f -c '%T' "$1")" = "btrfs" ]
+}
+
+# Check if given path is the root of a btrfs subvolume
+is_btrfs_subvolume()
+{
+    [ -d "$1" -a "$(stat -f -c '%T' "$1")" = "btrfs" -a "$(stat -c '%i' "$1")" -eq 256 ]
+}
+
+try_mksubvolume()
+{
+    path=$1
+    [ -d "$path" ] && return 0
+    mkdir -p "$(dirname "$path")"
+    if which btrfs >/dev/null 2>&1 && is_btrfs "$(dirname "$path")"; then
+        btrfs subvolume create "$path"
+    else
+        mkdir -p "$path"
+    fi
+}
+
+try_rmsubvolume()
+{
+    path=$1
+    [ -d "$path" ] || return 0
+    if which btrfs >/dev/null 2>&1 && is_btrfs_subvolume "$path"; then
+        btrfs subvolume delete "$path"
+    else
+        rm -rf "$path"
+    fi
+}
+
+cleanup()
+{
+    try_rmsubvolume "$cache/partial-$release-$arch"
+    try_rmsubvolume "$cache/rootfs-$release-$arch"
+}
+
+download_kali()
+{
+    init=init
+    iproute=iproute2
+    packages=\
+$init,\
+ifupdown,\
+locales,\
+dialog,\
+isc-dhcp-client,\
+netbase,\
+net-tools,\
+$iproute,\
+openssh-server,\
+kali-archive-keyring
+
+    cache=$1
+    arch=$2
+    release=$3
+    interpreter="$4"
+    interpreter_path="$5"
+
+    trap cleanup EXIT SIGHUP SIGINT SIGTERM
+
+    # Create the cache
+    mkdir -p "$cache"
+
+    # If kali-archive-keyring isn't installed, fetch GPG keys directly
+    releasekeyring=/usr/share/keyrings/kali-archive-keyring.gpg
+    if [ ! -f $releasekeyring ]; then
+        releasekeyring="$cache/archive-key.gpg"
+        gpgkeyname="archive-key"
+        wget https://archive.kali.org/${gpgkeyname}.asc -O - --quiet \
+            | gpg --import --no-default-keyring --keyring="${releasekeyring}"
+    fi
+    # check the mini kali was not already downloaded
+    try_mksubvolume "$cache/partial-$release-$arch"
+    if [ $? -ne 0 ]; then
+        echo "Failed to create '$cache/partial-$release-$arch' directory"
+        return 1
+    fi
+
+    # download a mini kali into a cache
+    echo "Downloading kali minimal ..."
+    if [ "$interpreter" = "" ] ; then
+        debootstrap --verbose --variant=minbase --arch="$arch" \
+            --include=$packages --keyring="${releasekeyring}" \
+            "$release" "$cache/partial-$release-$arch" "$MIRROR"
+        if [ $? -ne 0 ]; then
+            echo "Failed to download the rootfs, aborting."
+            return 1
+        fi
+    else
+        debootstrap --foreign --verbose --variant=minbase --arch="$arch" \
+            --include=$packages --keyring="${releasekeyring}" \
+            "$release" "$cache/partial-$release-$arch" "$MIRROR"
+        if [ $? -ne 0 ]; then
+            echo "Failed to download the rootfs, aborting."
+            return 1
+        fi
+        mkdir -p "$(dirname "$cache/partial-$release-$arch/$interpreter_path")"
+        cp "$interpreter" "$cache/partial-$release-$arch/$interpreter_path"
+        if [ $? -ne 0 ]; then
+            echo "failed to copy $interpreter to $cache/partial-$release-$arch/$interpreter_path"
+            return 1
+        fi
+        chroot "$cache/partial-$release-$arch" debootstrap/debootstrap --second-stage
+        if [ $? -ne 0 ]; then
+            echo "failed to update the rootfs, aborting"
+            return 1
+        fi
+    fi
+
+    mv "$1/partial-$release-$arch" "$1/rootfs-$release-$arch"
+    echo "Download complete."
+    trap EXIT
+    trap SIGINT
+    trap SIGTERM
+    trap SIGHUP
+
+    return 0
+}
+
+copy_kali()
+{
+    cache=$1
+    arch=$2
+    rootfs=$3
+    release=$4
+
+    # make a local copy of the minikali
+    echo -n "Copying rootfs to $rootfs..."
+    try_mksubvolume "$rootfs"
+    if which btrfs >/dev/null 2>&1 && \
+       is_btrfs_subvolume "$cache/rootfs-$release-$arch" && \
+       is_btrfs_subvolume "$rootfs"; then
+        realrootfs="$(dirname "$config")"/rootfs
+        [ "$rootfs" = "$realrootfs" ] || umount "$rootfs" || return 1
+        btrfs subvolume delete "$realrootfs" || return 1
+        btrfs subvolume snapshot "$cache/rootfs-$release-$arch" "$realrootfs" || return 1
+        [ "$rootfs" = "$realrootfs" ] || mount --bind "$realrootfs" "$rootfs" || return 1
+    else
+        rsync -SHaAX "$cache/rootfs-$release-$arch"/ $rootfs/ || return 1
+    fi
+    return 0
+}
+
+install_kali()
+{
+    rootfs=$1
+    release=$2
+    arch=$3
+    cache="$4/kali"
+    interpreter="$5"
+    interpreter_path="$6"
+    flushcache=$7
+    mkdir -p $LOCALSTATEDIR/lock/subsys/
+    (
+    flock -x 9
+    if [ $? -ne 0 ]; then
+        echo "Cache repository is busy."
+        return 1
+    fi
+
+    if [ "$flushcache" -eq 1 ]; then
+        echo "Flushing cache..."
+        cleanup
+    fi
+
+    echo "Checking cache download in $cache/rootfs-$release-$arch ... "
+    if [ ! -e "$cache/rootfs-$release-$arch" ]; then
+        download_kali "$cache" "$arch" "$release" "$interpreter" "$interpreter_path"
+        if [ $? -ne 0 ]; then
+            echo "Failed to download 'kali base'"
+            return 1
+        fi
+    fi
+
+    copy_kali "$cache" "$arch" "$rootfs" "$release"
+    if [ $? -ne 0 ]; then
+        echo "Failed to copy rootfs"
+        return 1
+    fi
+
+    return 0
+
+    ) 9>$LOCALSTATEDIR/lock/subsys/lxc-kali
+
+    return $?
+}
+
+copy_configuration()
+{
+    path=$1
+    rootfs=$2
+    hostname=$3
+    arch=$4
+    num_tty=$5
+
+    # Generate the configuration file
+    # if there is exactly one veth network entry, make sure it has an
+    # associated hwaddr.
+    nics=$(grep -ce '^lxc\.net\.0\.type[ \t]*=[ \t]*veth' "$path/config")
+    if [ "$nics" -eq 1 ]; then
+        grep -q "^lxc.net.0.hwaddr" "$path/config" || sed -i -e "/^lxc\.net\.0\.type[ \t]*=[ \t]*veth/a lxc.net.0.hwaddr = 00:16:3e:$(openssl rand -hex 3| sed 's/\(..\)/\1:/g; s/.$//')" "$path/config"
+    fi
+
+    ## Add all the includes
+    echo "" >> "$path/config"
+    echo "# Common configuration" >> "$path/config"
+    if [ -e "${LXC_TEMPLATE_CONFIG}/kali.common.conf" ]; then
+        echo "lxc.include = ${LXC_TEMPLATE_CONFIG}/kali.common.conf" >> "$path/config"
+    fi
+    if [ -e "${LXC_TEMPLATE_CONFIG}/kali.${release}.conf" ]; then
+        echo "lxc.include = ${LXC_TEMPLATE_CONFIG}/kali.${release}.conf" >> "$path/config"
+    fi
+
+    ## Add the container-specific config
+    echo "" >> "$path/config"
+    echo "# Container specific configuration" >> "$path/config"
+    grep -q "^lxc.rootfs.path" "$path/config" 2> /dev/null || echo "lxc.rootfs.path = $rootfs" >> "$path/config"
+
+    cat <<EOF >> $path/config
+lxc.tty.max = $num_tty
+lxc.uts.name = $hostname
+lxc.arch = $arch
+lxc.pty.max = 1024
+EOF
+
+    if [ $? -ne 0 ]; then
+        echo "Failed to add configuration"
+        return 1
+    fi
+
+    return 0
+}
+
+post_process()
+{
+    local rootfs="$1"; shift
+    local release="$1"; shift
+    local arch="$1"; shift
+    local hostarch="$1"; shift
+    local interpreter="$1"; shift
+    local packages="$*"
+
+    # Disable service startup
+    cat > "${rootfs}/usr/sbin/policy-rc.d" << EOF
+#!/bin/sh
+exit 101
+EOF
+    chmod +x "${rootfs}/usr/sbin/policy-rc.d"
+
+    # If the container isn't running a native architecture, setup multiarch
+    if [ "$interpreter" = "" -a "${arch}" != "${hostarch}" ]; then
+        # Test if dpkg supports multiarch
+        if ! chroot "$rootfs" dpkg --print-foreign-architectures 2>&1; then
+            chroot "$rootfs" dpkg --add-architecture "${hostarch}"
+        fi
+    fi
+
+    # Write a new sources.list containing both native and multiarch entries
+    > "${rootfs}/etc/apt/sources.list"
+    if [ "$interpreter" != "" -a "${arch}" = "${hostarch}" ]; then
+        write_sourceslist "${rootfs}" "${release}" "${arch}"
+    else
+        write_sourceslist "${rootfs}" "${release}"
+    fi
+
+    # Install Packages in container
+    if [ -n "${packages}" ]; then
+        local pack_list
+        pack_list="${packages//,/ }"
+        echo "Installing packages: ${pack_list}"
+        install_packages "${rootfs}" "${pack_list}"
+    fi
+
+    # Re-enable service startup
+    rm "${rootfs}/usr/sbin/policy-rc.d"
+
+    # end
+}
+
+clean()
+{
+    cache=${LXC_CACHE_PATH:-"$LOCALSTATEDIR/cache/lxc/kali"}
+
+    if [ ! -e "$cache" ]; then
+        exit 0
+    fi
+
+    # lock, so we won't purge while someone is creating a repository
+    (
+    flock -x 9
+    if [ $? != 0 ]; then
+        echo "Cache repository is busy."
+        exit 1
+    fi
+
+    echo -n "Purging the download cache..."
+    rm --preserve-root --one-file-system -rf "$cache" && echo "Done." || exit 1
+    exit 0
+
+    ) 9>$LOCALSTATEDIR/lock/subsys/lxc-kali
+}
+
+usage()
+{
+    cat <<EOF
+usage:
+$1 -p|--path=<path> [-c|--clean] [-a|--arch=<arch>] [-r|--release=<release>]
+   [--mirror=<mirror>] [--security-mirror=<security mirror>]
+   [--packages=<package_name1,package_name2,...>]
+   [-I|--interpreter-path=<interpreter path>]
+   [-F | --flush-cache] [-S|--auth-key=<keyfile>]
+
+Options :
+
+  -h, --help             print this help text
+  -p, --path=PATH        directory where config and rootfs of this VM will be kept
+  -S, --auth-key=KEYFILE SSH public key to inject into the container as the root user.
+  -a, --arch=ARCH        The container architecture. Can be one of: i686, x86_64,
+                         amd64, armhf, armel. Defaults to host arch.
+  --mirror=MIRROR        Kali mirror to use during installation. Overrides the MIRROR
+                         environment variable (see below).
+  --security-mirror=SECURITY_MIRROR
+                         Kali mirror to use for security updates. Overrides the
+                         SECURITY_MIRROR environment variable (see below).
+  --packages=PACKAGE_NAME1,PACKAGE_NAME2,...
+                         List of additional packages to install. Comma separated, without space.
+  -c, --clean            only clean up the cache and terminate
+  --enable-non-free      include also Kali's contrib and non-free repositories.
+  -I|--interpreter-path=INTERPRETER-PATH
+                         Path of the binfmt interpreter to copy to the rootfs
+  -F | --flush-cache     Flush the kali release cache
+
+Environment variables:
+
+  MIRROR                 The Kali package mirror to use. See also the --mirror switch above.
+                         Defaults to '$MIRROR'
+EOF
+    return 0
+}
+
+options=$(getopt -o hp:n:a:cI:FS: -l arch:,auth-key:,clean,help,enable-non-free,mirror:,name:,packages:,path:,rootfs:,interpreter-path:,flush-cache -- "$@")
+if [ $? -ne 0 ]; then
+    usage "$(basename "$0")"
+    exit 1
+fi
+eval set -- "$options"
+
+littleendian=$(lscpu | grep '^Byte Order' | grep -q Little && echo yes)
+
+arch=$(uname -m)
+if [ "$arch" = "i686" ]; then
+    arch="i386"
+elif [ "$arch" = "x86_64" ]; then
+    arch="amd64"
+elif [ "$arch" = "armv7l" ]; then
+    arch="armhf"
+elif [ "$arch" = "aarch64" ]; then
+    arch="arm64"
+fi
+hostarch=$arch
+mainonly=1
+flushcache=0
+
+while true
+do
+    case "$1" in
+    -h|--help)          usage "$0" && exit 1;;
+    --)                 shift 1; break ;;
+
+    -a|--arch)          arch=$2; shift 2;;
+    -S|--auth-key)      authkey=$2; shift 2;;
+    -I|--interpreter-path)
+                        interpreter="$2"; shift 2;;
+    -c|--clean)         clean=1; shift 1;;
+    --enable-non-free)  mainonly=0; shift 1;;
+    --mirror)           MIRROR=$2; shift 2;;
+    -n|--name)          name=$2; shift 2;;
+    --packages)         packages=$2; shift 2;;
+    -p|--path)          path=$2; shift 2;;
+    --rootfs)           rootfs=$2; shift 2;;
+    -F|--flush-cache)   flushcache=1; shift 1;;
+    *)                  break ;;
+    esac
+done
+
+if [ ! -z "$clean" -a -z "$path" ]; then
+    clean || exit 1
+    exit 0
+fi
+
+if [ "$arch" = "i686" ]; then
+    arch=i386
+fi
+
+if [ "$arch" = "x86_64" ]; then
+    arch=amd64
+fi
+
+if [ "$interpreter" = "" ] ; then
+    if [ $hostarch = "i386" -a $arch = "amd64" ]; then
+        echo "can't create $arch container on $hostarch"
+        exit 1
+    fi
+
+    if [ $hostarch = "armhf" -o $hostarch = "armel" ] && \
+       [ $arch != "armhf" -a $arch != "armel" ]; then
+        echo "can't create $arch container on $hostarch"
+        exit 1
+    fi
+else
+    if ! file -b "${interpreter}" |grep -q "statically linked" ; then
+        echo "'${interpreter}' must be statically linked" 1>&2
+        exit 1
+    fi
+    interpreter_path=$(find_interpreter "$interpreter")
+    if [ $? -ne 0 ] ; then
+        echo "no binfmt interpreter using $(basename "$interpreter")" 1>&2
+        exit 1
+    fi
+fi
+
+type debootstrap
+if [ $? -ne 0 ]; then
+    echo "'debootstrap' command is missing"
+    exit 1
+fi
+
+if [ -z "$path" ]; then
+    echo "'path' parameter is required"
+    exit 1
+fi
+
+if [ "$(id -u)" != "0" ]; then
+    echo "This script should be run as 'root'"
+    exit 1
+fi
+
+if [ -n "$authkey" ]; then
+    if [ ! -f "$authkey" ]; then
+        echo "SSH keyfile '$authkey' not found"
+        exit 1
+    fi
+    # This is mostly to prevent accidental usage of the private key instead
+    # of the public key.
+    if [ "${authkey: -4}" != ".pub" ]; then
+        echo "SSH keyfile '$authkey' does not end with '.pub'"
+        exit 1
+    fi
+fi
+
+release=kali-rolling
+
+# detect rootfs
+config="$path/config"
+if [ -z "$rootfs" ]; then
+    if grep -q '^lxc.rootfs.path' "$config" 2> /dev/null ; then
+        rootfs=$(awk -F= '/^lxc.rootfs.path[ \t]+=/{ print $2 }' "$config")
+    else
+        rootfs=$path/rootfs
+    fi
+fi
+
+# determine the number of ttys - default is 4
+if grep -q '^lxc.tty.max' "$config" 2> /dev/null ; then
+    num_tty=$(awk -F= '/^lxc.tty.max[ \t]+=/{ print $2 }' "$config")
+else
+    num_tty=4
+fi
+
+install_kali "$rootfs" "$release" "$arch" "$LXC_CACHE_PATH" "$interpreter" "$interpreter_path" "$flushcache"
+if [ $? -ne 0 ]; then
+    echo "failed to install kali"
+    exit 1
+fi
+
+configure_kali "$rootfs" "$name" $num_tty
+if [ $? -ne 0 ]; then
+    echo "failed to configure kali for a container"
+    exit 1
+fi
+
+copy_configuration "$path" "$rootfs" "$name" $arch $num_tty
+if [ $? -ne 0 ]; then
+    echo "failed to write configuration file"
+    exit 1
+fi
+
+configure_kali_systemd "$path" "$rootfs" "$config" $num_tty
+
+post_process "${rootfs}" "${release}" ${arch} ${hostarch} "${interpreter}" "${packages}"
+
+if [ ! -z "$clean" ]; then
+    clean || exit 1
+    exit 0
+fi
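Editor's aside: once this template and its config snippets are installed, creating a container with it would look roughly as follows. The container name and the trailing template options are illustrative, not part of the patch; --arch and --packages are options defined by the script above.

    # Create, start and enter a Kali container using the new template (run as root).
    lxc-create -n kali-test -t kali -- --arch amd64 --packages curl,openssh-server
    lxc-start -n kali-test
    lxc-attach -n kali-test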
-z "$clean" ]; then + clean || exit 1 + exit 0 +fi From lxc-bot at linuxcontainers.org Fri May 24 21:49:26 2019 From: lxc-bot at linuxcontainers.org (Re4son on Github) Date: Fri, 24 May 2019 14:49:26 -0700 (PDT) Subject: [lxc-devel] [distrobuilder/master] Add kali-linux distro Message-ID: <5ce866e6.1c69fb81.d0220.7ca9SMTPIN_ADDED_MISSING@mx.google.com> A non-text attachment was scrubbed... Name: not available Type: text/x-mailbox Size: 349 bytes Desc: not available URL: -------------- next part -------------- From 8dac9aaa0e7368fe8be1e9d2743e345ef619a8dc Mon Sep 17 00:00:00 2001 From: Re4son Date: Thu, 23 May 2019 15:34:06 +1000 Subject: [PATCH] Add kali-linux distro Signed-off-by: Re4son --- doc/examples/kali | 124 ++++++++++++++++++++++ doc/examples/kali-with-core-packages | 151 +++++++++++++++++++++++++++ shared/definition.go | 1 + shared/definition_test.go | 3 + shared/osarch.go | 8 ++ shared/osarch_test.go | 13 +++ 6 files changed, 300 insertions(+) create mode 100644 doc/examples/kali create mode 100644 doc/examples/kali-with-core-packages diff --git a/doc/examples/kali b/doc/examples/kali new file mode 100644 index 0000000..5a9554e --- /dev/null +++ b/doc/examples/kali @@ -0,0 +1,124 @@ +image: + distribution: "kali" + release: kali-rolling + +source: + downloader: debootstrap + url: http://http.kali.org/kali + keyserver: keys.gnupg.net + keys: + - 44C6513A8E4FB3D30875F758ED444FF07D8D0BF6 + variant: minbase + apt_sources: |- + deb http://http.kali.org/kali {{ image.release }} main non-free contrib + +targets: + lxc: + create-message: |- + You just created a {{ image.description }} container. + + To enable SSH, run: apt install openssh-server + No default root or user password are set by LXC. + + config: + - type: all + before: 5 + content: |- + lxc.include = LXC_TEMPLATE_CONFIG/kali.common.conf + + - type: user + before: 5 + content: |- + lxc.include = LXC_TEMPLATE_CONFIG/kali.userns.conf + + - type: all + after: 4 + content: |- + lxc.include = LXC_TEMPLATE_CONFIG/common.conf + + - type: user + after: 4 + content: |- + lxc.include = LXC_TEMPLATE_CONFIG/userns.conf + + - type: all + content: |- + lxc.arch = {{ image.architecture_personality }} + +files: + - path: /etc/hostname + generator: hostname + + - path: /etc/hosts + generator: hosts + + - path: /etc/resolvconf/resolv.conf.d/original + generator: remove + + - path: /etc/resolvconf/resolv.conf.d/tail + generator: remove + + - path: /etc/machine-id + generator: remove + + - path: /etc/network/interfaces + generator: dump + content: |- + # This file describes the network interfaces available on your system + # and how to activate them. For more information, see interfaces(5). 
+
+      # The loopback network interface
+      auto lo
+      iface lo inet loopback
+
+      auto eth0
+      iface eth0 inet dhcp
+
+packages:
+  manager: apt
+  update: true
+  cleanup: true
+
+  sets:
+    - packages:
+        - dialog
+        - ifupdown
+        - isc-dhcp-client
+        - locales
+        - netbase
+        - net-tools
+        - openssh-client
+        - vim
+        - systemd
+        - kali-archive-keyring
+      action: install
+
+actions:
+  - trigger: post-packages
+    action: |-
+      #!/bin/sh
+      set -eux
+
+      # Make sure the locale is built and functional
+      echo en_US.UTF-8 UTF-8 >> /etc/locale.gen
+      locale-gen en_US.UTF-8 UTF-8
+      update-locale LANG=en_US.UTF-8
+
+      # Cleanup underlying /run
+      mount -o bind / /mnt
+      rm -rf /mnt/run/*
+      umount /mnt
+
+      # Cleanup temporary shadow paths
+      rm /etc/*-
+
+  - trigger: post-packages
+    action: |-
+      #!/bin/sh
+      set -eux
+      apt-get install iproute2 init -y
+    releases:
+      - kali-rolling
+
+mappings:
+  architecture_map: kali
diff --git a/doc/examples/kali-with-core-packages b/doc/examples/kali-with-core-packages
new file mode 100644
index 0000000..cfab995
--- /dev/null
+++ b/doc/examples/kali-with-core-packages
@@ -0,0 +1,151 @@
+image:
+  distribution: "kali"
+  release: kali-rolling
+
+source:
+  downloader: debootstrap
+  url: http://http.kali.org/kali
+  keyserver: keys.gnupg.net
+  keys:
+    - 44C6513A8E4FB3D30875F758ED444FF07D8D0BF6
+  variant: minbase
+  apt_sources: |-
+    deb http://http.kali.org/kali {{ image.release }} main non-free contrib
+
+targets:
+  lxc:
+    create-message: |-
+      You just created a {{ image.description }} container.
+
+      To enable SSH, run: apt install openssh-server
+      No default root or user password is set by LXC.
+
+    config:
+      - type: all
+        before: 5
+        content: |-
+          lxc.include = LXC_TEMPLATE_CONFIG/kali.common.conf
+
+      - type: user
+        before: 5
+        content: |-
+          lxc.include = LXC_TEMPLATE_CONFIG/kali.userns.conf
+
+      - type: all
+        after: 4
+        content: |-
+          lxc.include = LXC_TEMPLATE_CONFIG/common.conf
+
+      - type: user
+        after: 4
+        content: |-
+          lxc.include = LXC_TEMPLATE_CONFIG/userns.conf
+
+      - type: all
+        content: |-
+          lxc.arch = {{ image.architecture_personality }}
+
+files:
+  - path: /etc/hostname
+    generator: hostname
+
+  - path: /etc/hosts
+    generator: hosts
+
+  - path: /etc/resolvconf/resolv.conf.d/original
+    generator: remove
+
+  - path: /etc/resolvconf/resolv.conf.d/tail
+    generator: remove
+
+  - path: /etc/machine-id
+    generator: remove
+
+  - path: /etc/network/interfaces
+    generator: dump
+    content: |-
+      # This file describes the network interfaces available on your system
+      # and how to activate them. For more information, see interfaces(5).
+
+      # The loopback network interface
+      auto lo
+      iface lo inet loopback
+
+      auto eth0
+      iface eth0 inet dhcp
+
+packages:
+  manager: apt
+  update: true
+  cleanup: true
+
+  sets:
+    - packages:
+        - dialog
+        - ifupdown
+        - isc-dhcp-client
+        - locales
+        - netbase
+        - net-tools
+        - openssh-client
+        - vim
+        - systemd
+        - iw
+        - kali-defaults
+        - mlocate
+        - netcat-traditional
+        - net-tools
+        - psmisc
+        - screen
+        - tmux
+        - wget
+        - zerofree
+        - exploitdb
+        - hydra
+        - john
+        - medusa
+        - metasploit-framework
+        - mfoc
+        - ncrack
+        - nmap
+        - passing-the-hash
+        - proxychains
+        - recon-ng
+        - sqlmap
+        - tcpdump
+        - theharvester
+        - tor
+        - tshark
+        - whois
+        - kali-archive-keyring
+      action: install
+
+actions:
+  - trigger: post-packages
+    action: |-
+      #!/bin/sh
+      set -eux
+
+      # Make sure the locale is built and functional
+      echo en_US.UTF-8 UTF-8 >> /etc/locale.gen
+      locale-gen en_US.UTF-8 UTF-8
+      update-locale LANG=en_US.UTF-8
+
+      # Cleanup underlying /run
+      mount -o bind / /mnt
+      rm -rf /mnt/run/*
+      umount /mnt
+
+      # Cleanup temporary shadow paths
+      rm /etc/*-
+
+  - trigger: post-packages
+    action: |-
+      #!/bin/sh
+      set -eux
+      apt-get install iproute2 init -y
+    releases:
+      - kali-rolling
+
+mappings:
+  architecture_map: kali
diff --git a/shared/definition.go b/shared/definition.go
index 058a32b..3f8bd16 100644
--- a/shared/definition.go
+++ b/shared/definition.go
@@ -333,6 +333,7 @@ func (d *Definition) Validate() error {
 		"centos",
 		"debian",
 		"gentoo",
+		"kali",
 		"plamolinux",
 	}
diff --git a/shared/definition_test.go b/shared/definition_test.go
index a4099d9..2db51e9 100644
--- a/shared/definition_test.go
+++ b/shared/definition_test.go
@@ -49,6 +49,9 @@ func TestValidateDefinition(t *testing.T) {
 			Mappings: DefinitionMappings{
 				ArchitectureMap: "debian",
 			},
+			Mappings: DefinitionMappings{
+				ArchitectureMap: "kali",
+			},
 		},
 		"",
 		false,
diff --git a/shared/osarch.go b/shared/osarch.go
index f0f67ef..5124075 100644
--- a/shared/osarch.go
+++ b/shared/osarch.go
@@ -41,6 +41,13 @@ var gentooArchitectureNames = map[int]string{
 	osarch.ARCH_64BIT_S390_BIG_ENDIAN: "s390x",
 }

+var kaliArchitectureNames = map[int]string{
+	osarch.ARCH_32BIT_INTEL_X86:           "i386",
+	osarch.ARCH_64BIT_INTEL_X86:           "amd64",
+	osarch.ARCH_32BIT_ARMV7_LITTLE_ENDIAN: "armhf",
+	osarch.ARCH_64BIT_ARMV8_LITTLE_ENDIAN: "arm64",
+}
+
 var plamoLinuxArchitectureNames = map[int]string{
 	osarch.ARCH_32BIT_INTEL_X86: "x86",
 }
@@ -58,6 +65,7 @@ var distroArchitecture = map[string]map[int]string{
 	"centos":     centosArchitectureNames,
 	"debian":     debianArchitectureNames,
 	"gentoo":     gentooArchitectureNames,
+	"kali":       kaliArchitectureNames,
 	"plamolinux": plamoLinuxArchitectureNames,
 }
diff --git a/shared/osarch_test.go b/shared/osarch_test.go
index 4068033..25c777f 100644
--- a/shared/osarch_test.go
+++ b/shared/osarch_test.go
@@ -38,6 +38,16 @@ func TestGetArch(t *testing.T) {
 			"s390x",
 			"s390x",
 		},
+		{
+			"kali",
+			"amd64",
+			"amd64",
+		},
+		{
+			"kali",
+			"x86_64",
+			"amd64",
+		},
 	}

 	for i, tt := range tests {
@@ -52,4 +62,7 @@ func TestGetArch(t *testing.T) {

 	_, err = GetArch("debian", "arch")
 	require.EqualError(t, err, "Architecture isn't supported: arch")
+
+	_, err = GetArch("kali", "arch")
+	require.EqualError(t, err, "Architecture isn't supported: arch")
 }
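Editor's aside: the two files under doc/examples/ are complete distrobuilder definitions, so the new distro can be exercised directly with distrobuilder's existing build-lxc subcommand. A sketch, assuming a distrobuilder build that already carries this patch:

    # Build LXC image tarballs (meta.tar.xz + rootfs.tar.xz) from the example definition
    sudo distrobuilder build-lxc doc/examples/kali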
From noreply at github.com Tue May 28 14:45:39 2019
From: noreply at github.com (Christian Brauner)
Date: Tue, 28 May 2019 14:45:39 +0000 (UTC)
Subject: [lxc-devel] [lxc/lxc] d871a9: fix issue 2765
Message-ID: 

Branch: refs/heads/master
Home: https://github.com/lxc/lxc
Commit: d871a9f1e562ff0ff8c0f8b4124246a8521cabca
    https://github.com/lxc/lxc/commit/d871a9f1e562ff0ff8c0f8b4124246a8521cabca
Author: Alexander Kriventsov 
Date: 2019-05-28 (Tue, 28 May 2019)

Changed paths:
  M src/lxc/cmd/lxc_user_nic.c

Log Message:
-----------
fix issue 2765

Signed-off-by: Alexander Kriventsov 

Commit: 0cfec4f757b526a4f40167034a0b76c9cb809808
    https://github.com/lxc/lxc/commit/0cfec4f757b526a4f40167034a0b76c9cb809808
Author: Christian Brauner 
Date: 2019-05-28 (Tue, 28 May 2019)

Changed paths:
  M src/lxc/cmd/lxc_user_nic.c

Log Message:
-----------
Merge pull request #3015 from avkvl/issue-2765

fix issue 2765

Compare: https://github.com/lxc/lxc/compare/c54cf53fadf4...0cfec4f757b5

From noreply at github.com Wed May 29 15:14:02 2019
From: noreply at github.com (Christian Brauner)
Date: Wed, 29 May 2019 08:14:02 -0700
Subject: [lxc-devel] [lxc/lxc] c74e92: lxc_clone: pass non-stack allocated stack to clone
Message-ID: 

Branch: refs/heads/master
Home: https://github.com/lxc/lxc
Commit: c74e9217448743e9cdfc068568bf3d7c720ca21b
    https://github.com/lxc/lxc/commit/c74e9217448743e9cdfc068568bf3d7c720ca21b
Author: Tycho Andersen 
Date: 2019-05-15 (Wed, 15 May 2019)

Changed paths:
  M src/lxc/namespace.c

Log Message:
-----------
lxc_clone: pass non-stack allocated stack to clone

There are two problems with this code:

1. The math is wrong. We allocate a char *foo[__LXC_STACK_SIZE], which
   means it's really sizeof(char *) * __LXC_STACK_SIZE, instead of just
   __LXC_STACK_SIZE.

2. We can't actually allocate it on our stack. When we use CLONE_VM
   (which we do in the shared ns case) that means that the new thread is
   just running one page lower on the stack, so anything that allocates a
   page on the stack may clobber data.

This is a pretty short race window since we just do the shared ns stuff
and then do a clone without CLONE_VM. However, it does point out an
interesting possible privilege escalation if things aren't configured
correctly: do_share_ns() sets up namespaces while it shares the address
space of the task that spawned it; once it enters the pid ns of the thing
it's sharing with, the thing it's sharing with can ptrace it and write
stuff into the host's address space. Since the function that does the
clone() is lxc_spawn(), it has a struct cgroup_ops* on the stack, which
itself has function pointers called later in the function, so it's
possible to allocate shellcode in the address space of the host and run
it fairly easily.

ASLR doesn't mitigate this since we know the exact stack offsets;
however this patch has the kernel allocate a new stack, which will help.
Of course, the attacker could just check /proc/pid/maps to find the
location of the stack, but they'd still have to guess where to write
stuff in.

The thing that does prevent this is the default configuration of
apparmor. Since the apparmor profile is set in the second clone, and
apparmor prevents ptracing things under a different profile, attackers
confined by apparmor can't do this. However, if users are using a custom
configuration with shared namespaces, care must be taken to avoid this
race. Shared namespaces aren't widely used now, so perhaps this isn't a
problem, but with the advent of crio-lxc for k8s, this functionality will
be used more.

Signed-off-by: Tycho Andersen 
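Editor's aside: the actual patch, per the PR title below, passes a zero stack to the raw clone syscall so that the kernel sets one up. A portable user-space variant of the same idea — never take the child's stack from the caller's frame — looks roughly like this. The snippet is an illustration only, not lxc's code:

    /* Illustrative only: heap-allocate the child's stack. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    #define CHILD_STACK_SIZE (8 * 1024 * 1024) /* glibc's default thread stack size */

    static int child_fn(void *arg)
    {
    	return 0;
    }

    int main(void)
    {
    	void *stack = malloc(CHILD_STACK_SIZE);
    	if (!stack)
    		return 1;

    	/* glibc's clone() wants the *top* of the stack on architectures
    	 * where the stack grows down. */
    	pid_t pid = clone(child_fn, (char *)stack + CHILD_STACK_SIZE,
    			  SIGCHLD, NULL);
    	if (pid < 0)
    		return 1;

    	waitpid(pid, NULL, 0);
    	free(stack); /* only safe once the child has exited */
    	return 0;
    }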
Commit: 8de90384363fe01f5258d36724dd3eae55918b5b
    https://github.com/lxc/lxc/commit/8de90384363fe01f5258d36724dd3eae55918b5b
Author: Tycho Andersen 
Date: 2019-05-15 (Wed, 15 May 2019)

Changed paths:
  M doc/lxc.container.conf.sgml.in

Log Message:
-----------
doc: add a little note about shared ns + LSMs

We should add a little note about the race in the previous patch.

Signed-off-by: Tycho Andersen 

Commit: 5e7b4b3c166873030d51dc725907351f19d7e0fd
    https://github.com/lxc/lxc/commit/5e7b4b3c166873030d51dc725907351f19d7e0fd
Author: Tycho Andersen 
Date: 2019-05-15 (Wed, 15 May 2019)

Changed paths:
  M src/lxc/namespace.c

Log Message:
-----------
lxc_clone: get rid of some indirection

We have a do_clone(), which just calls a void f(void *) that it gets
passed. We build up a struct consisting of two args that are just the
actual arg and actual function. Let's just have the syscall do this for
us.

Signed-off-by: Tycho Andersen 

Commit: 3df90604ec03a67791b26c94aa5592d127cb0914
    https://github.com/lxc/lxc/commit/3df90604ec03a67791b26c94aa5592d127cb0914
Author: Tycho Andersen 
Date: 2019-05-29 (Wed, 29 May 2019)

Changed paths:
  M src/lxc/namespace.c

Log Message:
-----------
lxc_clone: bump stack size to 8MB

This is the default thread stack size for glibc, so it is reasonable to
match that when we clone(). Mostly this is a science experiment suggested
by brauner, and who doesn't love science?

Signed-off-by: Tycho Andersen 

Commit: 18a405ee88419e0799cf8849f1ad468c859615ba
    https://github.com/lxc/lxc/commit/18a405ee88419e0799cf8849f1ad468c859615ba
Author: Christian Brauner 
Date: 2019-05-29 (Wed, 29 May 2019)

Changed paths:
  M doc/lxc.container.conf.sgml.in
  M src/lxc/namespace.c

Log Message:
-----------
Merge pull request #2987 from tych0/pass-zero-to-clone

Pass zero to clone

Compare: https://github.com/lxc/lxc/compare/0cfec4f757b5...18a405ee8841

From lxc-bot at linuxcontainers.org Wed May 29 15:37:22 2019
From: lxc-bot at linuxcontainers.org (tych0 on Github)
Date: Wed, 29 May 2019 08:37:22 -0700 (PDT)
Subject: [lxc-devel] [lxc/master] lxc_clone: add a comment about stack size
Message-ID: <5ceea732.1c69fb81.ea45c.0a9eSMTPIN_ADDED_MISSING@mx.google.com>

A non-text attachment was scrubbed...
Name: not available
Type: text/x-mailbox
Size: 347 bytes
Desc: not available
URL: 
-------------- next part --------------
From edb808d1301c81d6b0a2747dffa6a7019ff20de8 Mon Sep 17 00:00:00 2001
From: Tycho Andersen 
Date: Wed, 29 May 2019 09:36:51 -0600
Subject: [PATCH] lxc_clone: add a comment about stack size

Signed-off-by: Tycho Andersen 
---
 src/lxc/namespace.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/src/lxc/namespace.c b/src/lxc/namespace.c
index 4ede96f2fa..be47b229ec 100644
--- a/src/lxc/namespace.c
+++ b/src/lxc/namespace.c
@@ -42,6 +42,10 @@

 lxc_log_define(namespace, lxc);

+/*
+ * Let's use the "standard stack limit" (i.e. glibc thread size default) for
+ * stack sizes: 8MB.
+ */
 #define __LXC_STACK_SIZE (8 * 1024 * 1024)
 pid_t lxc_clone(int (*fn)(void *), void *arg, int flags, int *pidfd)
 {
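Editor's aside: the "math is wrong" point from the lxc_clone commit message above is easy to demonstrate. This standalone snippet is an illustration, not lxc code; sizeof is evaluated at compile time, so nothing is actually allocated:

    #include <stdio.h>

    #define SZ (8 * 1024 * 1024)

    int main(void)
    {
    	/* An array of pointers multiplies the element size in. */
    	printf("char *[SZ]: %zu bytes\n", sizeof(char *[SZ])); /* sizeof(char *) * SZ, 64 MiB on LP64 */
    	printf("char [SZ]:  %zu bytes\n", sizeof(char[SZ]));   /* exactly SZ, 8 MiB */
    	return 0;
    }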
From noreply at github.com Wed May 29 15:38:25 2019
From: noreply at github.com (Christian Brauner)
Date: Wed, 29 May 2019 08:38:25 -0700
Subject: [lxc-devel] [lxc/lxc] edb808: lxc_clone: add a comment about stack size
Message-ID: 

Branch: refs/heads/master
Home: https://github.com/lxc/lxc
Commit: edb808d1301c81d6b0a2747dffa6a7019ff20de8
    https://github.com/lxc/lxc/commit/edb808d1301c81d6b0a2747dffa6a7019ff20de8
Author: Tycho Andersen 
Date: 2019-05-29 (Wed, 29 May 2019)

Changed paths:
  M src/lxc/namespace.c

Log Message:
-----------
lxc_clone: add a comment about stack size

Signed-off-by: Tycho Andersen 

Commit: 3e8a11cb1c2fe8c9f44e7ecc6f8f378d1e09fab9
    https://github.com/lxc/lxc/commit/3e8a11cb1c2fe8c9f44e7ecc6f8f378d1e09fab9
Author: Christian Brauner 
Date: 2019-05-29 (Wed, 29 May 2019)

Changed paths:
  M src/lxc/namespace.c

Log Message:
-----------
Merge pull request #3018 from tych0/comment-stack-size

lxc_clone: add a comment about stack size

Compare: https://github.com/lxc/lxc/compare/18a405ee8841...3e8a11cb1c2f
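Editor's closing note: the Compare: links in these notifications give the exact commit range of each push, and the short hashes can be inspected locally with stock git. The hashes below are taken from the messages above:

    git clone https://github.com/lxc/lxc && cd lxc
    git log --oneline 18a405ee8841..3e8a11cb1c2f   # the two commits merged here
    git show edb808d1301c -- src/lxc/namespace.c   # the stack-size comment itself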