systemd-nspawn, again

a possible alternative to Incus

Disclaimer

Do not use this post as an absolute, fool-proof, totally safe guide. Please do your own investigation into how this works.

One statement in the systemd-nspawn documentation is extremely worrying, and I'm not sure if this is the correct way to deal with it:

systemd-nspawn limits access to various kernel interfaces in the container to read-only, such as /sys/, /proc/sys/, or /sys/fs/selinux/. The host's network interfaces and the system clock may not be changed from within the container. Device nodes may not be created. The host system cannot be rebooted and kernel modules may not be loaded from within the container. This sandbox can easily be circumvented from within the container if user namespaces are not used. This means that untrusted code must always be run in a user namespace, see the discussion of the --private-users= option below.
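I am not doing that in this setup, but for untrusted code a user namespace can be requested in the container's .nspawn file (introduced later in this post). A minimal sketch, using the PrivateUsers= setting from systemd.nspawn(5), where "pick" lets nspawn choose a free UID range:

[Exec]
PrivateUsers=pick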

Introduction

I have been using Incus containers for some time on a really shitty HP MicroServer Gen10, the one with the 2-core AMD CPU.

Yes, the machine sucks.

Now that I mentioned the machine, whatever happened to HP's The Machine? I guess it failed, since everyone and then some went all-in on monetization projects like selling firmware security updates. F*ck HP.

Anyway, so far it was working well, but I noticed that starting and stopping containers takes quite some time. Oh, what a great opportunity for a time sink investigating improvements!

I have always liked systemd and its integration. Coming from the era of init shell scripts, it most noticeably improved system startup times, and a bunch of scripts could be replaced with declarative service definitions. All in all, it's a great project. One of the components it ships is systemd-nspawn, a container manager similar to (though less user friendly than) Incus and LXD.

So, let's try it.

Setup

The point of this exercise is to have a WireGuard interface in the container with no leaks. The setup will host a microsocks server, to be able to circumvent geolocks.

To achieve this, the only internet-bound interface must be the VPN interface. The container should also have another one, to access the LAN. There is the possibility of publishing only a single port of the container, but I didn't want to run it that way.

Host

For networking, I will be using IPv6 exclusively.

Network

On the host side, we are going to set up a NetDev using systemd-networkd, but not assign it an IP address. The address will be assigned within the container.

/etc/systemd/network/wg_vpn.netdev

[NetDev]
Name=wg_vpn
Kind=wireguard
Description=WireGuard tunnel for vpn

[WireGuard]
PrivateKey=<my-super-secret-private-key>

[WireGuardPeer]
PublicKey = <public-key>
PresharedKey = <my-super-secret-preshared-key>
Endpoint = vpn.host.example:8861
AllowedIPs = 0.0.0.0/0,::/0
PersistentKeepalive = 15
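Note that this .netdev now contains the tunnel's private key, so the file should not be world-readable. The systemd.netdev man page suggests owning it by root:systemd-network with mode 0640, roughly:

chown root:systemd-network /etc/systemd/network/wg_vpn.netdev
chmod 0640 /etc/systemd/network/wg_vpn.netdev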

We also need a second interface, the one used for communication with the host. This is going to be a veth pair. With this type of device, you create two interfaces: one used on the host (I chose to suffix it with "_h", for host), and a second one connected to it that will be available in the container (I chose the "_c" suffix there).

The point to take from this: veths come in pairs, but are created by a single .netdev file.

/etc/systemd/network/vn_vpn.netdev

[NetDev]
Name=vn_vpn_h
Kind=veth

[Peer]
Name=vn_vpn_c

So, the vn_vpn_c interface will be moved into the container, and its address will be set there. On the host, we only set up the address of the "_h" device.

/etc/systemd/network/vn_vpn.network

[Match]
Name=vn_vpn_h

[Network]
Address = fdab::10/64
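To apply the new .netdev and .network files without a reboot, networkd can re-read its configuration (on recent systemd), and networkctl should then show the veth pair:

networkctl reload
networkctl status vn_vpn_h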

Since I want to use manually assigned addresses for the veth interfaces, I had to use "vn" as the prefix. There is a default setup for nspawn containers where this is done automatically: systemd-networkd ships a configuration that matches all host-side names starting with "ve-" and assigns a /28 subnet to each.

If you want manually assigned addresses, use a custom interface name that does not start with "ve-".
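If you are curious what that default looks like, it ships with systemd itself (the exact path may vary by distribution):

cat /usr/lib/systemd/network/80-container-ve.network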

Deploying the container

The container can be deployed using debootstrap, dnf, zypper and friends, but I prefer to use the same images as Incus, from the LXC project.

For a list of images, check https://images.linuxcontainers.org/images

To deploy the image locally, there are two versions of the command:

  • A bit older:
machinectl pull-tar https://images.linuxcontainers.org/images/ubuntu/plucky/amd64/cloud/20250714_07:42/rootfs.tar.xz plucky
  • A bit newer:
importctl -mN pull-tar https://images.linuxcontainers.org/images/ubuntu/plucky/amd64/cloud/20250714_07:42/rootfs.tar.xz plucky
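Either way, the downloaded image should afterwards show up as plucky under /var/lib/machines and in the image list:

machinectl list-images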

Containers, which will serve this workload instead of VMs to save on resources, have their own config files with the .nspawn extension. This is documented on the systemd.nspawn page at freedesktop.org. This is still on the host side: the config tells systemd-nspawn how to start the container.

In this case, we will call the container plucky, because Ubuntu 25.04 is codenamed Plucky.

/etc/systemd/nspawn/plucky.nspawn

[Network]
Private=true
Interface=wg_vpn
Interface=vn_vpn_c

The Interface= directive in the .nspawn file tells systemd-nspawn to move the interface into the container. Now, this is super important: interfaces remember the network namespace where they were created, so the WireGuard interface, even though it was moved to a container and has no routes set up on the host, still knows how to move traffic.

Moving the interface is the biggest selling point of this post.

The host side is now set up.

Container side

To start the container, we use another command from the systemd palette:

machinectl start plucky

And to get a shell:

machinectl shell plucky
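Before configuring anything, you can verify that the interfaces really moved: wg_vpn should be gone from the host and listed inside the container (the path to ip inside the container is an assumption; adjust for your image):

ip link show wg_vpn                       # on the host: "does not exist"
machinectl shell plucky /usr/bin/ip link  # inside: wg_vpn and vn_vpn_c listed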

Network

Once in the container, we need to set up all those interfaces that do not have an address assigned yet.
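One caveat first: depending on the image, systemd-networkd may not be enabled inside the container, and without it the .network files below will not be applied. Inside the container:

systemctl enable --now systemd-networkd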

/var/lib/machines/plucky/etc/systemd/network/vn_vpn_c.network

[Match]
Name=vn_vpn_c

[Network]
Address = fdab::20/64
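Once networkd has picked up this file (networkctl reload inside the container, or a container restart), LAN reachability can be sanity-checked against the host address assigned earlier:

ping -c 3 fdab::10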

/var/lib/machines/plucky/etc/systemd/network/wg_vpn.network

[Match]
Name=wg_vpn

[Network]
Address = <tunnel-address-v4>
Address = <tunnel-address-v6>
DNS = <dns-address-v4>
DNS = <dns-address-v6>
DefaultRouteOnDevice=yes
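Similarly, networkctl inside the container should then confirm that the tunnel interface has its addresses and owns the default route:

networkctl status wg_vpn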

Autostart

For autostart to work, systemd-nspawn should wait for the WireGuard interface to be set up before the container starts. For this, we need to make a drop-in.

systemctl edit systemd-nspawn@plucky.service

Once the editor opens, we need to add the following:

[Unit]
Wants=sys-subsystem-net-devices-wg_vpn.device
After=sys-subsystem-net-devices-wg_vpn.device
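The device unit name is just the interface path under /sys, escaped by systemd; with an underscore in the interface name, no escaping is needed. Once the interface exists, the unit should be visible:

systemctl status sys-subsystem-net-devices-wg_vpn.device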

And finally, this is how autostart is triggered; we just have a systemd service to enable.

systemctl enable systemd-nspawn@plucky

Voila!

Problems

For some reason, after a host reboot, the VPN in the container does not work. The container starts fine, but there is simply no network except for the LAN.

Notes:

  • guess: the interface is moved before the initial WireGuard handshake, so the UDP tunnel is never established (see the check below)
  • let's say we are roaming and the Wi-Fi or mobile tower changes; the VPN provider's domain suddenly points to another IP, and the Endpoint in the VPN config should resolve to the new IP. Will it work?
  • this only works on Ubuntu as a host (out of Ubuntu, Fedora and openSUSE). Fedora has an SELinux issue where images cannot be deployed this way; openSUSE doesn't have a working systemd-importd (on Leap 16 Beta). I did not try Debian or any other distro.
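To chase the first guess, a quick check from inside the container shows whether a handshake ever happens (this assumes the wireguard-tools package is installed in the container; a "latest handshake" line should appear once the tunnel is up):

machinectl shell plucky /usr/bin/wg show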