Thursday, 05 December 2024
A Small Update
Following swiftly on from my last article, I decided to take the opportunity to extend my framebuffer components to support an interface utilised by the L4Re framework’s Mag component, which is a display multiplexer providing a kind of multiple window environment. I’m not sure if Mag is really supported any more, but it provided the basis of a number of L4Re examples for a while, and I brought it into use for my own demonstrations.
Eventually, having needed to remind myself of some of the details of my own software, I managed to deploy the collection of components required, each with their own specialised task, but most pertinently a SoC-specific SPI driver and a newly extended display-specific framebuffer driver. The framebuffer driver could now be connected directly to Mag in the Lua-based coordination script used by the Ned initialisation program, which starts up programs within L4Re, and Mag could now request a region of memory from the framebuffer driver for further use by other programs.
All of this extra effort merely provided another way of delivering a familiar demonstration, that being the colourful, mesmerising spectrum example once provided as part of the L4Re software distribution. This example also uses the programming interface mentioned above to request a framebuffer from Mag. It then plots its colourful output into this framebuffer.
The result is familiar from earlier articles:
The significant difference, however, is that underneath the application programs, a combination of interchangeable components provides the necessary adaptation to the combination of hardware devices involved. And the framebuffer component can now completely replace the fb-drv component that was also part of the L4Re distribution, thereby eliminating a dependency on a rather cumbersome and presumably obsolete piece of software.
Monday, 02 December 2024
Recent Progress
The last few months have not always been entirely conducive to making significant progress with various projects, particularly my ongoing investigations and experiments with L4Re, but I did manage to reacquaint myself with my previous efforts sufficiently to finally make some headway in November. This article tries to retrieve some of the more significant accomplishments, modest as they might be, to give an impression of how such work is undertaken.
Previously, I had managed to get my software to do somewhat useful things on MIPS-based single-board computer hardware, showing graphical content on a small screen. Various problems had arisen with regard to one revision of a single-board computer for which the screen was originally intended, causing me to shift my focus to more general system functionality within L4Re. With the arrival of the next revision of the board, I leveraged this general functionality, combining it with support for memory cards, to get my minimalist system to operate on the board itself. I rather surprised myself getting this working, it must be said.
Returning to the activity at the start of November, there were still some matters to be resolved. In parallel to my efforts with L4Re, I had been trying to troubleshoot the board’s operation under Linux. Linux is, in general, a topic upon which I do not wish to waste my words. However, with the newer board revision, I had also acquired another, larger, screen and had been investigating its operation, and there were performance-related issues experienced under Linux that needed to be verified under other conditions. This is where a separate software environment can be very useful.
Plugging a Leak
Before turning my attention to the larger screen, I had been running a form of stress test with the smaller screen, updating it intensively while also performing read operations from the memory card. What this demonstrated was that there were no obvious bandwidth issues with regard to data transfers occurring concurrently. Translating this discovery back to Linux remains an ongoing exercise, unfortunately. But another problem arose within my own software environment: after a while, the filesystem server would run out of memory. I felt that this problem now needed to be confronted.
Since I tend to make such problems for myself, I suspected a memory leak in some of my code, despite trying to be methodical in the way that allocated objects are handled. I considered various tools that might localise this particular leak, with AddressSanitizer and LeakSanitizer being potentially useful, merely requiring recompilation and being available for a wide selection of architectures as part of GCC. I also sought to demonstrate the problem in a virtual environment, this simply involving appropriate test programs running under QEMU. Unfortunately, the sanitizer functionality could not be linked into my binaries, at least with the Debian toolchains that I am using.
Eventually, I resolved to use simpler techniques. Wondering if the memory allocator might be fragmenting memory, I introduced a call to malloc_stats, just to get an impression of the state of the heap. After failing to gain much insight into the problem, I rolled up my sleeves and decided to just look through my code for anything I might have done with regard to allocating memory, just to see if I had overlooked anything as I sought to assemble a working system from its numerous pieces.
Sure enough, I had introduced an allocation for “convenience” in one kind of object, making a pool of memory available to that object if no specific pool had been presented to it. The memory pool itself would release its own memory upon disposal, but in focusing on getting everything working, I had neglected to introduce the corresponding top-level disposal operation. With this remedied, my stress test was now able to run seemingly indefinitely.
Separating Displays and Devices
I would return to my generic system support later, but the need to exercise the larger screen led me to consider the way I had previously introduced support for screens and displays. The smaller screen employs SPI as the communications mechanism between the SoC and the display controller, as does the larger screen, and I had implemented support for the smaller screen as a library combining the necessary initialisation and pixel data transfer code with code that would directly access the SPI peripheral using a SoC-specific library.
Clearly, this functionality needed to be separated into two distinct parts: the code retaining the details of initialising and operating the display via its controller, and the code performing the SPI communication for a specific SoC. Not doing this could require us to needlessly build multiple variants of the display driver for different SoCs or platforms, when in principle we should only need one display driver with knowledge of the controller and its peculiarities, this then being combined using interprocess communication with a single, SoC-specific driver for the communications.
A few years ago now, I had in fact implemented a “server” in L4Re to perform short SPI transfers on the Ben NanoNote, this to control the display backlight. It became appropriate to enhance this functionality to allow programs to make longer transfers using data held in shared memory, all of this occurring without those programs having privileged access to the underlying SPI peripheral in the SoC. Alongside the SPI server appropriate for the Ben NanoNote’s SoC, servers would be built for other SoCs, and only the appropriate one would be started on a given hardware device. This would then mediate access to the SPI peripheral, accepting requests from client programs within the established L4Re software architecture.
One important element in the enhanced SPI server functionality is the provision of shared memory that can be used for DMA transfers. Fortunately, this is mostly a matter of using the appropriate settings when requesting memory within L4Re, even though the mechanism has been made somewhat more complicated in recent times. It was also fortunate that I previously needed to consider such matters when implementing memory card support, saving me time in considering them now. The result is that a client program should be able to write into a memory region and the SPI server should be able to send the written data directly to the display controller without any need for additional copying.
Complementing the enhanced SPI servers are framebuffer components that use these servers to configure each kind of display, each providing an interface to their own client programs which, in turn, access the display and provide visual content. The smaller screen uses an ST7789 controller and is therefore supported by one kind of framebuffer component, whereas the larger screen uses an ILI9486 controller and has its own kind of component. In principle, the display controller support could be organised so that common code is reused and that support for additional controllers would only need specialisations to that generic code. Both of these controllers seem to implement the MIPI DBI specifications.
The particular display board housing the larger screen presented some additional difficulties, being very peculiarly designed to present what would seem to be an SPI interface to the hardware interfacing to the board, but where the ILI9486 controller’s parallel interface is apparently used on the board itself, with some shift registers and logic faking the serial interface to the outside world. This complicates the communications, requiring 16-bit values to be sent where 8-bit values would be used in genuine SPI command traffic.
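The difference in command traffic can be shown with a small sketch. It assumes, without confirmation from the board's documentation, that each 8-bit command or parameter byte is zero-extended into the low half of a 16-bit word. The function name widen_bytes is hypothetical. The example bytes 0x11 and 0x29 are the MIPI DCS exit_sleep_mode and set_display_on commands.

```shell
#!/usr/bin/env bash
# Sketch only: widen 8-bit display command/parameter bytes into 16-bit words.
# Assumption: the board expects each byte zero-extended into the low half of a
# 16-bit word (high byte 0x00); this is not taken from any board documentation.
widen_bytes() {
  local byte words=()
  for byte in "$@"; do
    # Bash arithmetic accepts hex literals such as 0x29; mask to 8 bits.
    words+=("$(printf '0x%04x' $(( byte & 0xff )))")
  done
  echo "${words[*]}"
}

# Genuine SPI command traffic would carry:  0x11 0x29
# The faked serial interface is assumed to need the widened form:
widen_bytes 0x11 0x29   # prints: 0x0011 0x0029
```

Every byte therefore costs twice the bus time in this assumed scheme. This is the price of the faked serial interface for command traffic.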
The motivation for this weird design is presumably that of squeezing a little extra performance out of the controller that is only available when transferring pixel data via the parallel interface, especially desired by those making low-cost retrogaming systems with the Raspberry Pi. Various additional tweaks were needed to make the ILI9486 happy, such as an explicit reset pulse, with this being incorporated into my simplistic display component framework. Much more work is required in this area, and I hope to contemplate such matters in the not-too-distant future.
Discoveries and Remedies
Further testing brought some other issues to the fore. With one of the single-board computers, I had been using a microSD card with a capacity of about half a gigabyte, which would make it a traditional SD or SDSC (standard capacity) card, at least according to the broader SD card specifications. With another board, I had been using a card with a sixteen gigabyte capacity or thereabouts, aligning it with the SDHC (high capacity) format.
Starting to exercise my code a bit more on this larger card exposed memory mapping issues when accessing the card as a single region: on the 32-bit MIPS architecture used by the SoC, a pointer simply cannot address this entire region, and thus some pointer arithmetic occurred that had undesirable consequences. Constraining the size of mapped regions seemed like the easiest way of fixing this problem, at least for now.
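The arithmetic is easy to check. A 32-bit pointer spans at most 2^32 bytes (4 GiB), well short of a sixteen gigabyte card. The sketch below splits a large region into bounded windows, each of which could be mapped on its own. The function name plan_windows and the 256 MiB window size are illustrative assumptions, not the values used in the actual software. Bash is used because its arithmetic is typically 64-bit, unlike the 32-bit pointers in question.

```shell
#!/usr/bin/env bash
# Sketch: split a region of TOTAL bytes into windows of at most WINDOW bytes,
# printing "offset length" for each window. Sizes here are illustrative only.
plan_windows() {
  local total=$1 window=$2 offset=0 len
  (( window > 0 )) || return 1
  while (( offset < total )); do
    # The last window may be shorter than the others.
    len=$(( total - offset < window ? total - offset : window ))
    echo "$offset $len"
    offset=$(( offset + len ))
  done
}

card=$(( 16 * 1000**3 ))        # roughly sixteen gigabytes
(( card > 2**32 )) && echo "card exceeds a 32-bit address space"
plan_windows "$card" $(( 256 * 1024**2 )) | wc -l   # number of 256 MiB windows
```

Each window stays far below the 4 GiB limit, so pointer arithmetic within a window cannot overflow.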
More sustained testing revealed a couple of concurrency issues. One involved a path of invocation via a method testing for access to filesystem objects where I had overlooked that the method, deliberately omitting usage of a mutex, could be called from another component and thus circumvent the concurrency measures already in place. I may well have refactored components at some point, forgetting about this particular possibility.
Another issue was an oversight in the way an object providing access to file content releases its memory pages for other objects to use before terminating, part of the demand paging framework that has been developed. I had managed to overlook a window between two operations where an object seeking to acquire a page from the terminating object might obtain exclusive access to a page, but upon attempting to notify the terminating object, find that it has since been deallocated. This caused memory access errors.
Strangely, I had previously noticed one side of this potential situation in the terminating object, even writing up some commentary in the code, but I had failed to consider the other side of it lurking between those two operations. Building in the missing support involved getting the terminating object to wait for its counterparts, so that they may notify it about pages they were in the process of removing from its control. Hopefully, this resolves the problem, but perhaps the lesson is that if something anomalous is occurring, exhibiting certain unexpected effects, the cause should not be ignored or assumed to be harmless.
All of this proves to be quite demanding work, having to consider many aspects of a system at a variety of levels and across a breadth of components. Nevertheless, modest progress continues to be made, even if it is entirely on my own initiative. Hopefully, it remains of interest to a few of my readers, too.
Wednesday, 27 November 2024
Creating a kubernetes cluster with kubeadm on Ubuntu 24.04 LTS
(this post is a copy of the one in my git repo)
https://github.com/ebal/k8s_cluster/
Kubernetes, also known as k8s, is an open-source system for automating deployment, scaling, and management of containerized applications.
Notice: the initial (old) blog post for Ubuntu 22.04 is still available here: blog post
- Prerequisites
- Git Terraform Code for the kubernetes cluster
- Control-Plane Node
- Ports on the control-plane node
- Firewall on the control-plane node
- Hosts file in the control-plane node
- Updating your hosts file
- No Swap on the control-plane node
- Kernel modules on the control-plane node
- NeedRestart on the control-plane node
- temporarily
- permanently
- Installing a Container Runtime on the control-plane node
- Installing kubeadm, kubelet and kubectl on the control-plane node
- Get kubernetes admin configuration images
- Initializing the control-plane node
- Create user access config to the k8s control-plane node
- Verify the control-plane node
- Install an overlay network provider on the control-plane node
- Verify CoreDNS is running on the control-plane node
- Worker Nodes
- Get Token from the control-plane node
- Is the kubernetes cluster running ?
- Kubernetes Dashboard
- Helm
- Install kubernetes dashboard
- Accessing Dashboard via a NodePort
- Patch kubernetes-dashboard
- Edit kubernetes-dashboard Service
- Accessing Kubernetes Dashboard
- Create An Authentication Token (RBAC)
- Creating a Service Account
- Creating a ClusterRoleBinding
- Getting a Bearer Token
- Browsing Kubernetes Dashboard
- Nginx App
- That’s it
In this blog post, I’ll share my personal notes on setting up a kubernetes cluster using kubeadm on Ubuntu 24.04 LTS Virtual Machines.
For this setup, I will use three (3) Virtual Machines in my local lab. My home lab is built on libvirt with QEMU/KVM (Kernel-based Virtual Machine), and I use Terraform as the infrastructure provisioning tool.
Prerequisites
- at least 3 Virtual Machines of Ubuntu 24.04 (one for control-plane, two for worker nodes)
- 2GB (or more) of RAM on each Virtual Machine
- 2 CPUs (or more) on each Virtual Machine
- 20 GB of disk space on each Virtual Machine
- No SWAP partition/image/file on each Virtual Machine
Streamline the lab environment
To simplify the Terraform code for the libvirt/QEMU Kubernetes lab, I’ve made a few adjustments so that all of the VMs use the below default values:
- ssh port: 22/TCP
- volume size: 40G
- memory: 4096 MiB
- cpu: 4
Review the values and adjust them according to your requirements and limitations.
Git Terraform Code for the kubernetes cluster
I prefer maintaining a reproducible infrastructure so that I can quickly create and destroy my test lab. My approach involves testing each step, so I often destroy everything, copy and paste commands, and move forward. I use Terraform to provision the infrastructure. You can find the full Terraform code for the Kubernetes cluster here: k8s cluster - Terraform code.
If you do not use terraform, skip this step!
You can git clone the repo to review and edit it according to your needs.
git clone https://github.com/ebal/k8s_cluster.git
cd k8s_cluster/tf_libvirt
You will need to make appropriate changes, so open Variables.tf. The most important option to change is the User option: set it to your GitHub username, and it will download your public key and set up the VMs with it, instead of mine!
Pretty much everything else should work out of the box. Change the vmem and vcpu settings to suit your needs.
Initialize the working directory
Initialize Terraform before running the shell script below.
This downloads all the required Terraform providers and modules into your local directory.
terraform init
Ubuntu 24.04 Image
Before proceeding with creating the VMs, we need to ensure that the Ubuntu 24.04 image is available on our system, or modify the code to download it from the internet.
In the Variables.tf Terraform file, you will notice the following entries:
# The image source of the VM
# cloud_image = "https://cloud-images.ubuntu.com/oracular/current/oracular-server-cloudimg-amd64.img"
cloud_image = "../oracular-server-cloudimg-amd64.img"
If you do not want to download the cloud server image yourself, make the following change so that it is fetched from the internet:
# The image source of the VM
cloud_image = "https://cloud-images.ubuntu.com/oracular/current/oracular-server-cloudimg-amd64.img"
# cloud_image = "../oracular-server-cloudimg-amd64.img"
Otherwise, download it into the parent directory beforehand, which speeds things up:
cd ../
IMAGE="oracular" # Ubuntu 24.10 (the 24.04 LTS release is "noble")
curl -sLO https://cloud-images.ubuntu.com/${IMAGE}/current/${IMAGE}-server-cloudimg-amd64.img
cd -
ls -l ../oracular-server-cloudimg-amd64.img
Spawn the VMs
We are now ready to spawn our 3 VMs by running terraform plan and terraform apply, via the start.sh script:
./start.sh
The output should be something like:
...
Apply complete! Resources: 16 added, 0 changed, 0 destroyed.
Outputs:
VMs = [
"192.168.122.223 k8scpnode1",
"192.168.122.50 k8swrknode1",
"192.168.122.10 k8swrknode2",
]
Verify that you have SSH access to the VMs, e.g.:
ssh ubuntu@192.168.122.223
Replace the IP with the one provided in the output.
DISCLAIMER: if something fails, destroy everything with ./destroy.sh to remove any leftovers before running ./start.sh again!
Control-Plane Node
Let’s now begin configuring the Kubernetes control-plane node.
Ports on the control-plane node
Kubernetes runs a few services that need to be accessible from the worker nodes:
Protocol | Direction | Port Range | Purpose | Used By |
---|---|---|---|---|
TCP | Inbound | 6443 | Kubernetes API server | All |
TCP | Inbound | 2379-2380 | etcd server client API | kube-apiserver, etcd |
TCP | Inbound | 10250 | Kubelet API | Self, Control plane |
TCP | Inbound | 10259 | kube-scheduler | Self |
TCP | Inbound | 10257 | kube-controller-manager | Self |
Although the etcd ports are included in the control-plane section, you can also host your own etcd cluster externally or on custom ports.
Firewall on the control-plane node
We need to open the necessary ports on the CP’s (control-plane node) firewall.
sudo ufw allow 6443/tcp
sudo ufw allow 2379:2380/tcp
sudo ufw allow 10250/tcp
sudo ufw allow 10259/tcp
sudo ufw allow 10257/tcp
# sudo ufw disable
sudo ufw status
The output should be:
To Action From
-- ------ ----
22/tcp ALLOW Anywhere
6443/tcp ALLOW Anywhere
2379:2380/tcp ALLOW Anywhere
10250/tcp ALLOW Anywhere
10259/tcp ALLOW Anywhere
10257/tcp ALLOW Anywhere
22/tcp (v6) ALLOW Anywhere (v6)
6443/tcp (v6) ALLOW Anywhere (v6)
2379:2380/tcp (v6) ALLOW Anywhere (v6)
10250/tcp (v6) ALLOW Anywhere (v6)
10259/tcp (v6) ALLOW Anywhere (v6)
10257/tcp (v6) ALLOW Anywhere (v6)
Hosts file in the control-plane node
We need to update /etc/hosts with the internal IP and hostname.
This will help when it is time to join the worker nodes.
# hostname -I may list several addresses, so keep only the first one
echo "$(hostname -I | awk '{print $1}') $(hostname)" | sudo tee -a /etc/hosts
Just a reminder: we need to update the hosts file on all the VMs so that it includes every VM's IP and hostname.
If you already know them, your /etc/hosts file should look like this:
192.168.122.223 k8scpnode1
192.168.122.50 k8swrknode1
192.168.122.10 k8swrknode2
Replace the IPs with yours.
Updating your hosts file
If you already know the IPs of your VMs, run the following on ALL 3 VMs:
sudo tee -a /etc/hosts <<EOF
192.168.122.223 k8scpnode1
192.168.122.50 k8swrknode1
192.168.122.10 k8swrknode2
EOF
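Running the snippet above more than once appends duplicate lines. Below is a minimal sketch of an idempotent alternative. It assumes the entries arrive as "IP hostname" pairs on standard input, and the function name ensure_hosts_entries is hypothetical. Because sudo cannot run a shell function directly, the function has to be defined and run from a root shell when the target is /etc/hosts.

```shell
#!/usr/bin/env bash
# Sketch: append "IP hostname" pairs read from stdin to a hosts-format file,
# skipping any pair whose IP line already lists that hostname.
ensure_hosts_entries() {
  local file=$1 ip name
  while read -r ip name; do
    [ -n "$ip" ] && [ -n "$name" ] || continue
    # awk exits 0 if some line starts with $ip and mentions $name, else 1
    if ! awk -v ip="$ip" -v name="$name" '
           $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
           END { exit !found }' "$file"; then
      printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
  done
}
```

For example, piping the three lines shown above into ensure_hosts_entries /etc/hosts adds only the entries that are missing. Repeated runs leave the file unchanged.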
No Swap on the control-plane node
Be sure that SWAP is disabled in all virtual machines!
sudo swapoff -a
The fstab file should not have any active swap entry either.
The following command, which skips commented-out lines, should return nothing:
grep -Ev '^[[:space:]]*#' /etc/fstab | grep -w swap
If it does return something, edit /etc/fstab and remove the swap entry.
If you follow my Terraform k8s code example from the GitHub repo above, you will notice that there isn't any swap entry in the cloud-init (user-data) file.
Nevertheless, it is always a good thing to double-check.
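A slightly stricter check ignores commented-out fstab lines. It also looks at the swap areas the kernel currently has active, as listed in /proc/swaps, whose first line is a header. This is only a sketch: the function name swap_problems is hypothetical, and the file paths are parameters so it can be tried against copies.

```shell
#!/usr/bin/env bash
# Sketch: list swap that is still configured or active.
#   $1: fstab-format file (default /etc/fstab); commented lines are ignored
#   $2: /proc/swaps-format file (default /proc/swaps); its first line is a header
swap_problems() {
  local fstab=${1:-/etc/fstab} swaps=${2:-/proc/swaps}
  # fstab fields: device mountpoint type options dump pass
  awk '!/^[ \t]*#/ && $3 == "swap" { print "fstab: " $0 }' "$fstab"
  # every line after the header is an active swap area
  awk 'NR > 1 { print "active: " $1 }' "$swaps"
}
```

Run with no arguments on each node, swap_problems should print nothing.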
Kernel modules on the control-plane node
We need to load the below kernel modules on all k8s nodes, so k8s can create some network magic!
- overlay
- br_netfilter
Run the following bash snippet, which loads them and also enables the network forwarding features:
sudo tee /etc/modules-load.d/kubernetes.conf <<EOF
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
sudo lsmod | grep netfilter
sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
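To confirm that the settings actually took effect, the values can be read back from /proc/sys. This is the same data that sysctl reports. The bridge keys only exist once br_netfilter is loaded, so a missing file also counts as a failure. This is a minimal sketch: the function name check_k8s_sysctls is hypothetical, and the root directory is a parameter so the check can be exercised against a fake tree.

```shell
#!/usr/bin/env bash
# Sketch: check that the three settings from /etc/sysctl.d/kubernetes.conf are
# active by reading them under /proc/sys (or under a fake root given as $1).
check_k8s_sysctls() {
  local root=${1:-/proc/sys} key path val rc=0
  for key in net.bridge.bridge-nf-call-ip6tables \
             net.bridge.bridge-nf-call-iptables \
             net.ipv4.ip_forward; do
    path="$root/${key//.//}"          # net.ipv4.ip_forward -> net/ipv4/ip_forward
    if [ -r "$path" ] && val=$(<"$path") && [ "$val" = 1 ]; then
      echo "ok   $key"
    else
      echo "FAIL $key"
      rc=1
    fi
  done
  return "$rc"
}
```

On a correctly prepared node, check_k8s_sysctls should print three "ok" lines and return success.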
NeedRestart on the control-plane node
Before installing any software, we need to make a tiny change to the needrestart program. This helps with automating package installation, because needrestart will stop asking, via a dialog, whether we would like to restart the services!
temporarily
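export NEEDRESTART_MODE="a"
# note: sudo resets the environment by default, so pass the variable explicitly, e.g. sudo NEEDRESTART_MODE=a apt-get install -y <package>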
permanently
A more permanent way is to update the configuration file. The $ is escaped so that the shell does not expand $nrconf:
echo "\$nrconf{restart} = 'a';" | sudo tee -a /etc/needrestart/needrestart.conf
Installing a Container Runtime on the control-plane node
It is time to choose which container runtime we are going to use on our k8s cluster. There are a few container runtimes for k8s, and in the past Docker was used. Nowadays the most common runtime is containerd, which can also use the cgroup v2 kernel features. Docker Engine can also be used via a CRI adapter (cri-dockerd). Read here for more details on the subject.
curl -sL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/docker-keyring.gpg
sudo apt-add-repository -y "deb https://download.docker.com/linux/ubuntu oracular stable"
sleep 3
sudo apt-get -y install containerd.io
containerd config default \
  | sed 's/SystemdCgroup = false/SystemdCgroup = true/' \
  | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd.service
You can find the containerd configuration file here:
/etc/containerd/config.toml
In earlier versions of Ubuntu, too, the systemd cgroup driver should be enabled.
The recommendation from the official documentation is:
It is best to use cgroup v2, use the systemd cgroup driver instead of cgroupfs.
Starting with v1.22 and later, when creating a cluster with kubeadm, if the user does not set the cgroupDriver field under KubeletConfiguration, kubeadm defaults it to systemd.
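The sed substitution silently does nothing if the expected line is missing from the generated file, so it may be worth checking the result. This small sketch looks for an uncommented SystemdCgroup = true line. The function name uses_systemd_cgroup is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if a containerd config file enables SystemdCgroup,
# i.e. has an uncommented "SystemdCgroup = true" line.
uses_systemd_cgroup() {
  local config=${1:-/etc/containerd/config.toml}
  grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true([[:space:]]|$)' "$config"
}
```

For example, uses_systemd_cgroup && echo "systemd cgroup driver enabled" can be run on each node after restarting containerd.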
Installing kubeadm, kubelet and kubectl on the control-plane node
Install the Kubernetes packages (kubeadm, kubelet and kubectl) by first adding the k8s repository to our virtual machine. To speed up the next step, we will also download the configuration container images.
This guide uses kubeadm, so we need to check its latest version.
Kubernetes v1.31 was the latest version when this guide was written; adjust VERSION if a newer one is available.
VERSION="1.31"
curl -fsSL https://pkgs.k8s.io/core:/stable:/v${VERSION}/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# allow unprivileged APT programs to read this keyring
sudo chmod 0644 /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v${VERSION}/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
# helps tools such as command-not-found to work correctly
sudo chmod 0644 /etc/apt/sources.list.d/kubernetes.list
sleep 2
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
Get kubernetes admin configuration images
Pull the container images that kubeadm needs for the control-plane components (such as kube-apiserver, etcd and CoreDNS) ahead of time.
sudo kubeadm config images pull
Initializing the control-plane node
We can now proceed with initializing the control-plane node for our Kubernetes cluster.
There are a few things we need to be careful about:
- we can specify the control-plane-endpoint if we are planning to have a highly available k8s cluster (we will skip this for now),
- choose a Pod network add-on (next section), but be aware that CoreDNS (DNS and Service Discovery) will not run until then,
- define where our container runtime socket is (we will skip it),
- advertise the API server address (we will skip it).
We will, however, set our Pod network CIDR to the default value of the Pod network add-on (10.244.0.0/16 for Flannel), so everything goes smoothly later on.
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
Keep the output somewhere safe, as it includes the kubeadm join command for the worker nodes.
Create user access config to the k8s control-plane node
Our k8s control-plane node is running, so we need credentials to access it.
kubectl reads a configuration file (containing the credentials), so we copy the one generated for the k8s admin.
rm -rf $HOME/.kube
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
ls -la $HOME/.kube/config
echo 'alias k="kubectl"' | sudo tee -a /etc/bash.bashrc
source /etc/bash.bashrc
Verify the control-plane node
Verify that Kubernetes is running.
At this point we have a k8s cluster, but only the control-plane node is running.
kubectl cluster-info
# kubectl cluster-info dump
kubectl get nodes -o wide
kubectl get pods -A -o wide
Install an overlay network provider on the control-plane node
As I mentioned above, in order to use the DNS and Service Discovery services in Kubernetes (CoreDNS), we need to install a Container Network Interface (CNI) based Pod network add-on so that our Pods can communicate with each other.
Kubernetes Flannel is a popular network overlay solution for Kubernetes clusters, primarily used to enable networking between pods across different nodes. It’s a simple and easy-to-implement network fabric that uses the VXLAN protocol to create a flat virtual network, allowing Kubernetes pods to communicate with each other across different hosts.
Make sure to open the following UDP port for Flannel’s VXLAN traffic (if you are going to use it):
sudo ufw allow 8472/udp
To install Flannel as the networking solution for your Kubernetes (K8s) cluster, run the following command to deploy Flannel:
k apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
Verify CoreDNS is running on the control-plane node
Verify that the control-plane node is up and running and that the control-plane pods (such as the CoreDNS pods) are also running:
k get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8scpnode1 Ready control-plane 12m v1.31.3 192.168.122.223 <none> Ubuntu 24.10 6.11.0-9-generic containerd://1.7.23
k get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel kube-flannel-ds-9v8fq 1/1 Running 0 2m17s 192.168.122.223 k8scpnode1 <none> <none>
kube-system coredns-7c65d6cfc9-dg6nq 1/1 Running 0 12m 10.244.0.2 k8scpnode1 <none> <none>
kube-system coredns-7c65d6cfc9-r4ksc 1/1 Running 0 12m 10.244.0.3 k8scpnode1 <none> <none>
kube-system etcd-k8scpnode1 1/1 Running 0 13m 192.168.122.223 k8scpnode1 <none> <none>
kube-system kube-apiserver-k8scpnode1 1/1 Running 0 12m 192.168.122.223 k8scpnode1 <none> <none>
kube-system kube-controller-manager-k8scpnode1 1/1 Running 0 12m 192.168.122.223 k8scpnode1 <none> <none>
kube-system kube-proxy-sxtk9 1/1 Running 0 12m 192.168.122.223 k8scpnode1 <none> <none>
kube-system kube-scheduler-k8scpnode1 1/1 Running 0 13m 192.168.122.223 k8scpnode1 <none> <none>
That’s it with the control-plane node !
Worker Nodes
The following instructions apply similarly to both worker nodes. I will document the steps for the k8swrknode1 node, but please follow the same process for the k8swrknode2 node.
Ports on the worker nodes
As we learned in the control-plane section, Kubernetes runs a few services, and the worker nodes need the following ports:
Protocol | Direction | Port Range | Purpose | Used By |
---|---|---|---|---|
TCP | Inbound | 10250 | Kubelet API | Self, Control plane |
TCP | Inbound | 10256 | kube-proxy | Self, Load balancers |
TCP | Inbound | 30000-32767 | NodePort Services | All |
Firewall on the worker nodes
So we need to open the necessary ports on the worker nodes too:
sudo ufw allow 10250/tcp
sudo ufw allow 10256/tcp
sudo ufw allow 30000:32767/tcp
sudo ufw status
The output should appear as follows:
To Action From
-- ------ ----
22/tcp ALLOW Anywhere
10250/tcp ALLOW Anywhere
30000:32767/tcp ALLOW Anywhere
22/tcp (v6) ALLOW Anywhere (v6)
10250/tcp (v6) ALLOW Anywhere (v6)
30000:32767/tcp (v6) ALLOW Anywhere (v6)
And do not forget that we also need to open UDP port 8472 for Flannel:
sudo ufw allow 8472/udp
The next few steps are pretty much exactly the same as in the control-plane node.
In order to keep this documentation short, I’ll just copy/paste the commands.
Hosts file in the worker node
Update the /etc/hosts file to include the IPs and hostnames of all VMs.
192.168.122.223 k8scpnode1
192.168.122.50 k8swrknode1
192.168.122.10 k8swrknode2
No Swap on the worker node
sudo swapoff -a
Kernel modules on the worker node
sudo tee /etc/modules-load.d/kubernetes.conf <<EOF
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
sudo lsmod | grep netfilter
sudo tee /etc/sysctl.d/kubernetes.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
NeedRestart on the worker node
export NEEDRESTART_MODE="a"
Installing a Container Runtime on the worker node
curl -sL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/docker-keyring.gpg
sudo apt-add-repository -y "deb https://download.docker.com/linux/ubuntu oracular stable"
sleep 3
sudo apt-get -y install containerd.io
containerd config default \
  | sed 's/SystemdCgroup = false/SystemdCgroup = true/' \
  | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd.service
Installing kubeadm, kubelet and kubectl on the worker node
VERSION="1.31"
curl -fsSL https://pkgs.k8s.io/core:/stable:/v${VERSION}/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# allow unprivileged APT programs to read this keyring
sudo chmod 0644 /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# This overwrites any existing configuration in /etc/apt/sources.list.d/kubernetes.list
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v${VERSION}/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
# helps tools such as command-not-found to work correctly
sudo chmod 0644 /etc/apt/sources.list.d/kubernetes.list
sleep 3
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
Get Token from the control-plane node
To join nodes to the kubernetes cluster, we need to have a couple of things.
- a token from control-plane node
- the CA certificate hash from the control-plane node.
If you didn't keep the output of the control-plane node initialization, that's okay.
Run the following command on the control-plane node:
sudo kubeadm token list
This shows the initial token, which expires after 24 hours:
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
7n4iwm.8xqwfcu4i1co8nof 23h 2024-11-26T12:14:55Z authentication,signing The default bootstrap token generated by 'kubeadm init'. system:bootstrappers:kubeadm:default-node-token
In this case, the token is:
7n4iwm.8xqwfcu4i1co8nof
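The token printed by `kubeadm token list` follows a fixed format: a 6-character public token ID, a dot, and a 16-character secret, all lowercase letters and digits. As a small sketch (the helper name is hypothetical), this validates a token and splits it into its two parts; kubeadm stores each bootstrap token in a Secret named `bootstrap-token-<token-id>` in the kube-system namespace.

```python
import re

# Bootstrap tokens created by kubeadm look like "<token-id>.<token-secret>":
# a 6-character ID and a 16-character secret, both drawn from [a-z0-9].
TOKEN_RE = re.compile(r"([a-z0-9]{6})\.([a-z0-9]{16})")


def split_bootstrap_token(token: str) -> tuple[str, str]:
    """Return (token_id, token_secret), raising ValueError if malformed."""
    match = TOKEN_RE.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"not a kubeadm bootstrap token: {token!r}")
    return match.group(1), match.group(2)


token_id, secret = split_bootstrap_token("7n4iwm.8xqwfcu4i1co8nof")
print(token_id)                       # 7n4iwm
print(f"bootstrap-token-{token_id}")  # name of the Secret in kube-system
```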
Get Certificate Hash from the control-plane node
To get the CA certificate hash from the control-plane node, we need to run a rather complicated command:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
In my k8s cluster, the hash is:
2f68e4b27cae2d2a6431f3da308a691d00d9ef3baa4677249e43b3100d783061
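The openssl pipeline extracts the CA certificate's public key, re-encodes it as a DER SubjectPublicKeyInfo structure and hashes that with SHA-256; note that `openssl rsa` assumes an RSA key, which is what kubeadm generates by default. As a sketch of the same computation using only the Python standard library, the minimal DER walker below (hypothetical helpers, assuming a well-formed certificate, with no signature or validity checks) locates the SubjectPublicKeyInfo inside the certificate and hashes its bytes:

```python
import hashlib
import ssl


def read_tlv(data: bytes, offset: int) -> tuple[int, int, int]:
    """Parse the DER element at `offset`; return (tag, value_start, value_end)."""
    tag = data[offset]
    length = data[offset + 1]
    pos = offset + 2
    if length & 0x80:  # long form: the low 7 bits give the number of length octets
        count = length & 0x7F
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return tag, pos, pos + length


def spki_der(cert_der: bytes) -> bytes:
    """Return the DER-encoded SubjectPublicKeyInfo of an X.509 certificate."""
    _, cert_body, _ = read_tlv(cert_der, 0)    # Certificate ::= SEQUENCE
    _, pos, _ = read_tlv(cert_der, cert_body)  # tbsCertificate ::= SEQUENCE
    tag, _, end = read_tlv(cert_der, pos)
    if tag == 0xA0:                            # optional [0] version field
        pos = end
    # Skip serialNumber, signature, issuer, validity and subject.
    for _ in range(5):
        _, _, pos = read_tlv(cert_der, pos)
    _, _, spki_end = read_tlv(cert_der, pos)   # subjectPublicKeyInfo
    return cert_der[pos:spki_end]


def discovery_token_ca_cert_hash(pem_text: str) -> str:
    """Value for --discovery-token-ca-cert-hash, computed from a PEM CA cert."""
    cert_der = ssl.PEM_cert_to_DER_cert(pem_text)
    return "sha256:" + hashlib.sha256(spki_der(cert_der)).hexdigest()
```

Called with the contents of /etc/kubernetes/pki/ca.crt, `discovery_token_ca_cert_hash()` should yield the same hex digest as the openssl pipeline, already prefixed with the `sha256:` that `kubeadm join` expects.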
Join Workers to the kubernetes cluster
Now we can join our worker nodes to the Kubernetes cluster.
Run the command below on both worker nodes:
sudo kubeadm join 192.168.122.223:6443 \
  --token 7n4iwm.8xqwfcu4i1co8nof \
  --discovery-token-ca-cert-hash sha256:2f68e4b27cae2d2a6431f3da308a691d00d9ef3baa4677249e43b3100d783061
We get this message:
Run ‘kubectl get nodes’ on the control-plane to see this node join the cluster.
Is the Kubernetes cluster running?
We can verify that with:
kubectl get nodes -o wide
kubectl get pods -A -o wide
All nodes should now have joined the Kubernetes cluster; make sure they are in Ready status.
k8scpnode1 Ready control-plane 58m v1.31.3 192.168.122.223 <none> Ubuntu 24.10 6.11.0-9-generic containerd://1.7.23
k8swrknode1 Ready <none> 3m37s v1.31.3 192.168.122.50 <none> Ubuntu 24.10 6.11.0-9-generic containerd://1.7.23
k8swrknode2 Ready <none> 3m37s v1.31.3 192.168.122.10 <none> Ubuntu 24.10 6.11.0-9-generic containerd://1.7.23
All pods
Make sure all pods are in Running status.
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel kube-flannel-ds-9v8fq 1/1 Running 0 46m 192.168.122.223 k8scpnode1 <none> <none>
kube-flannel kube-flannel-ds-hmtmv 1/1 Running 0 3m32s 192.168.122.50 k8swrknode1 <none> <none>
kube-flannel kube-flannel-ds-rwkrm 1/1 Running 0 3m33s 192.168.122.10 k8swrknode2 <none> <none>
kube-system coredns-7c65d6cfc9-dg6nq 1/1 Running 0 57m 10.244.0.2 k8scpnode1 <none> <none>
kube-system coredns-7c65d6cfc9-r4ksc 1/1 Running 0 57m 10.244.0.3 k8scpnode1 <none> <none>
kube-system etcd-k8scpnode1 1/1 Running 0 57m 192.168.122.223 k8scpnode1 <none> <none>
kube-system kube-apiserver-k8scpnode1 1/1 Running 0 57m 192.168.122.223 k8scpnode1 <none> <none>
kube-system kube-controller-manager-k8scpnode1 1/1 Running 0 57m 192.168.122.223 k8scpnode1 <none> <none>
kube-system kube-proxy-49f6q 1/1 Running 0 3m32s 192.168.122.50 k8swrknode1 <none> <none>
kube-system kube-proxy-6qpph 1/1 Running 0 3m33s 192.168.122.10 k8swrknode2 <none> <none>
kube-system kube-proxy-sxtk9 1/1 Running 0 57m 192.168.122.223 k8scpnode1 <none> <none>
kube-system kube-scheduler-k8scpnode1 1/1 Running 0 57m 192.168.122.223 k8scpnode1 <none> <none>
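Eyeballing the STATUS columns works for a three-node lab. As a sketch of automating the same check, the Python below (helper names are hypothetical) reads the JSON printed by `kubectl get ... -o json` and lists nodes whose Ready condition is not True and pods whose phase is not Running. It assumes kubectl is on the PATH and configured for this cluster.

```python
import json
import subprocess


def kubectl_json(*args: str) -> dict:
    """Run kubectl with '-o json' appended and parse its output."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def not_ready_nodes(node_list: dict) -> list[str]:
    """Names of nodes whose Ready condition is missing or not "True"."""
    bad = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = next((c for c in conditions if c.get("type") == "Ready"), None)
        if ready is None or ready.get("status") != "True":
            bad.append(node["metadata"]["name"])
    return bad


def not_running_pods(pod_list: dict) -> list[str]:
    """Return "namespace/name" for every pod whose phase is not "Running"."""
    return [
        f'{pod["metadata"]["namespace"]}/{pod["metadata"]["name"]}'
        for pod in pod_list.get("items", [])
        if pod.get("status", {}).get("phase") != "Running"
    ]
```

Typical use would be `not_ready_nodes(kubectl_json("get", "nodes"))` and `not_running_pods(kubectl_json("get", "pods", "-A"))`, both returning empty lists on a healthy cluster. Pods belonging to completed Jobs report the phase Succeeded, so a real check might accept that phase as well.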
That’s it !
Our k8s cluster is running.
Kubernetes Dashboard
Kubernetes Dashboard is a general-purpose, web-based UI for Kubernetes clusters. It allows users to manage applications running in the cluster and troubleshoot them, as well as manage the cluster itself.
Next, we can move forward with installing the Kubernetes dashboard on our cluster.
Helm
Helm is a package manager for Kubernetes that simplifies the process of deploying applications to a Kubernetes cluster. As of version 7.0.0, kubernetes-dashboard has dropped support for manifest-based installation; only Helm-based installation is supported now.
Live on the edge !
curl -sL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
Install kubernetes dashboard
We need to add the kubernetes-dashboard helm repository first and install the helm chart after:
# Add kubernetes-dashboard repository
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
# Deploy a Helm Release named "kubernetes-dashboard" using the kubernetes-dashboard chart
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard
The output of the command above should resemble something like this:
Release "kubernetes-dashboard" does not exist. Installing it now.
NAME: kubernetes-dashboard
LAST DEPLOYED: Mon Nov 25 15:36:51 2024
NAMESPACE: kubernetes-dashboard
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
*************************************************************************************************
*** PLEASE BE PATIENT: Kubernetes Dashboard may need a few minutes to get up and become ready ***
*************************************************************************************************
Congratulations! You have just installed Kubernetes Dashboard in your cluster.
To access Dashboard run:
kubectl -n kubernetes-dashboard port-forward svc/kubernetes-dashboard-kong-proxy 8443:443
NOTE: In case port-forward command does not work, make sure that kong service name is correct.
Check the services in Kubernetes Dashboard namespace using:
kubectl -n kubernetes-dashboard get svc
Dashboard will be available at:
https://localhost:8443
Verify the installation
kubectl -n kubernetes-dashboard get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard-api ClusterIP 10.106.254.153 <none> 8000/TCP 3m48s
kubernetes-dashboard-auth ClusterIP 10.103.156.167 <none> 8000/TCP 3m48s
kubernetes-dashboard-kong-proxy ClusterIP 10.105.230.13 <none> 443/TCP 3m48s
kubernetes-dashboard-metrics-scraper ClusterIP 10.109.7.234 <none> 8000/TCP 3m48s
kubernetes-dashboard-web ClusterIP 10.106.125.65 <none> 8000/TCP 3m48s
kubectl get all -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
pod/kubernetes-dashboard-api-6dbb79747-rbtlc 1/1 Running 0 4m5s
pod/kubernetes-dashboard-auth-55d7cc5fbd-xccft 1/1 Running 0 4m5s
pod/kubernetes-dashboard-kong-57d45c4f69-t9lw2 1/1 Running 0 4m5s
pod/kubernetes-dashboard-metrics-scraper-df869c886-lt624 1/1 Running 0 4m5s
pod/kubernetes-dashboard-web-6ccf8d967-9rp8n 1/1 Running 0 4m5s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes-dashboard-api ClusterIP 10.106.254.153 <none> 8000/TCP 4m10s
service/kubernetes-dashboard-auth ClusterIP 10.103.156.167 <none> 8000/TCP 4m10s
service/kubernetes-dashboard-kong-proxy ClusterIP 10.105.230.13 <none> 443/TCP 4m10s
service/kubernetes-dashboard-metrics-scraper ClusterIP 10.109.7.234 <none> 8000/TCP 4m10s
service/kubernetes-dashboard-web ClusterIP 10.106.125.65 <none> 8000/TCP 4m10s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/kubernetes-dashboard-api 1/1 1 1 4m7s
deployment.apps/kubernetes-dashboard-auth 1/1 1 1 4m7s
deployment.apps/kubernetes-dashboard-kong 1/1 1 1 4m7s
deployment.apps/kubernetes-dashboard-metrics-scraper 1/1 1 1 4m7s
deployment.apps/kubernetes-dashboard-web 1/1 1 1 4m7s
NAME DESIRED CURRENT READY AGE
replicaset.apps/kubernetes-dashboard-api-6dbb79747 1 1 1 4m6s
replicaset.apps/kubernetes-dashboard-auth-55d7cc5fbd 1 1 1 4m6s
replicaset.apps/kubernetes-dashboard-kong-57d45c4f69 1 1 1 4m6s
replicaset.apps/kubernetes-dashboard-metrics-scraper-df869c886 1 1 1 4m6s
replicaset.apps/kubernetes-dashboard-web-6ccf8d967 1 1 1 4m6s
Accessing Dashboard via a NodePort
A NodePort is a type of Service in Kubernetes that exposes a service on each node’s IP at a static port. This allows external traffic to reach the service by accessing the node’s IP and port. By default, kubernetes-dashboard is only reachable on an internal 10.x.x.x cluster IP. To access the dashboard, we need to turn the kubernetes-dashboard proxy service into a NodePort service.
We can either patch the service or edit its YAML definition.
Choose one of the two options below; there’s no need to run both as it’s unnecessary (but not harmful).
Patch kubernetes-dashboard
This is one way to add a NodePort.
kubectl --namespace kubernetes-dashboard patch svc kubernetes-dashboard-kong-proxy -p '{"spec": {"type": "NodePort"}}'
output
service/kubernetes-dashboard-kong-proxy patched
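By default `kubectl patch` sends a strategic merge patch. For a patch like this one, which only sets a scalar field inside nested objects, the result is the same as an RFC 7386 JSON merge patch (lists are where the two differ); `kubectl patch --type=merge` requests the RFC 7386 behaviour explicitly. A minimal Python sketch of the RFC 7386 algorithm, applied to a trimmed-down, hypothetical Service object:

```python
import copy


def json_merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch to `target`, returning a new value."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)  # scalars and lists replace the target outright
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # a JSON null deletes the member
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


service = {"spec": {"type": "ClusterIP",
                    "ports": [{"port": 443, "protocol": "TCP"}]}}
patched = json_merge_patch(service, {"spec": {"type": "NodePort"}})
print(patched["spec"]["type"])  # NodePort (the ports list is left untouched)
```

Once the type is NodePort, the API server allocates a port for each service port from the node-port range (30000-32767 by default), which is where the 32116 in the output below comes from.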
verify the service
kubectl get svc -n kubernetes-dashboard
output
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard-api ClusterIP 10.106.254.153 <none> 8000/TCP 50m
kubernetes-dashboard-auth ClusterIP 10.103.156.167 <none> 8000/TCP 50m
kubernetes-dashboard-kong-proxy NodePort 10.105.230.13 <none> 443:32116/TCP 50m
kubernetes-dashboard-metrics-scraper ClusterIP 10.109.7.234 <none> 8000/TCP 50m
kubernetes-dashboard-web ClusterIP 10.106.125.65 <none> 8000/TCP 50m
We can see that NodePort 32116 has been assigned to the kubernetes-dashboard-kong-proxy service.
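Rather than reading the port off the PORT(S) column, it can be taken from the service's JSON, e.g. `kubectl -n kubernetes-dashboard get svc kubernetes-dashboard-kong-proxy -o json` (or directly with `-o jsonpath='{.spec.ports[0].nodePort}'`). A small sketch that turns that JSON into the dashboard URL; the helper names are hypothetical and the JSON is a trimmed excerpt matching the output above:

```python
import json


def node_ports(service: dict) -> dict[int, int]:
    """Map each service port to the NodePort allocated for it."""
    return {
        entry["port"]: entry["nodePort"]
        for entry in service.get("spec", {}).get("ports", [])
        if "nodePort" in entry
    }


def dashboard_url(node_ip: str, service: dict, port: int = 443) -> str:
    """Build the HTTPS URL for a NodePort service reachable on node_ip."""
    return f"https://{node_ip}:{node_ports(service)[port]}"


# Trimmed-down excerpt of the JSON for kubernetes-dashboard-kong-proxy.
svc = json.loads('{"spec": {"type": "NodePort", "ports": '
                 '[{"port": 443, "nodePort": 32116, "protocol": "TCP"}]}}')
print(dashboard_url("192.168.122.50", svc))  # https://192.168.122.50:32116
```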
Edit kubernetes-dashboard Service
This is an alternative way to add a NodePort.
kubectl edit svc -n kubernetes-dashboard kubernetes-dashboard-kong-proxy
and change the service type from
type: ClusterIP
to
type: NodePort
Accessing Kubernetes Dashboard
The kubernetes-dashboard release runs five pods (api, auth, kong, metrics-scraper and web); the kong pod is the proxy in front of the other components.
To access the dashboard, we first identify which node the kong proxy pod is running on.
kubectl get pods -n kubernetes-dashboard -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kubernetes-dashboard-api-56f6f4b478-p4xbj 1/1 Running 0 55m 10.244.2.12 k8swrknode1 <none> <none>
kubernetes-dashboard-auth-565b88d5f9-fscj9 1/1 Running 0 55m 10.244.1.12 k8swrknode2 <none> <none>
kubernetes-dashboard-kong-57d45c4f69-rts57 1/1 Running 0 55m 10.244.2.10 k8swrknode1 <none> <none>
kubernetes-dashboard-metrics-scraper-df869c886-bljqr 1/1 Running 0 55m 10.244.2.11 k8swrknode1 <none> <none>
kubernetes-dashboard-web-6ccf8d967-t6k28 1/1 Running 0 55m 10.244.1.11 k8swrknode2 <none> <none>
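In my setup, the kong proxy pod is running on worker node 1 (k8swrknode1), which, according to /etc/hosts, has the IP 192.168.122.50. (With a NodePort service and the default Cluster external traffic policy, any node’s IP would work as well.)
The NodePort is 32116:
kubectl get svc -n kubernetes-dashboard -o wide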
So, we can open a new tab on our browser and type:
https://192.168.122.50:32116
and accept the self-signed certificate!
Create An Authentication Token (RBAC)
Last step for the kubernetes-dashboard is to create an authentication token.
Creating a Service Account
Create a new YAML file defining a ServiceAccount named admin-user in the kubernetes-dashboard namespace.
cat > kubernetes-dashboard.ServiceAccount.yaml <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
EOF
add this service account to the k8s cluster
kubectl apply -f kubernetes-dashboard.ServiceAccount.yaml
output
serviceaccount/admin-user created
Creating a ClusterRoleBinding
We need to bind the Service Account to the cluster-admin ClusterRole via role-based access control (RBAC).
cat > kubernetes-dashboard.ClusterRoleBinding.yaml <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF
apply this yaml file
kubectl apply -f kubernetes-dashboard.ClusterRoleBinding.yaml
clusterrolebinding.rbac.authorization.k8s.io/admin-user created
This means our service account now holds the cluster-admin role, which grants full access to the cluster, including everything the kubernetes-dashboard needs.
Getting a Bearer Token
Final step is to create/get a token for our user.
kubectl -n kubernetes-dashboard create token admin-user
eyJhbGciOiJSUzI1NiIsImtpZCI6IlpLbDVPVFQxZ1pTZlFKQlFJQkR6dVdGdGpvbER1YmVmVmlJTUd5WEVfdUEifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNzMyNzI0NTQ5LCJpYXQiOjE3MzI3MjA5NDksImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwianRpIjoiMTczNzQyZGUtNDViZi00NjhkLTlhYWYtMDg3MDA3YmZmMjk3Iiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJhZG1pbi11c2VyIiwidWlkIjoiYWZhZmNhYzItZDYxNy00M2I0LTg2N2MtOTVkMzk5YmQ4ZjIzIn19LCJuYmYiOjE3MzI3MjA5NDksInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDphZG1pbi11c2VyIn0.AlPSIrRsCW2vPa1P3aDQ21jaeIU2MAtiKcDO23zNRcd8-GbJUX_3oSInmSx9o2029eI5QxciwjduIRdJfTuhiPPypb3tp31bPT6Pk6_BgDuN7n4Ki9Y2vQypoXJcJNikjZpSUzQ9TOm88e612qfidSc88ATpfpS518IuXCswPg4WPjkI1WSPn-lpL6etrRNVfkT1eeSR0fO3SW3HIWQX9ce-64T0iwGIFjs0BmhDbBtEW7vH5h_hHYv3cbj_6yGj85Vnpjfcs9a9nXxgPrn_up7iA6lPtLMvQJ2_xvymc57aRweqsGSHjP2NWya9EF-KBy6bEOPB29LaIaKMywSuOQ
Paste this token into the dashboard login page from the previous step.
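The bearer token is a JSON Web Token: three base64url-encoded segments separated by dots, the middle one holding JSON claims. A minimal Python sketch for peeking at those claims (helper names are hypothetical; it deliberately skips signature verification, so use it only for inspection):

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT. This does NOT verify the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # JWTs strip base64 padding; restore it
    return json.loads(base64.urlsafe_b64decode(payload))


def lifetime_seconds(claims: dict) -> int:
    """Seconds between the token's issue time and its expiry."""
    return claims["exp"] - claims["iat"]
```

For a service-account token, the `sub` claim should have the form `system:serviceaccount:kubernetes-dashboard:admin-user`, and `exp` minus `iat` shows how long the token stays valid; `kubectl create token` accepts a `--duration` flag when a different lifetime is wanted.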
Browsing Kubernetes Dashboard
e.g. Cluster → Nodes
Nginx App
Before finishing this blog post, I would also like to share how to install a simple nginx-app, as it is customary to do such a thing in every new k8s cluster.
Please excuse me for not going into much detail.
You should be able to understand the k8s commands below.
Install nginx-app
kubectl create deployment nginx-app --image=nginx --replicas=2
deployment.apps/nginx-app created
Get Deployment
kubectl get deployment nginx-app -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
nginx-app 2/2 2 2 64s nginx nginx app=nginx-app
Expose Nginx-App
kubectl expose deployment nginx-app --type=NodePort --port=80
service/nginx-app exposed
Verify Service nginx-app
kubectl get svc nginx-app -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
nginx-app NodePort 10.98.170.185 <none> 80:31761/TCP 27s app=nginx-app
Describe Service nginx-app
kubectl describe svc nginx-app
Name: nginx-app
Namespace: default
Labels: app=nginx-app
Annotations: <none>
Selector: app=nginx-app
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.98.170.185
IPs: 10.98.170.185
Port: <unset> 80/TCP
TargetPort: 80/TCP
NodePort: <unset> 31761/TCP
Endpoints: 10.244.1.10:80,10.244.2.10:80
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
Curl Nginx-App
curl http://192.168.122.8:31761
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Nginx-App from Browser
Change the default page
Last but not least, let’s change the default index page to something different, for educational purposes, with the help of a ConfigMap.
The idea is to create a ConfigMap with the HTML of our new index page and then attach it to our nginx deployment as a volume mount!
cat > nginx_config.map << EOF
apiVersion: v1
data:
  index.html: |
    <!DOCTYPE html>
    <html lang="en">
    <head>
      <title>A simple HTML document</title>
    </head>
    <body>
      <p>Change the default nginx page </p>
    </body>
    </html>
kind: ConfigMap
metadata:
  name: nginx-config-page
  namespace: default
EOF
cat nginx_config.map
apiVersion: v1
data:
  index.html: |
    <!DOCTYPE html>
    <html lang="en">
    <head>
      <title>A simple HTML document</title>
    </head>
    <body>
      <p>Change the default nginx page </p>
    </body>
    </html>
kind: ConfigMap
metadata:
  name: nginx-config-page
  namespace: default
apply the config.map
kubectl apply -f nginx_config.map
verify
kubectl get configmap
NAME DATA AGE
kube-root-ca.crt 1 2d3h
nginx-config-page 1 16m
Now the difficult part: we need to mount our ConfigMap into the nginx deployment, and to do that, we need to edit the nginx deployment.
kubectl edit deployments.apps nginx-app
Rewrite the pod template’s spec section (spec.template.spec) to include:
- the volumeMount, and
- the ConfigMap as a volume
spec:
  containers:
  - image: nginx
    ...
    volumeMounts:
    - mountPath: /usr/share/nginx/html
      name: nginx-config
  ...
  volumes:
  - configMap:
      name: nginx-config-page
    name: nginx-config
After saving, the nginx deployment is updated automatically: Kubernetes rolls out new pods with the ConfigMap mounted.
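The edit touches two places: a pod-level volume that refers to the ConfigMap, and a volumeMount inside the container that uses it by name; in a Deployment both live under spec.template.spec. As a sketch of that same transformation on a trimmed-down Deployment object (the helper is hypothetical, not a kubectl feature):

```python
import copy
from typing import Optional


def mount_configmap(deployment: dict, configmap: str, volume: str,
                    mount_path: str, container: Optional[str] = None) -> dict:
    """Return a copy of `deployment` with `configmap` mounted at `mount_path`."""
    result = copy.deepcopy(deployment)
    pod_spec = result["spec"]["template"]["spec"]
    # The volume is declared once, at pod level, and refers to the ConfigMap...
    pod_spec.setdefault("volumes", []).append(
        {"name": volume, "configMap": {"name": configmap}})
    # ...and each selected container mounts it by volume name.
    for c in pod_spec["containers"]:
        if container is None or c["name"] == container:
            c.setdefault("volumeMounts", []).append(
                {"name": volume, "mountPath": mount_path})
    return result


deployment = {"spec": {"template": {"spec": {
    "containers": [{"name": "nginx", "image": "nginx"}]}}}}
patched = mount_configmap(deployment, "nginx-config-page", "nginx-config",
                          "/usr/share/nginx/html")
```

Because the change is made to the pod template, the Deployment controller replaces the running pods, which is why no further command is needed. Note that mounting a volume over /usr/share/nginx/html hides the files the image originally had in that directory.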
Finally, we can see our updated index page:
That’s it
I hope you enjoyed this post.
-Evaggelos Balaskas
destroy our lab
./destroy.sh
...
libvirt_domain.domain-ubuntu["k8wrknode1"]: Destroying... [id=446cae2a-ce14-488f-b8e9-f44839091bce]
libvirt_domain.domain-ubuntu["k8scpnode"]: Destroying... [id=51e12abb-b14b-4ab8-b098-c1ce0b4073e3]
time_sleep.wait_for_cloud_init: Destroying... [id=2022-08-30T18:02:06Z]
libvirt_domain.domain-ubuntu["k8wrknode2"]: Destroying... [id=0767fb62-4600-4bc8-a94a-8e10c222b92e]
time_sleep.wait_for_cloud_init: Destruction complete after 0s
libvirt_domain.domain-ubuntu["k8wrknode1"]: Destruction complete after 1s
libvirt_domain.domain-ubuntu["k8scpnode"]: Destruction complete after 1s
libvirt_domain.domain-ubuntu["k8wrknode2"]: Destruction complete after 1s
libvirt_cloudinit_disk.cloud-init["k8wrknode1"]: Destroying... [id=/var/lib/libvirt/images/Jpw2Sg_cloud-init.iso;b8ddfa73-a770-46de-ad16-b0a5a08c8550]
libvirt_cloudinit_disk.cloud-init["k8wrknode2"]: Destroying... [id=/var/lib/libvirt/images/VdUklQ_cloud-init.iso;5511ed7f-a864-4d3f-985a-c4ac07eac233]
libvirt_volume.ubuntu-base["k8scpnode"]: Destroying... [id=/var/lib/libvirt/images/l5Rr1w_ubuntu-base]
libvirt_volume.ubuntu-base["k8wrknode2"]: Destroying... [id=/var/lib/libvirt/images/VdUklQ_ubuntu-base]
libvirt_cloudinit_disk.cloud-init["k8scpnode"]: Destroying... [id=/var/lib/libvirt/images/l5Rr1w_cloud-init.iso;11ef6bb7-a688-4c15-ae33-10690500705f]
libvirt_volume.ubuntu-base["k8wrknode1"]: Destroying... [id=/var/lib/libvirt/images/Jpw2Sg_ubuntu-base]
libvirt_cloudinit_disk.cloud-init["k8wrknode1"]: Destruction complete after 1s
libvirt_volume.ubuntu-base["k8wrknode2"]: Destruction complete after 1s
libvirt_cloudinit_disk.cloud-init["k8scpnode"]: Destruction complete after 1s
libvirt_cloudinit_disk.cloud-init["k8wrknode2"]: Destruction complete after 1s
libvirt_volume.ubuntu-base["k8wrknode1"]: Destruction complete after 1s
libvirt_volume.ubuntu-base["k8scpnode"]: Destruction complete after 2s
libvirt_volume.ubuntu-vol["k8wrknode1"]: Destroying... [id=/var/lib/libvirt/images/Jpw2Sg_ubuntu-vol]
libvirt_volume.ubuntu-vol["k8scpnode"]: Destroying... [id=/var/lib/libvirt/images/l5Rr1w_ubuntu-vol]
libvirt_volume.ubuntu-vol["k8wrknode2"]: Destroying... [id=/var/lib/libvirt/images/VdUklQ_ubuntu-vol]
libvirt_volume.ubuntu-vol["k8scpnode"]: Destruction complete after 0s
libvirt_volume.ubuntu-vol["k8wrknode2"]: Destruction complete after 0s
libvirt_volume.ubuntu-vol["k8wrknode1"]: Destruction complete after 0s
random_id.id["k8scpnode"]: Destroying... [id=l5Rr1w]
random_id.id["k8wrknode2"]: Destroying... [id=VdUklQ]
random_id.id["k8wrknode1"]: Destroying... [id=Jpw2Sg]
random_id.id["k8wrknode2"]: Destruction complete after 0s
random_id.id["k8scpnode"]: Destruction complete after 0s
random_id.id["k8wrknode1"]: Destruction complete after 0s
Destroy complete! Resources: 16 destroyed.
Friday, 08 November 2024
KDE Gear 24.12 branches created
Make sure you commit anything you want to end up in the KDE Gear 24.12 releases to them.
Next Dates:
- November 14, 2024: 24.12 freeze and beta (24.11.80) tagging and release
- November 28, 2024: 24.12 RC (24.11.90) tagging and release
- December 5, 2024: 24.12 tagging
- December 12, 2024: 24.12 release
Thursday, 07 November 2024
INWX DNS Recordmaster - Manage your DNS nameserver records via files in Git
I own and manage 30+ domains at INWX, a large and professional domain registrar. Although INWX has a somewhat decent web interface, it became a burden for me to keep an overview of each domain’s sometimes dozens of records. Especially when e.g. changing an IP address for more than one domain, it caused multiple error-prone clicks and copy/pastes that couldn’t be reverted in the worst case. This is why I created INWX DNS Recordmaster which I will shortly present here.
If you are an INWX customer, you can use this tool to manage all your DNS records in YAML files. Ideally, you will store these files in a Git repository which you can use to track changes and roll back in case of a mistake. Having one file per domain provides you a number of further advantages:
- You can easily copy/paste records from other domains, e.g. for SPF, DKIM or NS records
- Overall search/replace of certain values becomes much easier, e.g. of IP addresses
- You can prepare larger changes offline and can synchronise once you feel it’s done
INWX DNS Recordmaster takes care of making the required changes to the live records so that they match the local state. This is done via the INWX API, keeping the number of API calls to a minimum.
This even allows you to set up a pipeline that takes care of the synchronisation[1].
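The tool’s internals are not described here, so the following is only a generic sketch of how such a reconciliation can be planned with few API calls. It assumes (hypothetically) that a record is identified by name, type and content, with the TTL as an updatable attribute, and it lets certain record types such as SOA be ignored on both sides:

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    type: str
    content: str
    ttl: int = 3600


def _key(record: Record) -> tuple[str, str, str]:
    """Identity of a record in this sketch: name, type and content."""
    return (record.name, record.type, record.content)


def plan_changes(local, live, ignore_types=("SOA",)):
    """Return (to_create, to_update, to_delete) making `live` match `local`."""
    wanted = {_key(r): r for r in local if r.type not in ignore_types}
    existing = {_key(r): r for r in live if r.type not in ignore_types}
    to_create = [r for k, r in wanted.items() if k not in existing]
    to_delete = [r for k, r in existing.items() if k not in wanted]
    # Records present on both sides only need a call if an attribute differs.
    to_update = [r for k, r in wanted.items()
                 if k in existing and existing[k].ttl != r.ttl]
    return to_create, to_update, to_delete


local = [Record("@", "A", "192.0.2.10"),
         Record("www", "CNAME", "example.com", 300)]
live = [Record("@", "A", "192.0.2.1"),
        Record("www", "CNAME", "example.com"),
        Record("@", "SOA", "ns.example.com hostmaster.example.com")]
create, update, delete = plan_changes(local, live)
```

Here the changed IP address shows up as one deletion plus one creation (a real tool might fold that into a single update call), the TTL change as one update, and the SOA record is left alone; records that are already identical cost no API call at all.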
Wait, there is more
As written above, I already had a large stack of domains that I previously managed via the web interface. This is why some additional convenience features found their way into the tool.
- You can convert all records of an existing and already configured domain at INWX into the file format. This made onboarding my 30+ domains a matter of a few minutes.
- On a global or per-domain level, you can ignore certain record types. For example, if you don’t want to touch any NS records, you can configure that. By default, SOA records are ignored. You may even ignore all live records that don’t exist in your local configuration.
- Of course, you can make a dry run to see what effect your configuration will have in practice.
Did I miss something to make it more productive for you? Let me know!
Install, use, contribute
You are welcome to install this tool, it’s Free and Open Source Software after all. All you need is Python installed.
One of the tool’s users is the OpenRail Association, which manages some of its domains with this program and has published its configuration. This is a prime example of how organisations can make the management of records transparent and easy to change, at least internally, if not even externally.
While the tool is not perfect, it already is a huge gain for efficiency and stability of my IT operations, and it already proves its capabilities for other users. To reach the remaining 20% to perfection (that will take 80% of the time, as always), you are most welcome to add issues with enhancement proposals, and if possible, also pull requests.
[1] For example, see the workflow file of the OpenRail Association.
Tuesday, 05 November 2024
Music production with Linux: How to use Guitarix and Ardour together
Music production for guitar has a lot of options on Linux. We will see how to install the required software, and how to use Guitarix together with Ardour either with the standalone version of Guitarix or with an embedded version inside Ardour.
Software installation and configuration
Install Ardour, a music production application released under the GPLv2 license. For Archlinux run:
sudo pacman -S ardour
For other operating systems, you can follow the Ardour installation page or install it from Flathub.
Install qpwgraph to visualize PipeWire connections. This is not mandatory but highly recommended, to make sure Ardour, Guitarix and their respective inputs and outputs are wired correctly.
sudo pacman -S qpwgraph
Make sure your user is in the audio and realtime groups:
sudo usermod -a -G audio $USER
sudo usermod -a -G realtime $USER
and set the real-time priority and locked-memory limits of the audio group in /etc/security/limits.d/audio.conf:
@audio - rtprio 95
@audio - memlock unlimited
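Each line of that file follows the pam_limits format `<domain> <type> <item> <value>`, where a domain of `@audio` means the audio group and a type of `-` sets both the soft and the hard limit. A small Python sketch (hypothetical helpers; later lines simply override earlier ones in this simplified version) that parses such a file and shows what it grants the group:

```python
def parse_limits(text: str) -> list[tuple[str, ...]]:
    """Parse pam_limits lines of the form: <domain> <type> <item> <value>."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"malformed limits line: {line!r}")
        entries.append(tuple(fields))
    return entries


def limits_for(entries, domain: str) -> dict[str, dict[str, str]]:
    """Soft and hard values per item for one domain, e.g. "@audio"."""
    result: dict[str, dict[str, str]] = {}
    for dom, kind, item, value in entries:
        if dom != domain:
            continue
        # "-" sets both the soft and the hard limit.
        for k in (("soft", "hard") if kind == "-" else (kind,)):
            result.setdefault(item, {})[k] = value
    return result


conf = """
@audio - rtprio 95
@audio - memlock unlimited
"""
audio = limits_for(parse_limits(conf), "@audio")
print(audio["rtprio"])  # {'soft': '95', 'hard': '95'}
```

The new limits and group memberships only apply to new login sessions; after logging in again, `ulimit -r` and `ulimit -l` in a shell should report 95 and unlimited.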
Start Ardour, select “Recording Session” and select only one audio input.
Guitarix as a standalone program
We will first see how to use Guitarix as a standalone program. Guitarix is a virtual amplifier released under the GPLv2 license which uses Jack to add audio effects to a raw guitar signal from a microphone or guitar pickup.
To install Guitarix on Archlinux run:
sudo pacman -S guitarix
Other installation instructions are available on the Guitarix installation page or on flathub.
Starting Guitarix shows the main window. The left panel shows the available effects, which can be dragged onto the main panel to put them on the rack and change their settings.
To configure Guitarix’s input and output, go to the “Engine” menu and click on “Jack Ports”. The inputs should be the guitar pickup and microphone, and the output should be Ardour “audio_in”. Make sure Ardour is started so that it can be selected in the output section.
The Guitarix output configuration can be checked on the Ardour side as well. In Ardour, select the “Rec” tab (with the button in the top right corner) and choose the routing grid option using the third button of the “Audio 1” row. This will display a routing grid where you can check whether only the Guitarix output, gx_head_fx, is selected.
In the JACK graph of this setup, the guitar pickup or microphone is connected to Guitarix, the Guitarix output to Ardour, and the Ardour output to the system’s playback. The graph from qpwgraph below illustrates this configuration and allows checking for feedback loops and incorrect connections.
To record the Ardour output, press the red recording button in the “Audio 1” row of the “Rec” tab. To monitor the audio that will be recorded (i.e. the Guitarix output), you can press the “In” button.
Guitarix supports Neural Amp Modeler (NAM) plugins to emulate any hardware amplifier, pedal or impulse responses. NAM models can be downloaded on ToneHunt and loaded under the “Neural” section in the pool tab.
Guitarix as a plugin inside Ardour
Guitarix exists as a VST3 plugin for music production software. The plugin shares its configuration with the standalone Guitarix app, so Guitarix presets and settings from the standalone app are available in the plugin.
Install the plugin on Archlinux from AUR:
paru -S guitarix.vst
or head to the project repository for builds for other operating systems.
To load Guitarix as an Ardour plugin, go to the “Mix” tab (in the top right corner), then right-click on the black area below the fader and select “New Plugin” and “Plugin Selector”. The “Guitarix” plugin can be inserted from the newly opened window. Double-clicking on Guitarix opens the plugin window, which looks roughly like the standalone program. Effects can be added using the “plus” symbol next to the input and AMP stack boxes. Community-made presets can also be downloaded using the “Online” button.
If Guitarix is used within Ardour as a plugin, the Ardour input (i.e. in this example the microphone) must be selected in the Routing grid of the audio track. The jack graph of this setup looks simpler, as the microphone is directly connected to the Ardour audio track.
Record and export the recordings
To record, go to the “Rec” tab and make sure the audio track has the red “record” button checked. Then go to the “Edit” tab, click on the global “Toggle record” button, hit “Play from playhead” and there goes the music!
To export the recordings, go to the “session” menu and go to “Export” and “Export to file”. On the Export dialog, select the right file format, time span and channels and click on export.
Friday, 18 October 2024
KDE Gear 24.12 release schedule
This is the release schedule the release team agreed on
https://community.kde.org/Schedules/KDE_Gear_24.12_Schedule
Dependency freeze is in around 3 weeks (November 7) and feature freeze one week after that. Get your stuff ready!
Monday, 07 October 2024
Google Summer of Code Mentor Summit 2024
This weekend "The KDE Alberts"[1] attended Google Summer of Code Mentor Summit 2024 in Sunnyvale, California.
The Google Summer of Code Mentor Summit is an annual
unconference that every project participating in Google Summer of Code
2024 is invited to attend. This year it was the 20th year celebration of the program!
I was too late to take a picture of the full cake!
We attended many sessions ranging from how to try to avoid falling into the "xz problem" to collecting donations or shaping the governance of open source projects.
We met lots of people that knew what KDE was and were happy to congratulate us on the job done and also a few that did not know KDE and were happy to learn about what we do.
We also did a quick lightning talk about the GSOC projects KDE mentored this year and led two sessions: one centered around the problems some open source application developers are having publishing to the Google Play Store and another session about Desktop Linux together with our Gnome friends.
All in all a very productive unconference. We encourage KDE mentors to take the opportunity to attend the Google Summer of Code Mentor Summit next year, it's a great experience!
[1] me and Albert Vaca, people were moderately amused that both of us had the same name, contribute to the same community and are from the same city.
Wednesday, 02 October 2024
SSH Hardening Ubuntu 24.04 LTS
Personal notes on hardening a new Ubuntu 24.04 LTS SSH daemon setup for incoming SSH traffic.
Port <12345>
PasswordAuthentication no
KbdInteractiveAuthentication no
UsePAM yes
X11Forwarding no
PrintMotd no
UseDNS no
KexAlgorithms sntrup761x25519-sha512@openssh.com,curve25519-sha256,curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256
HostKeyAlgorithms ssh-ed25519-cert-v01@openssh.com,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,sk-ssh-ed25519-cert-v01@openssh.com,sk-ecdsa-sha2-nistp256-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-ed25519,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ssh-ed25519@openssh.com,sk-ecdsa-sha2-nistp256@openssh.com,rsa-sha2-512,rsa-sha2-256
MACs umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512
AcceptEnv LANG LC_*
AllowUsers <username>
Subsystem sftp /usr/lib/openssh/sftp-server
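sshd reads keywords case-insensitively and, for most keywords, uses the first value it encounters. That matters on Ubuntu, whose default sshd_config begins by including /etc/ssh/sshd_config.d/*.conf, so a setting in one of those files wins over a later line in the main file. The authoritative check is `sudo sshd -T`, which prints the effective configuration; the Python below is only a sketch (hypothetical helpers that ignore Include and stop at the first Match block) for checking a config text against a few of the settings above:

```python
import re
from typing import Optional

# A few of the settings from the configuration above (keywords in lower case).
EXPECTED = {
    "passwordauthentication": "no",
    "kbdinteractiveauthentication": "no",
    "x11forwarding": "no",
    "usedns": "no",
}

# Matches "Keyword value" as well as "Keyword=value".
LINE_RE = re.compile(r"(\w+)\s*(?:=|\s)\s*(.*)")


def effective_options(text: str) -> dict[str, str]:
    """First value per keyword, stopping at the first Match block."""
    options: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.fullmatch(line)
        if match is None:
            continue
        keyword, value = match.group(1).lower(), match.group(2).strip()
        if keyword == "match":
            break  # everything after this is conditional; not modelled here
        options.setdefault(keyword, value)  # the first obtained value wins
    return options


def audit(text: str) -> dict[str, Optional[str]]:
    """Settings that differ from EXPECTED (None means not set explicitly)."""
    options = effective_options(text)
    return {key: options.get(key) for key, want in EXPECTED.items()
            if options.get(key) != want}
```

For the configuration above, `audit()` should return an empty dictionary.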
testing with https://sshcheck.com/
Friday, 13 September 2024
Disable the Plasma Morphing Popups effect (at least on X11)
If you're using Plasma/KWin 6, I suggest you disable the Morphing Popups effect. It has been removed for Plasma 6.2 (https://invent.kde.org/plasma/kwin/-/commit/d6360cc4ce4e0d85862a4bb077b8b3dc55cd74a7), and on X11 at least it causes severe redraw issues with tooltips in Okular (and, I would guess, elsewhere).
Thursday, 05 September 2024
Configuring a Program’s Environment
Although there isn’t much to report of late, I thought that it might be appropriate to note a few small developments in my efforts related to L4Re. With travel, distractions, and various irritations intervening, only slow, steady progress was made during August.
Previously, I published a rather long article about operating systems and application environments, but this was not written spontaneously. In fact, it attempts to summarise various perspectives on such topics from the last fifty or so years, discovered as I reviewed the rather plentiful literature that is now readily accessible online. Alongside the distraction of reading historical documents, I had been slowly developing support for running programs in my L4Re-based environment, gradually bringing it to a point where I might be able to explore some more interesting topics.
One topic that overlapped with my last article and various conference talks was that of customising the view of the system a given program might have when it is run. Previous efforts had allowed me to demonstrate programs running and interacting with a filesystem, even one stored on a device such as a microSD card and accessed by hardware booting into L4Re, as opposed to residing in some memory in a QEMU virtual machine. And these programs were themselves granted the privilege of running their own programs. However, all of these programs resided in the same filesystem and also accessed this same filesystem.
Distinct Program Filesystems
What I wanted to do was to allow programs to see a different, customised filesystem instead of the main filesystem. Fortunately, my component architecture largely supported such a plan. When programs are invoked, the process server component supplies a filesystem reference to the newly invoked program, this reference having been the same one that the process server uses itself. To allow the program to see a different filesystem, all that is required is that a reference to another filesystem be supplied.
So, the ability is required to configure the process server to utilise a distinct filesystem for invoked programs. After enhancing the process server to propagate a distinct filesystem to created processes, I updated its configuration in the Lua script within L4Re as follows:
l:startv({
  caps = {
    fsserver = ext2server_paulb, -- this is the filesystem the server uses itself
    pipeserver = pipe_server,
    prfsserver = ext2server_nested_paulb, -- this is the distinct filesystem for programs
    prserver = process_server:svr(),
  },
  log = { "process", "y" },
}, "rom/process_server", "bin/exec_region_mapper", "prfsserver");
Now, the process server obtains the program or process filesystem from the “prfsserver” capability defined in its environment. This capability or reference can be supplied to each new process created when invoking a program.
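To make the arrangement concrete, here is a minimal Python sketch of the choice the process server makes. It is an illustration under assumptions, not the actual component: the class is hypothetical, strings stand in for capability references, and treating the final argument as the name of the capability for programs (falling back to the server's own filesystem when it names nothing) is a guess based on the Lua configuration above.

```python
class ProcessServer:
    """Hypothetical model of a process server that may give new processes a
    different filesystem from the one it uses itself."""

    def __init__(self, caps, args):
        # The server always locates programs through its own filesystem.
        self.own_fs = caps["fsserver"]
        # Assumption: the last argument names the capability for programs;
        # if it names nothing in caps, programs simply share our filesystem.
        name = args[-1] if args else None
        self.program_fs = caps.get(name, self.own_fs)

    def spawn_environment(self):
        # Capabilities handed to each new process: its filesystem, under the
        # name it expects, plus a reference back to this server.
        return {"fsserver": self.program_fs, "prserver": self}


# Strings stand in for capability references from the Lua "caps" table.
caps = {
    "fsserver": "ext2server_paulb",
    "prfsserver": "ext2server_nested_paulb",
}
server = ProcessServer(caps, ["bin/exec_region_mapper", "prfsserver"])
print(server.spawn_environment()["fsserver"])  # ext2server_nested_paulb
```

The fallback keeps the conventional behaviour available: omitting the capability name gives every program the same filesystem as the server.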
Nesting Filesystems
Of course, testing this requires a separate filesystem image to be created and somehow supplied during the initialisation of the system. When prototyping using QEMU on a machine with substantial quantities of memory, it is convenient to just bundle such images up in the payload that is deployed within QEMU, these being exposed as files in a “rom” filesystem by the core L4Re components.
But on “real hardware”, it isn’t necessarily convenient to have separate partitions on a storage device for lots of different filesystems. Instead, we might wish to host filesystem images within the main filesystem, accessing these in a fashion similar to using the loop option with the mount command on Unix-like systems. As in, something like this, mounting “filesystem.fs” at the indicated “mountpoint” location:
mount -o loop filesystem.fs mountpoint
This led to me implementing support for accessing a filesystem stored in a file within a filesystem. In the L4Re build system, my software constructs filesystem images using a simple tool that utilises libext2fs to create an ext2-based filesystem. So, I might have a directory called “docs” containing some documents that is then packed up into a filesystem image called “docs.fs”.
This image might then be placed in a directory that, amongst other content, is packed up into the main filesystem image deployed in the QEMU payload. On “real hardware”, I could take advantage of an existing filesystem on a memory card, copying content there instead of creating an image for the main filesystem. But regardless of the approach, the result would be something like this:
> ls fs
fs
drwxrwxrwx- 1000 1000 1024 2 .
drwxr-xr-x- 0 0 1024 7 ..
-rw-r--r--- 1000 1000 102400 1 docs.fs
Here, “docs.fs” resides inside the “fs” directory provided by the main filesystem.
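To see what a filesystem held in a file amounts to, a minimal Python sketch can read the ext2 superblock, which always begins 1024 bytes into the image and carries the magic number 0xEF53. The function name is hypothetical and the image below is synthetic, built in memory with made-up counts chosen to match the 102400-byte size listed above; only the field offsets come from the ext2 on-disk format. A real image could be checked by passing in the first 2048 bytes of "docs.fs".

```python
import struct

SUPERBLOCK_OFFSET = 1024  # ext2 superblock starts 1024 bytes into the volume
EXT2_MAGIC = 0xEF53


def read_ext2_superblock(image):
    """Return a few superblock fields from an ext2 image held in bytes."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < 58:
        raise ValueError("too small to contain an ext2 superblock")
    (magic,) = struct.unpack_from("<H", sb, 56)        # s_magic
    if magic != EXT2_MAGIC:
        raise ValueError(f"not ext2: magic 0x{magic:04x}")
    inodes, blocks = struct.unpack_from("<II", sb, 0)  # s_inodes_count, s_blocks_count
    (log_size,) = struct.unpack_from("<I", sb, 24)     # s_log_block_size
    return {"inodes": inodes, "blocks": blocks, "block_size": 1024 << log_size}


# Synthetic 102400-byte image with made-up counts: 100 blocks of 1024 bytes.
image = bytearray(102400)
struct.pack_into("<II", image, SUPERBLOCK_OFFSET, 16, 100)
struct.pack_into("<H", image, SUPERBLOCK_OFFSET + 56, EXT2_MAGIC)
info = read_ext2_superblock(bytes(image))
print(info)  # {'inodes': 16, 'blocks': 100, 'block_size': 1024}
```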
Files Providing Filesystems
With this embedded filesystem now made available, the matter of providing support for programs to access it largely involved the introduction of a new component acting as a block device. But instead of accessing something like a memory card (or an approximation of one for the purposes of prototyping), this block server accesses a file containing an embedded filesystem through an appropriate filesystem “client” programming interface. Here is the block server being started in the Lua script:
l:startv({
  caps = {
    blockserver = client_server:svr(),
    fsserver = ext2server_paulb,
  },
  log = { "clntsvr", "y" },
},
-- program, block server capability to provide, memory pages
"rom/client_server", "blockserver", "10");
Then, a filesystem server is configured using the block server defined above, obtaining the nested filesystem from “fs/docs.fs” in the main filesystem to use as its block storage medium:
l:startv({
  caps = {
    blockserver = client_server,
    fsserver = ext2server_nested:svr(),
    pipeserver = pipe_server,
  },
  log = { "ext2svrN", "y" },
},
-- program, server capability, memory pages, filesystem capability to provide
"rom/ext2_server", "blockserver", "fs/docs.fs", "20", "fsserver");
Then, this filesystem server, utilising libext2fs coupled with a driver for a block device, can operate on the filesystem oblivious to what is providing it, which is another component that itself uses libext2fs! Thus, a chain of components can be employed to provide access to files within filesystems, themselves provided by files within other filesystems, and so on, eventually accessing blocks in some kind of storage device. Here, we will satisfy ourselves with just a single level of filesystems within files, however.
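The chaining can be sketched in Python with two small adapters: a block device backed by any seekable file-like object, and a file-like object that is itself made of blocks on another device. Both classes are hypothetical and are not the L4Re block server's interface; the nested file is also simplified to a contiguous run of blocks, whereas a real file inside an ext2 filesystem may be scattered across the outer volume.

```python
import io


class FileBlockDevice:
    """Hypothetical block device whose blocks live in a seekable file-like
    object, in the spirit of a block server reading fs/docs.fs."""

    def __init__(self, backing, block_size=1024):
        self.backing = backing
        self.block_size = block_size
        self.num_blocks = backing.seek(0, io.SEEK_END) // block_size

    def read_block(self, n):
        self._check(n)
        self.backing.seek(n * self.block_size)
        return self.backing.read(self.block_size)

    def write_block(self, n, data):
        self._check(n)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.backing.seek(n * self.block_size)
        self.backing.write(data)

    def _check(self, n):
        if not 0 <= n < self.num_blocks:
            raise IndexError(f"block {n} out of range")


class ExtentFile:
    """Hypothetical file occupying a contiguous run of blocks on another
    device, standing in for a file stored inside an outer filesystem.
    Only whole, block-aligned reads and writes are supported."""

    def __init__(self, device, first_block, num_blocks):
        self.device = device
        self.first_block = first_block
        self.size = num_blocks * device.block_size
        self.pos = 0

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self.pos, io.SEEK_END: self.size}[whence]
        self.pos = base + offset
        return self.pos

    def _block_index(self, n):
        bs = self.device.block_size
        if self.pos % bs or n != bs:
            raise ValueError("only whole, aligned blocks are supported")
        return self.first_block + self.pos // bs

    def read(self, n):
        data = self.device.read_block(self._block_index(n))
        self.pos += n
        return data

    def write(self, data):
        self.device.write_block(self._block_index(len(data)), data)
        self.pos += len(data)
        return len(data)


# A 100-block "disk"; blocks 10..19 hold a file that is itself a block device.
disk = FileBlockDevice(io.BytesIO(bytes(100 * 1024)))
inner = FileBlockDevice(ExtentFile(disk, first_block=10, num_blocks=10))
inner.write_block(0, b"x" * 1024)   # block 0 of the nested device...
print(disk.read_block(10)[:4])      # ...is block 10 of the outer one: b'xxxx'
```

The inner device neither knows nor cares that its "file" is really a window onto another device, which is the property that lets such layers be stacked.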
So, with the ability to choose a filesystem for new programs and with the ability to acquire a filesystem from the surrounding, main filesystem, it became possible to run a program that now sees a distinct filesystem. For example:
> run bin/ls
drwxr-xr-x- 0 0 1024 4 .
drwxr-xr-x- 0 0 1024 4 ..
drwx------- 0 0 12288 2 lost+found
drwxrwxrwx- 1000 1000 1024 2 docs
[0] Completed with signal 0 value 0
Although a program only sees its own filesystem, it can itself run another program provided from outside. For example, getting “test_systemv” to run “cat”:
> run bin/test_systemv bin/cat docs/COPYING.txt
Running: bin/cat
Licence Agreement
-----------------
All original work in this distribution is covered by the following copyright and licensing information:
Now, this seems counterintuitive. How does the program invoked from the simple shell environment, “test_systemv”, manage to invoke a program from a directory, “bin”, that is not visible and presumably not accessible to it? This can be explained by the process server. Since the invoked programs are also given a reference to the process server, this letting them start other programs, and since the process server is able to locate programs independently, the invoked programs may supply a program path that may not be accessible to them, but it may be accessible to the process server.
The result is like having some kind of “shadow” filesystem. Programs may be provided by this filesystem and run, but in this arrangement, they may only operate on a distinct filesystem where they themselves and other programs may not even be present. Conversely, even if programs are provided in the filesystem visible to a program, they may not be run because the process server may not have access to them. If we wanted to provide an indication of the available programs, we might provide a “bin” directory in each program’s visible filesystem containing files with the names of the available programs, but these files would not need to be the actual programs and “running” them would not actually be running them at all: the shadow filesystem programs would be run instead.
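The shadowing effect can be modelled as a toy in a few lines of Python. Everything here is hypothetical: the class, the dictionary-based "filesystems", and the "bin/launcher" program, which plays the part of test_systemv. The point it illustrates is only that program lookup goes through the server's own store, while each program operates solely on the filesystem it is handed.

```python
class ShadowProcessServer:
    """Hypothetical model of filesystem shadowing: programs are looked up in
    the server's own store, but each started program only sees visible_fs."""

    def __init__(self, program_store, visible_fs):
        self.program_store = program_store  # path -> callable (server's view)
        self.visible_fs = visible_fs        # path -> contents (programs' view)

    def run(self, path, *args):
        program = self.program_store.get(path)
        if program is None:
            # A stub with this name in visible_fs would not help: only the
            # server's own store is consulted.
            raise FileNotFoundError(f"{path}: not available to the process server")
        # Each program gets the visible filesystem and a reference back to
        # this server, so it can ask for further programs to be started.
        return program(self.visible_fs, self, *args)


def cat(fs, server, path):
    return fs[path]


def launcher(fs, server, path, *args):
    # Plays the role of test_systemv: "bin/cat" need not exist in fs.
    return server.run(path, *args)


visible = {
    "docs/COPYING.txt": "Licence Agreement",
    "bin/ls": "stub listing only",  # name advertised, never executed
}
server = ShadowProcessServer({"bin/cat": cat, "bin/launcher": launcher}, visible)
print(server.run("bin/launcher", "bin/cat", "docs/COPYING.txt"))  # Licence Agreement
```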
Such trickery is not mandatory, of course. The same filesystem can be visible to programs and the process server that invoked them. But this kind of filesystem shadowing does open up some possibilities that would not normally be available in a conventional environment. Certainly, I imagine that such support could be introduced to everybody’s own favourite operating system, too, but the attraction here is that such experimentation comes at a relatively low level of effort. Moreover, I am not making anyone uncomfortable modifying another system, treading on people’s toes, threatening anyone’s position in the social hierarchy, and generally getting them on the defensive, inviting the inevitable, disrespectful question: “What is it you are trying to do?”
As I noted last time, there isn’t a singular objective here. Instead, the aim is to provide the basis for multiple outcomes, hopefully informative and useful ones. So, in keeping with that agenda, I hope that this update was worth reading.
Wednesday, 28 August 2024
Postfix Hardening Ubuntu 24.04 LTS
Personal notes on hardening a new Ubuntu 24.04 LTS Postfix setup for incoming SMTP TLS traffic.
Create Diffie–Hellman parameters
openssl dhparam -out /etc/postfix/dh2048.pem 2048
for offering a new random DH group.
SMTPD - Incoming Traffic
# SMTPD - Incoming Traffic
postscreen_dnsbl_action = drop
postscreen_dnsbl_sites =
bl.spamcop.net,
zen.spamhaus.org
smtpd_banner = <put your banner here>
smtpd_helo_required = yes
smtpd_starttls_timeout = 30s
smtpd_tls_CApath = /etc/ssl/certs
smtpd_tls_cert_file = /root/.acme.sh/<your_domain>/fullchain.cer
smtpd_tls_key_file = /root/.acme.sh/<your_domain>/<your_domain>.key
smtpd_tls_dh1024_param_file = ${config_directory}/dh2048.pem
smtpd_tls_ciphers = HIGH
# Weak ciphers
smtpd_tls_exclude_ciphers =
3DES,
AES128-GCM-SHA256,
AES128-SHA,
AES128-SHA256,
AES256-GCM-SHA384,
AES256-SHA,
AES256-SHA256,
CAMELLIA128-SHA,
CAMELLIA256-SHA,
DES-CBC3-SHA,
DHE-RSA-DES-CBC3-SHA,
aNULL,
eNULL,
CBC
smtpd_tls_loglevel = 1
smtpd_tls_mandatory_ciphers = HIGH
smtpd_tls_protocols = !SSLv2, !SSLv3, !TLSv1, !TLSv1.1
smtpd_tls_security_level = may
smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache
smtpd_use_tls = yes
tls_preempt_cipherlist = yes
unknown_local_recipient_reject_code = 550
Local Testing
testssl -t smtp <your_domain>.:25
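Alongside testssl, a quick sanity check can be scripted with Python's standard smtplib and ssl modules. This is a minimal sketch rather than part of the original notes: the function names are hypothetical, and it only reports the single protocol and cipher negotiated by one handshake, whereas testssl enumerates everything the server offers.

```python
import smtplib
import ssl

# The protocols excluded by smtpd_tls_protocols in the configuration above.
DISALLOWED = {"SSLv2", "SSLv3", "TLSv1", "TLSv1.1"}


def check_starttls(host, port=25, timeout=30):
    """Upgrade an SMTP session with STARTTLS and return (protocol, cipher)."""
    context = ssl.create_default_context()  # verifies the certificate and name
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        smtp.starttls(context=context)
        # After STARTTLS, smtp.sock is an ssl.SSLSocket.
        protocol = smtp.sock.version()
        cipher = smtp.sock.cipher()[0]
    return protocol, cipher


def protocol_ok(protocol):
    """True if the negotiated protocol is not one the server should refuse."""
    return protocol not in DISALLOWED
```

For example, `protocol, cipher = check_starttls("mail.example.org")` followed by `protocol_ok(protocol)`, where mail.example.org is a placeholder for your own MX host. Run it from a machine allowed to make outbound connections on port 25, since many providers block that port.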
Online Testing
result
Thursday, 08 August 2024
Install Tailscale on very old Linux systems with an init script
I have many random VPSs and VMs across Europe with different providers, for reasons.
Two of them are still running an RPM-based distro from 2011 and, yes, 13 years later, I have not found the time to migrate them! Needless to say, these are still the most stable Linux machines I have: zero problems, ZERO PROBLEMS, and they are in production and heavily used every day. Let me write this again in bold: ZERO PROBLEMS.
But the time has come: I want to close some public services and use a mesh VPN for SSH. Tailscale entered the conversation, and it seems its binary works on both new and old Linux machines.
Long story short, I wanted an init script, and with the dpkg package (the Debian package manager, which ships start-stop-daemon) installed, I could use start-stop-daemon.
Here is the init script:
#!/bin/bash
# ebal, Thu, 08 Aug 2024 14:18:11 +0300
### BEGIN INIT INFO
# Provides: tailscaled
# Required-Start: $local_fs $network $syslog
# Required-Stop: $local_fs $network $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: tailscaled daemon
# Description: tailscaled daemon
### END INIT INFO
. /etc/rc.d/init.d/functions
prog="tailscaled"
DAEMON="/usr/local/bin/tailscaled"
PIDFILE="/var/run/tailscaled.pid"
test -x $DAEMON || exit 0
case "$1" in
start)
echo "Starting ${prog} ..."
start-stop-daemon --start --background --pidfile $PIDFILE --make-pidfile --startas $DAEMON --
RETVAL=$?
;;
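stop)
echo "Stopping ${prog} ..."
RETVAL=0
if [ -f ${PIDFILE} ]; then
# --retry 5: send TERM, wait up to 5 seconds, then KILL if still running
start-stop-daemon --stop --pidfile $PIDFILE --retry 5
RETVAL=$?
rm -f ${PIDFILE} > /dev/null 2>&1
# arguments after "--" only apply when starting, so run the cleanup explicitly
${DAEMON} -cleanup
fi
;;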
status)
start-stop-daemon --status --pidfile ${PIDFILE}
status $prog
RETVAL=$?
;;
*)
echo "Usage: /etc/init.d/tailscaled {start|stop|status}"
RETVAL=1
;;
esac
exit ${RETVAL}
An example:
[root@kvm ~]# /etc/init.d/tailscaled start
Starting tailscaled ...
[root@kvm ~]# /etc/init.d/tailscaled status
tailscaled (pid 29101) is running...
[root@kvm ~]# find /var/ -type f -name "tailscale*pid"
/var/run/tailscaled.pid
[root@kvm ~]# cat /var/run/tailscaled.pid
29101
[root@kvm ~]# ps -e fuwww | grep -i tailscaled
root 29400 0.0 0.0 103320 880 pts/0 S+ 16:49 0:00 \_ grep --color -i tailscaled
root 29101 2.0 0.7 1250440 32180 ? Sl 16:48 0:00 /usr/local/bin/tailscaled
[root@kvm ~]# tailscale up
[root@kvm ~]# tailscale set -ssh
[root@kvm ~]# /etc/init.d/tailscaled stop
Stopping tailscaled ...
[root@kvm ~]# /etc/init.d/tailscaled status
tailscaled is stopped
[root@kvm ~]# /etc/init.d/tailscaled stop
Stopping tailscaled ...
[root@kvm ~]# /etc/init.d/tailscaled start
Starting tailscaled ...
[root@kvm ~]# /etc/init.d/tailscaled start
Starting tailscaled ...
process already running.
[root@kvm ~]# /etc/init.d/tailscaled status
tailscaled (pid 29552) is running...
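The status checks above all hinge on the pidfile, and a pidfile alone proves nothing: the PID inside must still name a live process. A minimal Python 3 sketch of that idea follows (the function name is hypothetical, and this is not how start-stop-daemon is implemented). It distinguishes roughly the cases that start-stop-daemon --status reports through its exit status, but unlike start-stop-daemon with --exec it does not check which executable owns the PID, so a recycled PID would still read as running.

```python
import os


def pidfile_status(pidfile):
    """Report ("running", pid), ("stale", pid) or ("stopped", None).

    Signal 0 performs the existence and permission checks without actually
    delivering a signal to the process.
    """
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return ("stopped", None)
    if pid <= 0:
        return ("stopped", None)  # 0 or negative would address process groups
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return ("stale", pid)     # pidfile left behind by a dead process
    except PermissionError:
        pass                      # process exists but belongs to another user
    return ("running", pid)


if __name__ == "__main__":
    print(pidfile_status("/var/run/tailscaled.pid"))
```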
Saturday, 27 July 2024
Reformulating the Operating System
As noted previously, two of my interests in recent times have been computing history and microkernel-based operating systems. Having perused academic and commercial literature in the computing field a fair amount over the last few years, I experienced a certain feeling of familiarity when looking at the schedule for FOSDEM, which took place earlier in the year. This feeling was prompted by a talk in the “microkernel and component-based OS” developer room: “A microkernel-based orchestrator for distributed Internet services?”
In this talk’s abstract, mentions of the complexity of current Linux-based container solutions led me to consider the role of containers and virtual machines. This, in turn, brought back a recollection of a paper published in 1996, “Microkernels Meet Recursive Virtual Machines”, describing a microkernel-based system architecture called Fluke. When that paper was published, I was just starting out in my career and preoccupied with other things. It was only in pursuing those interests of mine that it came to my attention more recently.
It turned out that there were others at FOSDEM with similar concerns. Liam Proven, who regularly writes about computing history and alternative operating systems, gave a talk, “One way forward: finding a path to what comes after Unix”, that combined observations about the state of the computing industry, the evolution of Unix, and the possibilities of revisiting systems such as Plan 9 to better inform current and future development paths. This talk has since been summarised in four articles, concluding with “A path out of bloat: A Linux built for VMs” that links back to the earlier parts.
Both of these talks noted that in attempting to deploy applications and services, typically for Internet use, practitioners are now having to put down new layers of functionality to mitigate or work around limitations in existing layers. In other words, they start out with an operating system, typically based on Linux, that provides a range of features including support for multiple users and the ability to run software in an environment largely confined to the purview of each user, but end up discarding most of this built-in support as they bundle up their software within such things as containers or virtual machines, where the software can pretend that it has access to a complete environment, often running under the control of one or more specific user identities within that environment.
With all this going on, people should be questioning why they need to put one bundle of software (their applications) inside another substantial bundle of software (an operating system running in a container or virtual machine), only to deploy that inside yet another substantial bundle of software (an operating system running on actual hardware). Computing resources may be the cheapest they have ever been, supply chain fluctuations notwithstanding, but there are plenty of other concerns about building up levels of complexity in systems that should prevent us from using cheap computing as an excuse for business as usual.
A Quick Historical Review
In the early years of electronic computing, each machine would be dedicated to running a single program uninterrupted until completion, producing its results and then being set up for the execution of a new program. In this era, one could presumably regard a computer simply as the means to perform a given computation, hence the name.
However, as technology progressed, it became apparent that dedicating a machine to a single program in this way utilised computing resources inefficiently. When programs needed to access relatively slow peripheral devices such as reading data from, or writing data to, storage devices, the instruction processing unit would be left idle for significant amounts of cumulative time. Thus, solutions were developed to allow multiple programs to reside in the machine at the same time. If a running program had paused to allow data to be transferred to or from storage, another program might have been given a chance to run until it also found itself needing to wait for those peripherals.
In such systems, each program can no longer truly consider itself as the sole occupant or user of the machine. However, there is an attraction in allowing programs to be written in such a way that they might be able to ignore or overlook this need to share a computer with other programs. Thus, the notion of a more abstract computing environment begins to take shape: a program may believe that it is accessing a particular device, but the underlying machine operating software might direct the program’s requests to a device of its own choosing, presenting an illusion to the program.
Although these large, expensive computer systems then evolved to provide “multiprogramming” support, multitasking, virtual memory, and virtual machine environments, it is worth recalling the evolution of computers at the other end of the price and size scale, starting with the emergence of microcomputers from the 1970s onwards. Constrained by the availability of affordable semiconductor components, these small systems at first tended to limit themselves to modest computational activities, running one program at a time, perhaps punctuated occasionally by interrupts allowing the machine operating software to update the display or perform other housekeeping tasks.
As microcomputers became more sophisticated, so expectations of the functionality they might deliver also became more sophisticated. Users of many of the earlier microcomputers might have run one application or environment at a time, such as a BASIC interpreter, a game, or a word processor, and what passed for an operating system would often only really permit a single application to be active at once. A notable exception in the early 1980s was Microware’s OS-9, which sought to replicate the Unix environment within the confines of 8-bit microcomputer architecture, later ported to the Motorola 68000 and used in, amongst other things, Philips’ CD-i players.
OS-9 offered the promise of something like Unix on fairly affordable hardware, but users of systems with more pedestrian software also started to see the need for capabilities like multitasking. Even though the dominant model of microcomputing, perpetuated by the likes of MS-DOS, had involved running one application to do something, then exiting that application and running another, it quickly became apparent that users themselves had multitasking impulses and were inconvenienced by having to finish off something, even temporarily, switch to another application offering different facilities, and then switch back again to resume their work.
Thus, the TSR and the desk accessory were born, even finding a place on systems like the Apple Macintosh, whose user interface gave the impression of multitasking functionality and allowed switching between applications, even though only a single application could, in general, run at a time. Later, Apple introduced MultiFinder with the more limited cooperative flavour of multitasking, in contrast to systems already offering preemptive multitasking of applications in their graphical environments. People may feel the compulsion to mention the Commodore Amiga in such contexts, but a slightly more familiar system from a modern perspective would be the Torch Triple X workstation with its OpenTop graphical environment running on top of Unix.
The Language System Phenomenon
And so, the upper and lower ends of the computing market converged on expectations that users might be able to run many programs at a time within their computers. But the character of these expectations might have been coloured differently from the prior experiences of each group. Traditional computer users might well have framed the environment of their programs in terms of earlier machines and environments, regarding multitasking as a convenience but valuing compatibility above all else.
At the lower end of the market, however, users were looking to embrace higher-level languages such as Pascal and Modula-2, these being cumbersome on early microprocessor systems but gradually becoming more accessible with the introduction of later systems with more memory, disk storage and processors more amenable to running such languages. Indeed, the notion of the language environment emerged, such as UCSD Pascal, accompanied by the portable code environment, such as the p-System hosting the UCSD Pascal environment, emphasising portability and defining a machine detached from the underlying hardware implementation.
Although the p-System could host other languages, it became closely associated with Pascal, largely by being the means through which Pascal could be propagated to different computer systems. While 8-bit microcomputers like the BBC Micro struggled with something as sophisticated as the p-System, even when enhanced with a second processor and more memory, more powerful machines could more readily bear the weight of the p-System, even prompting some to suggest at one time that it was “becoming the de facto standard operating system on the 68000”, supplied as standard on 68000-based machines like the Sage II and Sage IV.
Such language environments became prominent for a while, Lisp and Smalltalk being particularly fashionable, and with the emergence of the workstation concept, new and divergent paths were forged for a while. Liam Proven previously presented Wirth’s Oberon system as an example of a concise, efficient, coherent environment that might still inform the technological direction we might wish to take today. Although potentially liberating, such environments were also constraining in that their technological homogeneity – the imposition of a particular language or runtime – tended to exclude applications that users might have wanted to run. And although Pascal, Oberon, Lisp or Smalltalk might have their adherents, they do not all appeal to everyone.
Indeed, during the 1980s and even today, applications sell systems. There are plenty of cases where manufacturers ploughed their own furrow, believing that customers would see the merits in their particular set of technologies and be persuaded into adopting those instead of deploying the products they had in mind, only to see the customers choose platforms that supported the products and technologies that they really wanted. Sometimes, vendors doubled down on customisations to their platforms, touting the benefits of custom microcode to run particular programs or environments, ignoring that customers often wanted more generally useful solutions, not specialised products that would become uncompetitive and obsolete as technology more broadly progressed.
For all their elegance, language-oriented environments risked becoming isolated enclaves appealing only to their existing users: an audience who might forgive and even defend the deficiencies of their chosen systems. For example, image-based persistence, where software could be developed in a live environment and “persisted” or captured in an image or “world” for later use or deployment, remains a tantalising approach to software development that sometimes appeals to outsiders, but one can argue that it also brings risks in terms of reproducibility around software development and deployment.
If this sounds familiar to anyone old enough to remember the end of the 1990s and the early years of this century, probing this familiarity may bring to mind the Java bandwagon that rolled across the industry. This caused companies to revamp their product lines, researchers to shelve their existing projects, developers to encounter hostility towards the dependable technologies they were already using, and users to suffer the mediocre applications and user interfaces that all of this upheaval brought with it.
Interesting research, such as that around Fluke and similar projects, was seemingly deprioritised in favour of efforts that presumably attempted to demonstrate “research relevance” in the face of this emerging, everything-in-Java paradigm with its “religious overtones”. And yet, commercial application of supposedly viable “pure Java” environments struggled in the face of abysmal performance and usability.
The Nature of the Machine
Users do apparently value heterogeneity or diversity in their computing environments, to be able to mix and match their chosen applications, components and technologies. Today’s mass-market computers may have evolved from the microcomputers of earlier times, accumulating workstation, minicomputer and mainframe technologies along the way, and they may have incorporated largely sensible solutions in doing so, but it can still be worthwhile reviewing how high-end systems of earlier times addressed issues of deploying different kinds of functionality safely within the same system.
When “multiprogramming” became an essential part of most system vendors’ portfolios, the notion of a “virtual machine” emerged, this being the vehicle through which a user’s programs could operate or experience the machine while sharing it with other programs. Today, using our minicomputer or Unix-inspired operating systems, we think of a virtual machine as something rather substantial, potentially simulating an entire system with all its peculiarities, but other interpretations of the term were once in common circulation.
In the era when the mainframe reigned supreme, its vendors differed in their definitions of a virtual machine. International Computers Limited (ICL) revamped their product range in the 1970s in an attempt to compete with IBM, introducing their VME or Virtual Machine Environment operating system to run on their 2900 series computers. Perusing the literature related to VME reveals a system that emphasises different concepts to those we might recognise from Unix, even though there are also many similarities that are perhaps obscured by differences in terminology. Where we are able to contrast the different worlds of VME and Unix, however, is in the way that ICL chose to provide a Unix environment for VME.
As the end of the 1980s approached, once dominant suppliers with their closed software and solution ecosystems started to get awkward questions about Unix and “open systems”. The less well-advised, like Norway’s rising star, Norsk Data, refused to seriously engage with such trends, believing their own mythology of technological superiority, until it was too late to convince their customers switching to other platforms that they had suddenly realised that this Unix thing was worthwhile after all. ICL, meanwhile, only tentatively delivered a Unix solution for their top-of-the-line systems.
Six years after ICL’s Series 39 mainframe range was released, and after years of making a prior solution selectively available, ICL’s VME/X product was delivered, offering a hosted Unix environment within VME, broadly comparable with Amdahl’s UTS and IBM’s IX/370. Eventually, VME/X was rolled into OpenVME, acknowledging “open systems” rather like Digital’s OpenVMS, all without actually being open, as one of my fellow students once joked. Nevertheless, VME/X offers an insight into what a virtual machine is in VME and how ICL managed to map Unix concepts into VME.
Reading VME documentation, one gets the impression that, fundamentally, a virtual machine in the VME sense is really about giving an environment to a particular user, as opposed to a particular program. Each environment has its own private memory regions, inaccessible to other virtual machines, along with other regions that may be shared between virtual machines. Within each environment, a number of processes can be present, but unlike Unix processes, these are simply execution contexts or, in Unix and more general terms, threads.
Since the process is the principal abstraction in Unix through which memory is partitioned, it is curious that in VME/X, the choice was made to not map Unix processes to VME virtual machines. Instead, each “terminal user”, each “batch job” (not exactly a Unix concept), as well as “certain daemons” were given their own virtual machines. And when creating a new Unix process, instead of creating a new virtual machine, VME/X would in general create a new VME process, seemingly allowing each user’s processes to reside within the same environment and to potentially access each other’s memory. Only when privilege or user considerations applied would a new process be initiated in a new virtual machine.
Stranger than this, however, is VME’s apparent inability to run multiple processes concurrently within the same virtual machine, even on multiprocessor systems, although processes in different virtual machines could run concurrently. For one process to suspend execution and yield to another in the same virtual machine, a special “process-switching call” instruction was apparently needed, providing a mechanism like that of green threads or fibers in other systems. However, I could imagine that this could have provided a mechanism for concealing each process’s memory regions from others by using this call to initiate a reconfiguration of the memory segments available in the virtual machine.
I have not studied earlier ICL systems, but it would not surprise me if the limitations of this environment resembled those of earlier generations of products, where programs might have needed to share a physical machine graciously. Thus, the heritage of the system and the expectations of its users from earlier times appear to have survived to influence the capabilities of this particular system. Yet, this Unix implementation was actually certified as compliant with the X/Open Portability Guide specifications, initially XPG3, and was apparently the first system to have XPG4 base compliance.
Partitioning by User
A tour of a system that might seem alien or archaic to some might seem self-indulgent, but it raises a few useful thoughts about how systems may be partitioned and the sophistication of such partitioning. For instance, VME seems to have emphasised partitioning by user, and this approach is a familiar and mature one with Unix systems, too. Traditionally, dedicated user accounts have been set up to run collections of associated programs. Web servers often tend to run in a dedicated account, typically named “apache” or “httpd”. Mail servers and database servers also tend to follow such conventions. Even Android has used distinct user accounts to isolate applications from each other.
Of course, when partitioning functionality by user in Unix systems, one must remember that all of the processes involved are isolated from each other, in that they do not share memory inadvertently, and that the user identity involved is merely associated with these processes: it does not provide a container for them in its own right. Indeed, the user abstraction is simply the way that access by these processes to the rest of the system is controlled, largely mediated by the filesystem. Thus, any such partitioning arrangement brings the permissions and access control mechanisms into consideration.
In the simplest cases, such as a Web server needing to be able to read some files, the necessary adjustments to groups or even the introduction of access control lists can be sufficient to confine the Web server to its own territory while allowing other users and programs to interact with it conveniently. For example, Web pages can be published and updated by adding, removing and changing files in the Web site directories given appropriate permissions. However, it is when considering the combination of servers or services, each traditionally operating under their own account, that administrators start to consider alternatives to such traditional approaches.
Let us consider how we might deploy multiple Web applications in a shared hosting environment. Clearly, it would be desirable to give all of these applications distinct user accounts so that they would not be able to interfere with each other’s files. In a traditional shared hosting environment, the Web application software itself might be provided centrally, with all instances of an application relying on the same particular version of the software. But as soon as the requirements for the different instances start to diverge – requiring newer or older versions of various components – they become unable to rely entirely on the centrally provided software, and alternative mechanisms for deploying divergent components need to be introduced.
To a customer of such a service having divergent requirements, the provider will suggest various recipes for installing new software, often involving language-specific packaging or building from source, with compilers available to help out. The packaging system of the underlying software distribution is then mostly being used by the provider itself to keep the operating system and core facilities updated. This then leads people to conclude that distribution packaging is too inflexible, and this conclusion has led people in numerous directions to try and address the apparently unmet needs of the market, as well as to try and pitch their own particular technology as the industry’s latest silver bullet.
There is arguably nothing to stop anyone deploying applications inside a user’s home directory or a subdirectory of the home directory, with /home/user/etc being the place where common configuration files are stored, /home/user/var being used for some kind of coordination, and so on. Many applications can be configured to work in another location. One problem is that this configuration is sometimes fixed within the software when it is built, meaning that generic packages cannot be produced and deployed in arbitrary locations.
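Here is a small illustration of the build-time problem. The following minimal C sketch is not taken from any particular package. It shows a program whose configuration directory is fixed by a SYSCONFDIR macro when it is built, but which can be pointed elsewhere at run time. The APP_PREFIX variable and the config_path function are hypothetical names chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Location fixed when the program is built, e.g. with
   -DSYSCONFDIR='"/opt/app/etc"'; "/etc" if nothing is given. */
#ifndef SYSCONFDIR
#define SYSCONFDIR "/etc"
#endif

/* Build the path of a configuration file. If the hypothetical APP_PREFIX
   variable is set (say, to /home/user), look under $APP_PREFIX/etc so the
   same build can run from a home directory; otherwise fall back to the
   built-in SYSCONFDIR. Names containing '/' are rejected so a caller
   cannot escape the directory. Returns snprintf's result, or -1. */
int config_path(char *buf, size_t n, const char *file)
{
    if (strchr(file, '/') != NULL)
        return -1;

    const char *prefix = getenv("APP_PREFIX");
    if (prefix != NULL && *prefix != '\0')
        return snprintf(buf, n, "%s/etc/%s", prefix, file);
    return snprintf(buf, n, "%s/%s", SYSCONFDIR, file);
}
```

A program written this way could be packaged once and deployed under /home/user by setting APP_PREFIX. A program that consults only the macro would have to be rebuilt for each location.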
Another is that many of the administrative mechanisms in Unix-like systems favour the superuser, rely on operating on software configured for specific, centralised locations, and only really work at the whole-machine level with a global process table, a global set of user identities, and so on. Some tools do support user-level activities, such as the traditional cron utility, which schedules jobs on behalf of users. But as far as I know, traditional Unix-like systems have never really let users define and run their own services along the same lines as is done for the whole system, administered by the superuser.
Partitioning by Container
If one still wants to use nicely distribution-packaged software on a per-user, per-customer or per-application basis, what tends to happen is that an environment is constructed that resembles the full machine environment, with this kind of environment existing in potentially many instances on the same system. In other words, just so that, say, a Debian package can be installed independently of the host system and any of its other users, an environment is constructed that provides directories like /usr, /var, /etc, and so on, allowing the packaging system to do its work and to provide the illusion of a complete, autonomous machine.
Within what might be called the Unix traditions, a few approaches exist to provide this illusion to a greater or lesser degree. The chroot mechanism, for instance, permits the execution of programs that are generally only able to see a section of the complete filesystem on a machine, located at a “changed root” in the full filesystem. By populating this part of the filesystem with files that would normally be found at the top level or root of the normal filesystem, programs invoked via the chroot mechanism are able to reference these files as if they were in their normal places.
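The following C sketch illustrates the "changed root" idea. It maps a path as seen inside the changed root onto the full filesystem, and it shows the usual chroot-then-chdir sequence for entering such a root. The names host_path and enter_root are hypothetical. Calling chroot requires privileges (CAP_SYS_CHROOT on Linux), so only the path mapping is exercised here.

```c
#define _DEFAULT_SOURCE   /* exposes chroot() in <unistd.h> */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Where a path seen inside the changed root really lives: "/etc/passwd"
   under root "/srv/jail" is the host file "/srv/jail/etc/passwd".
   Returns snprintf's result so truncation can be detected. */
int host_path(char *buf, size_t n, const char *root, const char *inner)
{
    size_t len = strlen(root);
    /* Drop a trailing slash so "/srv/jail/" and "/" do not double it. */
    if (len > 0 && root[len - 1] == '/')
        len--;
    return snprintf(buf, n, "%.*s%s", (int)len, root, inner);
}

/* Enter a changed root. chroot() does not move the working directory,
   so chdir("/") follows to avoid keeping a directory outside the new
   root. Returns -1 with errno set on failure (e.g. EPERM). */
int enter_root(const char *root)
{
    if (chroot(root) != 0)
        return -1;
    return chdir("/");
}
```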
Various limitations in the scope of chroot led to the development of such technologies as jails, Linux-VServer and numerous others, going beyond filesystem support for isolating processes, and providing a more comprehensive illusion of a distinct machine. Here, systems like Plan 9 showed how the Unix tradition might have evolved to support such needs, with Linux and other systems borrowing ideas such as namespaces and applying them in various, sometimes clumsy, ways to support the configuration of program execution environments.
Going further, technologies exist to practically simulate the experience of an entirely separate machine, these often bearing the “virtual machine” label in the vocabulary of our current era. A prime example of such a technology is KVM, available on Linux with the right kind of processor, which allows entire operating systems to run within another. Using a virtual machine solution of this nature is something of a luxury option for an application needing its own environment, being able to have precisely the software configuration of its choosing right down to the level of the kernel. One disadvantage of such full-fat virtual machines is the amount of extra software involved and those layers upon layers of programs and mechanisms, all requiring management and integration.
Some might argue for solutions where the host environment does very little and where everything of substance is done in one kind of virtual machine or other. But if all the virtual machines are being used to run the same general technology, such as flavours of Linux, one has to wonder whether it is worth keeping a distinct hypervisor technology around. That might explain the emergence of KVM as an attempt to have Linux act as a kind of hypervisor platform, but it does not excuse a situation where the hosting of entire systems is done in preference to having a more configurable way of deploying applications within Linux itself.
Some adherents of hypervisor technologies advocate the use of unikernels as a way of deploying lightweight systems on top of hypervisors, specialised to particular applications. Such approaches seem reminiscent of embedded application deployment, with entire systems being built and tuned for precisely one job: useful for some domains but not generally applicable or particularly flexible. And it all feels like the operating system is just being reinvented in a suboptimal, ad-hoc fashion. (Unikernels seem to feature prominently in the “microkernel and component-based OS” developer room at FOSDEM these days.)
Then there is the approach advocated in Liam Proven’s talk: stripping down an operating system for hypervisor deployment. This would need to offer a degree of extra flexibility to be more viable than a unikernel approach, at least when applied to the same kinds of problems. Of course, this pushes hardware support out of the operating system and into the realm of the hypervisor. That could be beneficial if done well, or it could imperil support for many hardware platforms and devices for a variety of technological, economic and social reasons. Liam advocates pushing filesystem support out of the kernel, and potentially out of the operating system as well, although it is not clear what would then need to take up that burden and actually offer filesystem facilities.
Some Reflections
This is where we may return to those complaints about the complexity of modern hosting frameworks, namely that a need for total flexibility in every application’s software stack presents significant administrative challenges. But in considering the nature of the virtual machine in its historical forms, we might re-evaluate what kind of environment software really needs.
In my university studies, a project of mine investigated a relatively hot topic at the time: mobile software agents. One conclusion I drew from the effort was that programs could be written to use a set of well-defined interfaces and to potentially cooperate with other programs, without thousands of operating system files littering their shared environment. Naturally, such programs would not be running by magic: they would need to be supported by infrastructure that allows them to be loaded and executed, but all of this infrastructure can be maintained outside the environment seen by these programs.
At the time, I relied upon the Python language runtime for my agent programs with its promising but eventually inadequate support for safe execution to prevent programs from seeing the external machine environment. Most agent frameworks during this era were based on particular language technologies, and the emergence of Java only intensified the industry’s focus on this kind of approach, naturally emphasising Java, although Inferno also arrived at around this time and offered a promising, somewhat broader foundation for such work than the Java Virtual Machine.
In the third part of his article series, Liam Proven notes that Plan 9, Inferno’s predecessor, is able to provide a system where “every process is in a container” by providing support for customisable process namespaces. Certainly, one can argue that Plan 9 and Inferno have been rather overlooked in recent years, particularly by the industry mainstream. He goes on to claim that such functionality, potentially desirable in application hosting environments, “makes the defining features of microkernels somewhat irrelevant”. Here I cannot really agree: what microkernels actually facilitate goes beyond what a particular operating system can do and how it has been designed.
A microkernel-based approach not only affords the opportunity to define the mechanisms of any resulting system, but it also provides the ability to define multiple sets of mechanisms, all of them potentially available at once, allowing them to be investigated, compared, and even combined. For example, Linux retains the notion of a user of the system, maintaining a global registry of such users, and even with notionally distinct sets of users provided by user namespaces, cumbersome mappings are involved to relate those namespace users back to this global registry. In a truly configurable system, there can be multiple user authorities, each being accessible by an arbitrary selection of components, and some components can be left entirely unaware of the notion of a user whatsoever.
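For Linux user namespaces, the "cumbersome mappings" are written to /proc/<pid>/uid_map. As documented in user_namespaces(7), each line of that file gives three values: a starting ID inside the namespace, the corresponding starting ID outside it, and the length of the range. The following minimal C sketch parses one such line and translates an ID. The helper names are hypothetical.

```c
#include <stdio.h>

/* One line of a uid_map: first ID inside the namespace, first ID
   outside it, and the number of consecutive IDs mapped. */
struct id_range {
    unsigned long inside, outside, count;
};

/* Parse a line such as "0 100000 65536". Returns 0 on success. */
int parse_id_range(const char *line, struct id_range *r)
{
    return sscanf(line, "%lu %lu %lu",
                  &r->inside, &r->outside, &r->count) == 3 ? 0 : -1;
}

/* Translate a user ID seen inside the namespace to the ID outside it.
   Returns -1 if no range covers the ID, i.e. it is unmapped. */
long map_to_host(const struct id_range *ranges, int nranges, unsigned long uid)
{
    for (int i = 0; i < nranges; i++) {
        const struct id_range *r = &ranges[i];
        if (uid >= r->inside && uid - r->inside < r->count)
            return (long)(r->outside + (uid - r->inside));
    }
    return -1;
}
```

With the common mapping "0 100000 65536", read from the initial namespace, root inside the namespace is user 100000 in the global registry. This is the kind of indirection a system with multiple independent user authorities could avoid.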
Back in the 1990s, much coverage was given to the notion of operating system personalities: the idea that various products would, for example, support DOS or Windows applications as well as Macintosh ones or Unix ones or OS/2 ones. Whether the user interface would reflect this kind of personality on a global level or not probably kept some usability professionals busy, and I recall one of my university classmates talking about a system where it was apparently possible to switch between Windows or maybe OS/2 and Macintosh desktops with a key combination. Since his father was working at IBM, if I remember correctly, that could have been an incarnation of IBM’s Workplace OS.
Other efforts were made to support multiple personalities in the same system, potentially in a more flexible way than having multiple separate sessions, and certainly more flexible than just bundling up, virtualising or emulating the corresponding environments. Digital investigated the porting of VMS functionality to an environment based on the Mach 3.0 microkernel and associated BSD Unix facilities. Had Digital eventually adopted a form of OSF/1 based on Mach 3.0, it could have conceivably provided a single system running Unix and VMS software alongside each other, sharing various common facilities.
Regardless of one’s feelings about Mach 3.0, and whether one’s view of microkernels is formed from impressions of an infamous newsgroup argument from over thirty years ago or from some of the developments in the years since, combining disparate technologies in a coherent fashion within the same system must surely be a desirable prospect. Being able to do so without piling up entire systems on top of each other and drilling holes between the layers seems like a particularly desirable thing to do.
A flexible, configurable environment should appeal to those in the same position as the FOSDEM presenter wishing to solve his hosting problems with pruned-down software stacks, as well as appealing to anyone with their own unrealised ambitions for things like mobile software agents. Naturally, such a configurable environment would come with its own administrative overheads, like the need to build and package applications for deployment in more minimal environments, and the need to keep software updated once deployed. Some of that kind of work should arguably get done under the auspices of existing distribution frameworks and initiatives, as opposed to having random bundles of software pushed to various container “hubs” posing as semi-official images, all the while weighing down the Internet with gigabytes of data constantly scurrying hither and thither.
This article does not propose any specific solution or roadmap for any of this beyond saying that something should indeed be done, and that microkernel-based environments, instead of seeking to reproduce Unix or Windows all over again, might usefully be able to provide remedies that we might consider. And with that, I suppose I should get back to my own experiments in this area.
Sunday, 21 July 2024
KDE Gear 24.08 branches created
Make sure you commit anything you want to end up in the KDE Gear 24.08 releases to them.
Next Dates
- July 25, 2024: 24.08 Freeze and Beta (24.07.80) tag & release
- August 8, 2024: 24.08 RC (24.07.90) Tagging and Release
- August 15, 2024: 24.08 Tagging
- August 22, 2024: 24.08 Release
https://community.kde.org/Schedules/KDE_Gear_24.08_Schedule
Saturday, 22 June 2024
AWS AppConfig agent error “connection refused”
The AWS AppConfig service is useful for feature flag functionality. You can access it directly via its API, but this is not the recommended method: for production workloads, the best practice is to use the provided agent. If you are using AppConfig on Kubernetes or EKS, you should add the appconfig-agent container to your deployment, for example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-application-label
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-application-label
  template:
    metadata:
      labels:
        app: my-application-label
    spec:
      containers:
        - name: my-app
          image: my-repo/my-image
          imagePullPolicy: IfNotPresent
        - name: appconfig-agent
          image: public.ecr.aws/aws-appconfig/aws-appconfig-agent:2.x
          ports:
            - name: http
              containerPort: 2772
              protocol: TCP
          env:
            - name: SERVICE_REGION
              value: region
          imagePullPolicy: IfNotPresent
This method works, but in some edge cases you may “randomly” get an error like this:
cURL error 7: Failed to connect to localhost port 2772 after 0 ms: Connection refused (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for http://localhost:2772/applications/APPLICATION_NAME/environments/ENVIRONMENT_NAME/configurations/CONFIGURATION_NAME
If you look at the logs, you may notice that the AppConfig agent has been explicitly shut down:
[appconfig agent] INFO shutdown complete (actual duration: 50ms)
[appconfig agent] INFO received terminated signal, shutting down
[appconfig agent] INFO shutting down in 50ms
[appconfig agent] INFO stopping server on localhost:2772
Digging into the logs, you may notice that the main container is still working for some seconds after the appconfig-agent has been shut down, and that is the problem! The appconfig-agent shuts down very quickly, so if your primary container is still working after the agent has stopped, it will not be able to connect to the agent and you will get the error.
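Whatever the deployment looks like, a client can also tolerate a brief window in which the agent is not listening by retrying the connection with a short backoff. The following C sketch only illustrates that idea. The client in the error message uses cURL, and agent_url, connect_agent and the retry parameters are hypothetical. Port 2772 and the URL layout come from the manifest and error message in this post, and localhost is taken here to mean 127.0.0.1. A retry cannot help once the agent has gone for good during pod termination, which is why the container ordering still matters.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#define AGENT_PORT 2772   /* containerPort used in the manifests */

/* Build the agent URL in the layout shown in the error message.
   Returns snprintf's result so truncation can be detected. */
int agent_url(char *buf, size_t n, const char *app, const char *env, const char *cfg)
{
    return snprintf(buf, n,
                    "http://localhost:%d/applications/%s/environments/%s/configurations/%s",
                    AGENT_PORT, app, env, cfg);
}

/* Hypothetical guard: try to connect to the agent on 127.0.0.1 up to
   `attempts` times, doubling the pause (starting at delay_ms) after each
   failed connection. Returns a connected socket, or -1 if all failed. */
int connect_agent(int attempts, long delay_ms)
{
    struct sockaddr_in addr = { 0 };
    addr.sin_family = AF_INET;
    addr.sin_port = htons(AGENT_PORT);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    for (int i = 0; i < attempts; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
            return fd;
        close(fd);                        /* e.g. ECONNREFUSED */
        if (i + 1 < attempts) {
            struct timespec pause = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
            nanosleep(&pause, NULL);
            delay_ms *= 2;
        }
    }
    return -1;
}
```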
How can we make sure that appconfig-agent is always active in a deployment? The native sidecar container feature, enabled by default since Kubernetes 1.29, is a perfect fit. A sidecar (an init container with restartPolicy: Always, here appconfig-agent) is started before the main containers and stopped after them, so your primary container should always find the agent running.
Modify the deployment this way:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-namespace
  labels:
    app: my-application-label
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-application-label
  template:
    metadata:
      labels:
        app: my-application-label
    spec:
      containers:
        - name: my-app
          image: my-repo/my-image
          imagePullPolicy: IfNotPresent
      initContainers:
        - name: appconfig-agent
          image: public.ecr.aws/aws-appconfig/aws-appconfig-agent:2.x
          restartPolicy: Always
          ports:
            - name: http
              containerPort: 2772
              protocol: TCP
          env:
            - name: SERVICE_REGION
              value: region
          imagePullPolicy: IfNotPresent
Friday, 14 June 2024
KDE Gear 24.08 release schedule
This is the release schedule the release team agreed on
https://community.kde.org/Schedules/KDE_Gear_24.08_Schedule
Dependency freeze is in around 4 weeks (July 18) and feature freeze one week after that. Get your stuff ready!
Monday, 10 June 2024
Help wanted! Port KDE Frameworks oss-fuzz builds to Qt6/KF6
If you're looking for an isolated and straightforward way to start contributing to KDE, you're in the right place. At KDE, we use fuzzing via oss-fuzz to try to ensure our libraries are robust against broken inputs. Here's how you can help us in this essential task.
What is Fuzzing?
Fuzzing involves feeding "random" [1] data into our code to check its robustness against invalid or unexpected inputs. This is crucial for ensuring the security and stability of applications that process data without direct user control.
Why is Fuzzing Important?
Imagine receiving an image via email, saving it to your disk, and opening it in Dolphin. This will make Dolphin create a thumbnail of the image. If the image is corrupted and our image plugin code isn't robust, the best-case scenario is that Dolphin crashes. In the worst case, it could lead to a security breach. Hence, fuzzing helps prevent such vulnerabilities.
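oss-fuzz targets are built around the libFuzzer entry point LLVMFuzzerTestOneInput, which the fuzzing engine calls over and over with generated inputs. The C sketch below shows the shape of such a target for a toy image header. The TOYI format and parse_toy_header are invented for illustration. The real KDE targets, such as karchive_fuzzer.cc, are C++ and exercise the actual libraries. The property that matters is that the parser checks every length before reading, so no input can make it read out of bounds.

```c
#include <stddef.h>
#include <stdint.h>

/* Parser for a toy, hypothetical image format: the magic bytes "TOYI",
   a 16-bit little-endian width and height, then width * height bytes of
   pixels. Every length is checked before the data is touched. */
int parse_toy_header(const uint8_t *data, size_t size, unsigned *w, unsigned *h)
{
    static const uint8_t magic[4] = { 'T', 'O', 'Y', 'I' };

    if (size < 8)
        return -1;                        /* too short for the header */
    for (int i = 0; i < 4; i++)
        if (data[i] != magic[i])
            return -1;                    /* not our format */

    *w = data[4] | (unsigned)data[5] << 8;
    *h = data[6] | (unsigned)data[7] << 8;

    /* Reject empty images and headers promising more pixels than exist. */
    if (*w == 0 || *h == 0 || (size_t)*w * *h > size - 8)
        return -1;
    return 0;
}

/* libFuzzer-style entry point: the engine calls this repeatedly with
   generated inputs and reports any crash or sanitizer error. Returning 0
   tells the engine the input was processed. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    unsigned w, h;
    (void)parse_toy_header(data, size, &w, &h);
    return 0;
}
```

When built with clang -fsanitize=fuzzer,address, libFuzzer supplies main. A missing size check would then show up as an AddressSanitizer report rather than as a silent misread.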
How You Can Help:
We need to update the build of KDE libraries in oss-fuzz to use Qt6. This task could be challenging because it involves static compilation and ensuring the correct flags are passed for all compilation units.
Steps to Contribute:
Start with the karchive project:
- Download oss-fuzz and go into the karchive subfolder.
- Update the Dockerfile to download Qt from the dev branch and KDE Frameworks from the master branch.
Update the build.sh script:
- Modify the build.sh script to compile Qt6 (this will be harder, since it involves moving from qmake to CMake) and KDE Frameworks 6.
Check karchive_fuzzer.cc:
- This file might need updates, but they should be relatively easy.
- At the top of karchive_fuzzer.cc, you'll find a comment with the three commands that oss-fuzz runs. Use these to test the image building, fuzzer building, and running processes.
Need Help?
If you have questions or need assistance, please contact me at aacid@kde.org or ping me on Matrix at @tsdgeos:kde.org
Note:
[1] Smart fuzzing engines don't generate purely random data. They use semi-random and semi-smart techniques to efficiently find issues in the code.
Monday, 03 June 2024
Reconsidering Classic Programming Interfaces
Since my last update, I have been able to spend some time gradually broadening and hopefully improving the support for classic programming interfaces in my L4Re-based experiments, centred around a standard C library implementation based on Newlib. Of course, there were some frustrations experienced along the way, and much remains to be done, not only in terms of new functionality that will need to be added, but also verification and correction of the existing functionality as I come to realise that I have made mistakes, these inevitably leading to new frustrations.
One area I previously identified for broadened support was that of process creation and the ability to allow programs to start other programs. This necessitated a review of the standard C process control functions, which are deliberately abstracted from the operating system and are much simpler and more constrained than those found in the unistd.h file that Unix programmers might be more familiar with. The traditional Unix functions are very much tied up with the Unix process model, and there are some arguments to be made that despite being “standard”, these functions are a distraction and, in various respects, undesirable from a software architecture perspective, for both applications and the operating systems that run them.
So, ignoring the idea that I might support the likes of execl, execv, fork, and so on, I returned to consideration of the much more limited system function that is part of the C language standards, this simply running an abstract command provided by a character string and returning a result code when the command has completed:
int system(const char *command);
To any casual application programmer, this all sounds completely reasonable: they embed a command in their code that is then presented to “the system”, which runs the command and hands back a result or status code. But those of us who are accustomed to running commands at the shell and in our own programs might already be picking apart this simple situation.
First of all, “the system” needs to have what the C standards documentation calls a “command processor”. In fact, even Unix standardisation efforts have adopted the term, despite the Linux manual pages referring to “the shell”. But at this point, my own system does not have a shell or a command processor, instead providing only a process server that manages the creation of new processes. And my process server deals with arrays or “vectors” of strings that comprise a command to be used to run a given program, configured by a sequence of arguments or parameters.
Indeed, this brings us to some other matters that may be familiar to anyone who has had the need to run commands from within programs: that of parameterising command invocations by introducing our own command argument values; and that of making sure that the representation of the program name and its arguments do not cause the shell to misinterpret these elements, by letting an errant space character break the program name into two, for instance. When dealing only with command strings, matters of quoting and tokenisation enter the picture, making the exercise very messy indeed.
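The following minimal C sketch shows why command strings get messy. It uses the usual technique for quoting one argument for a POSIX shell: wrap the argument in single quotes and turn every embedded single quote into '\''. The name shell_quote is hypothetical. This is one common technique, not something the C standard provides.

```c
#include <stdlib.h>
#include <string.h>

/* Quote one argument for a POSIX shell: wrap it in single quotes and
   replace each embedded ' with '\'' (close quote, escaped quote, reopen
   quote). Returns a malloc'd string that the caller must free. */
char *shell_quote(const char *arg)
{
    size_t len = 2;                       /* opening and closing quotes */
    for (const char *p = arg; *p; p++)
        len += (*p == '\'') ? 4 : 1;

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    char *q = out;
    *q++ = '\'';
    for (const char *p = arg; *p; p++) {
        if (*p == '\'') {
            memcpy(q, "'\\''", 4);
            q += 4;
        } else {
            *q++ = *p;
        }
    }
    *q++ = '\'';
    *q = '\0';
    return out;
}
```

A caller would then build something like wc -l 'my file.txt' with snprintf and pass it to system, remembering to quote every argument. A vector-based interface avoids the whole exercise.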
So, our common experience has provided us with a very good reason to follow the lead of the classic execv Unix function and to avoid the representational issues associated with command string processing. In this regard, the Python standard library has managed to show the way in some respects, introducing the subprocess module which features interfaces that are equivalent to functions like system and popen, supporting the use of both command strings and lists of command elements to represent the invoked command.
Oddly, however, nobody seems to provide a “vector” version of the system function at the C language level, but it seemed to be the most natural interface I might provide in my own system:
int systemv(int argc, const char *argv[]);
I imagine that those doing low-level process creation in a Unix-style environment would be content to use the exec family of functions, probably in conjunction with the fork function, precisely because a function like execv “shall replace the current process image with a new process image”, as the documentation states. Obviously, replacing the current process isn’t helpful when implementing the system function because it effectively terminates the calling program, whereas the system function is meant to allow the program to continue after command completion. So, fork has to get involved somehow.
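On a conventional POSIX host, the systemv function can be sketched without calling fork directly by using posix_spawnp, one of the commonly cited alternatives to fork. This illustration is for Unix-like systems and is not the L4Re implementation described here. Unlike system, it runs no shell and does not adjust the handling of SIGINT, SIGQUIT or SIGCHLD. Depending on the C library, a missing program is reported either as an error from posix_spawnp or as exit status 127.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <spawn.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Run argv[0] with arguments argv[1..argc-1] and wait for it to finish.
   Returns the wait status, as system() does, or -1 on failure. */
int systemv(int argc, const char *argv[])
{
    if (argc < 1)
        return -1;

    /* posix_spawnp expects a NULL-terminated array of non-const strings. */
    char **args = calloc((size_t)argc + 1, sizeof *args);
    if (args == NULL)
        return -1;
    for (int i = 0; i < argc; i++)
        args[i] = (char *)argv[i];

    /* No explicit fork: the C library creates the process and runs the
       program, searching PATH. The standard streams are inherited. */
    pid_t pid;
    int err = posix_spawnp(&pid, args[0], NULL, NULL, args, environ);
    free(args);
    if (err != 0)
        return -1;

    /* Block until the program completes, as system() does. */
    int status;
    while (waitpid(pid, &status, 0) == -1)
        if (errno != EINTR)
            return -1;
    return status;
}
```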
The Flow of Convention
I get the impression that people venturing along a similar path to mine are often led down the trail of compatibility with the systems that have gone before, usually tempted by the idea that existing applications will eventually be content to run on their system without significant modification, and thus an implementer will be able to appeal to an established audience. In this case, the temptation is there to support the fork function, the exec semantics, and to go with the flow of convention. And sometimes, a technical obstacle seems like a challenge to be met, to show that an implementation can provide support for existing software if it needs or wants to.
Then again, having seen situations where software is weighed down by the extra complexity of features that people believe it should have, some temptations are best resisted, perhaps with a robust justification for leaving out any particular supposedly desirable feature. One of my valued correspondents pointed me to a paper by some researchers that provides a robust argument for excluding fork and for promoting alternatives. Those alternatives have their shortcomings, as noted in the paper, and they seem rather complicated when considering simple situations like merely creating a completely separate process and running a new program in it.
Of course, there may still be complexity in doing simple things. One troublesome area was that of what might happen to the input and output streams of a process that creates another one: should the new process be able to receive the input that has been sent to the creating process, and should it be able to send its output to the recipient of the creating process’s output? For something like system or systemv, the initial “obvious” answer might be the total isolation of the created process from any existing input, but this limits the usefulness of such functions. It should arguably be possible to invoke system or systemv within a program that is accepting input as part of a pipeline, and for a process created by these functions to assume the input processing role transparently.
Indeed, the Unix world’s standards documentation for system extends the C standard to assert that the system function should behave like a combination of fork and execl, invoking the shell utility, sh, to initiate the program indicated in the call to system. It all sounds a bit prescriptive, but I suppose that what it largely means is that the input and output streams should be passed to the initiated program. A less prescriptive standard might have said that, of course, but who knows what kind of vendor lobbying went on to avoid having to modify the behaviour of those vendors’ existing products?
This leads to the awkward problem of dealing with the state of an input stream when such a stream is passed to another process. If the creating process has already read part of a stream, we need the newly created process to be aware of the extent of consumed data so that it may only read unconsumed data itself. Similarly, the newly created process must be able to append output to the existing output stream instead of overwriting any data that has already been written. And when the created process terminates, we need the creating process to synchronise its own view of the input and output streams. Such exercises are troublesome but necessary to provide predictable behaviour at higher levels in the system.
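On a POSIX system with stdio, part of this bookkeeping can be sketched in C in two steps. First, flush output streams so that the created process appends after anything already written. Second, flush seekable input streams so that any read-ahead buffered by stdio is handed back through the file offset. POSIX defines this behaviour of fflush for seekable input streams, although the C standard leaves it undefined. The names share_streams and child_offset are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Prepare shared streams before another process uses the underlying
   descriptors. Flushing `out` writes buffered data so the child appends
   after it. Flushing `in` on a seekable file moves the descriptor's
   offset back to the stream's logical position, so the child reads only
   unconsumed data. Pipes and terminals cannot be rewound. */
int share_streams(FILE *in, FILE *out)
{
    int result = 0;
    if (out != NULL && fflush(out) != 0)
        result = -1;
    if (in != NULL && fflush(in) != 0)
        result = -1;
    return result;
}

/* Offset at which a process sharing in's descriptor will start reading;
   -1 for pipes and terminals, which have no offset. */
off_t child_offset(FILE *in)
{
    return lseek(fileno(in), 0, SEEK_CUR);
}
```

Inherited descriptors share one open file description, so the offset moves for both processes. Once the created process has terminated, the creating process continues reading from wherever the other process stopped. For pipes there is no offset to rewind, so data already buffered by the creating process cannot be handed to the created one.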
Some Room for Improvement
Another function that deserves revisiting is the popen function, which employs either a dedicated output stream to capture the output of a created process within a program, or a dedicated input stream so that a program can feed the process with data it has prepared. The mode indicates what kind of stream the function will provide: “r” yields an output stream passing data out of the process, “w” yields an input stream passing data into the process.
FILE *popen(const char *command, const char *mode);
This function is not in the C language standards but in Unix-related standards, yet it is too useful to ignore. As with the system function, the standards documentation defines this function in terms of fork and execl, with the shell getting involved again. What happens with the stream that isn’t specified is not entirely obvious from this documentation, however. From its talk of input and output filters, as well as its mention of those other functions, we can conclude the following. If we request an output stream from the new process, the new process will acquire standard input from the creating process as its own input stream. Correspondingly, if we request an input stream to feed the new process, the new process will acquire standard output for itself and write output to that.
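For reference, conventional use of popen in "r" mode looks something like the following minimal C sketch. The name capture_output is hypothetical. The command string is interpreted by the shell, and the command's standard input stays connected to the caller's, as described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Run a shell command and collect up to n-1 bytes of its standard output
   into buf (n must be at least 1). Returns the command's exit status, or
   -1 if it could not be run or did not exit normally (for instance, if
   it was killed by SIGPIPE because its output did not fit). */
int capture_output(const char *command, char *buf, size_t n)
{
    FILE *p = popen(command, "r");
    if (p == NULL)
        return -1;

    size_t used = fread(buf, 1, n - 1, p);
    buf[used] = '\0';

    int status = pclose(p);    /* closes the stream, waits for the command */
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```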
This poses some concurrency issues that the system function largely avoids. Since the system function blocks until the created process is completed, the state of the shared input and output streams can be controlled. But with popen, the created process runs concurrently and can interact with whichever stream it acquired from the creating process, just as the creating process might also be using it, at least until pclose is invoked to wait for the completion of the created process. The standards documentation and the Linux manual page both note such pitfalls, but the whole business seems less than satisfactory.
Again, the Python standard library shows what a better approach might be. Alongside the popen function, the popen2 function creates dedicated input and output pipes for interaction with the created process, the popen3 function adds an error pipe to the repertoire, and there is even a popen4 function that presumably does what some people might expect from popen2, merging the output and error streams into a single stream. Naturally, this was becoming a bit incoherent, and so the subprocess module was brought in to clean it all up.
Our own attempt at a cleaner approach might involve the following function:
pid_t popenv(int argc, const char *argv[], FILE **input, FILE **output, FILE **error);
Here, we want to invoke a program using a vector containing the program and arguments, just as we did before, but we also want to acquire the input, output and error streams. However, we might allow any of these to be specified as NULL, indicating that any such stream will not be opened for the created process. Since this might cause problems, we might need to create special “empty” or “null” streams, where appropriate, so as not to upset the C library.
Unlike popen, we might also provide the process identifier for the created process. This would allow us to monitor the process, control it in some way, and to wait for its completion. The nature of a process identifier is potentially more complicated than one might think, especially in my own system where there can be many process servers, each of them creating new processes without any regard to the others.
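The following C sketch shows one possible POSIX-flavoured version of popenv, using pipes and posix_spawn file actions. It shows how the "NULL means no stream" rule and the returned process identifier might fit together. The sketch assumes that descriptors 0 to 2 are open in the caller, and it substitutes /dev/null for any stream that is not requested. It returns an ordinary POSIX pid and is not the author's L4Re implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <spawn.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Start argv[0] with its arguments. For each non-NULL stream pointer, a
   pipe is created and the caller gets the parent's end as a FILE: input
   is written by the caller, output and error are read. A NULL pointer
   gives the child /dev/null instead. Returns the child's pid, or -1. */
pid_t popenv(int argc, const char *argv[], FILE **input, FILE **output, FILE **error)
{
    FILE **streams[3] = { input, output, error };
    int pipes[3][2] = { { -1, -1 }, { -1, -1 }, { -1, -1 } };
    posix_spawn_file_actions_t fa;
    char **args = NULL;
    pid_t pid = -1;

    if (argc < 1 || posix_spawn_file_actions_init(&fa) != 0)
        return -1;

    for (int i = 0; i < 3; i++) {
        if (streams[i] == NULL) {
            posix_spawn_file_actions_addopen(&fa, i, "/dev/null",
                                             i == 0 ? O_RDONLY : O_WRONLY, 0);
            continue;
        }
        int fds[2];
        if (pipe(fds) != 0)
            goto done;
        pipes[i][0] = fds[0];
        pipes[i][1] = fds[1];
        /* The child reads stdin from the read end and writes stdout and
           stderr to the write end. It must not keep the parent's end
           open, or it would never see end-of-file on stdin. */
        int child_end = i == 0 ? fds[0] : fds[1];
        int parent_end = i == 0 ? fds[1] : fds[0];
        posix_spawn_file_actions_adddup2(&fa, child_end, i);
        if (child_end != i)
            posix_spawn_file_actions_addclose(&fa, child_end);
        posix_spawn_file_actions_addclose(&fa, parent_end);
    }

    /* posix_spawnp expects a NULL-terminated, non-const argument array. */
    args = calloc((size_t)argc + 1, sizeof *args);
    if (args == NULL)
        goto done;
    for (int i = 0; i < argc; i++)
        args[i] = (char *)argv[i];
    if (posix_spawnp(&pid, args[0], &fa, NULL, args, environ) != 0)
        pid = -1;

done:
    for (int i = 0; i < 3; i++) {
        if (pipes[i][0] < 0)
            continue;                                /* no pipe requested */
        close(i == 0 ? pipes[i][0] : pipes[i][1]);   /* child's end */
        int parent_end = i == 0 ? pipes[i][1] : pipes[i][0];
        if (pid > 0)
            *streams[i] = fdopen(parent_end, i == 0 ? "w" : "r");
        else
            close(parent_end);
    }
    free(args);
    posix_spawn_file_actions_destroy(&fa);
    return pid;
}
```

The caller follows these steps:

1. Write to *input, then close it to signal the end of input.
2. Read *output and *error.
3. Call waitpid on the returned pid.

Reading output and error one after the other can deadlock if the child fills one pipe while the caller waits on the other. This is the pitfall that Python's subprocess module addresses with its communicate method. The parent's pipe ends are also not marked close-on-exec here, so later children could inherit them.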
A Simpler Portable Environment Standard
Maybe I am just insufficiently aware of the historical precedents in this regard, but it seems that while C language standards are disappointingly tame when it comes to defining interaction with the host environment, the Unix or POSIX standardisation efforts go into too much detail and risk burdening any newly designed system with the baggage of systems that happened to be commercially significant at a particular point in time. Windows NT infamously feigned compliance with such standards to unlock the door to lucrative government contracts and to subvert public software procurement processes, generating colossal revenues that easily paid for any inconvenient compliance efforts. However, for everybody else, such standards seem to encumber system and application developers with obligations and practices that could be refined, improved and made more suitable for modern needs.
My own work depends on L4Re which makes extensive use of capabilities to provide access to entities within the system. For example, each process relies on a task that provides a private address space, within which code and data reside, along with an object space that retains the capabilities available within the task. Although the Fiasco (or L4Re) microkernel has some notion of all the tasks in the system, as well as all the threads, together with other kinds of objects, such global information is effectively private to the kernel, and “user space” programs merely deal with capabilities that reference specific objects. For such programs, there is no way to get some kind of universal list of tasks or threads, or to arbitrarily request control over any particular instances of them.
In systems with different characteristics to the ones we already know, we have to ask ourselves whether we want to reproduce legacy behaviour. To an extent, it might be desirable to have registers of resident processes and the ability to list the ones currently running in the system, introducing dedicated components to retain this information. Indeed, my process servers could quite easily enumerate and remember the details of processes they create, also providing an interface to query this register, maybe even an interface to control and terminate processes.
However, one must ask whether this is essential functionality or not. For now, the rudimentary shell-like environment I employ to test this work provides similar functionality to the job control features of the average Unix shell, remembering the processes created in this environment and offering control in a limited way over this particular section of the broader system.
And so the effort continues to try and build something a little different from, and perhaps a bit more flexible than, what we use today. Hopefully it is something that ends up being useful, too.
Sunday, 02 June 2024
cd’s long lost sibling finally here!
cd is a straightforward command. As per the name, it changes the directory and does its job perfectly well. But what if it could do more? One scenario is wanting to execute a command inside a specific location without affecting the current working directory (CWD). This article introduces a cd replacement which offers that feature and provides more ways to specify the target directory.
It is important to note that it’s not intended for scripting. Rather, it’s only meant for interactive use where it streamlines some operations.
New Features
For impatient readers, the code is available on GitHub†. Otherwise, let’s first go through the new features of this enhanced cd.
It takes a command as an optional argument. The command is launched inside of the target directory without changing CWD, for example:
~/code/rust-rocksdb/librocksdb-sys$ cd .. cargo build
# ... builds rust-rocksdb rather than librocksdb-sys
~/code/rust-rocksdb/librocksdb-sys$
The target directory can be specified as a file, in which case the code changes to the directory containing that file. This is convenient when copying and pasting paths: a file location can be passed without having to strip the last path component. In the following example, the path printed by git is copied and pasted after cd:
~/code/linux$ git whatchanged -n1 |grep ^:
:100644 100644 8ddb2219a84b 6b384065c013 M include/uapi/linux/kd.h
~/code/linux$ cd include/uapi/linux/kd.h
~/code/linux/include/uapi/linux$
The target directory can be specified using a path starting with .../. The code navigates up the directory tree until a matching path is found, for example:
~/code/linux/drivers/usb/gadget/udc$ cd .../Documentation
~/code/linux/Documentation$
The enhancement integrates with Bash’s autocd option. With it enabled, invoking a directory followed by a command executes that command inside of said directory, for example:
/tmp/bash-5.2$ ./examples pwd
cd -- ./examples/ pwd
/tmp/bash-5.2/examples
/tmp/bash-5.2$
When given without a directory, cd -P resolves all symlinks in PWD. I’ve found this more useful than the POSIX-mandated behaviour of switching to the home directory. For consistency, cd -L also doesn’t switch to the home directory.
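The target-resolution rules listed above can be sketched outside the shell. The following program is a hypothetical re-implementation for illustration, not the logic of pcd.sh itself (which is a shell script): resolve_target and run_in are invented names. Because a child process cannot change its parent shell’s working directory, the sketch only prints the resolved directory, or runs a command there while leaving its own CWD untouched.

```rust
use std::path::{Path, PathBuf};
use std::process::{Command, ExitStatus};

/// Hypothetical resolution of a `cd` target relative to `cwd`:
/// - `.../rest` walks up from `cwd` until `ancestor/rest` is a directory;
/// - a path naming a file resolves to the directory containing that file;
/// - anything else must already be a directory.
fn resolve_target(cwd: &Path, arg: &str) -> Option<PathBuf> {
    if let Some(rest) = arg.strip_prefix(".../") {
        return cwd
            .ancestors()
            .map(|ancestor| ancestor.join(rest))
            .find(|candidate| candidate.is_dir());
    }
    let path = cwd.join(arg);
    if path.is_dir() {
        Some(path)
    } else if path.is_file() {
        path.parent().map(Path::to_path_buf)
    } else {
        None
    }
}

/// Run `program` with `args` inside `dir`. Only the child's working
/// directory is set, so the caller's own CWD does not change.
fn run_in(dir: &Path, program: &str, args: &[&str]) -> std::io::Result<ExitStatus> {
    Command::new(program).args(args).current_dir(dir).status()
}

fn main() -> std::io::Result<()> {
    // Usage: pcd-sketch TARGET [COMMAND [ARGS...]]
    let args: Vec<String> = std::env::args().skip(1).collect();
    let Some(target) = args.first() else {
        return Ok(());
    };
    let cwd = std::env::current_dir()?;
    match resolve_target(&cwd, target) {
        Some(dir) if args.len() > 1 => {
            let rest: Vec<&str> = args[2..].iter().map(String::as_str).collect();
            let status = run_in(&dir, &args[1], &rest)?;
            println!("{} exited with {status}", args[1]);
        }
        Some(dir) => println!("{}", dir.display()),
        None => eprintln!("no such directory: {target}"),
    }
    Ok(())
}
```

The sketch checks the current directory itself first when walking up for .../ paths; whether the real script does the same is not stated in the article.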
Installation
The new cd comes as a shell script which needs to be sourced in ~/.shellrc, ~/.bashrc or an equivalent file.
I further recommend adding an alias named - for cd -. This may look strange, but creating a hyphen alias is perfectly fine, even though it requires some care. Enabling autocd in Bash is also worth a try.
The enhanced cd, together with those optional settings, can be installed by executing the following commands:
mkdir -p ~/.local/opt
cd ~/.local/opt
# Replace with ‘master’ to get the latest version, though
# be warned that there are no guarantees of compatibility
# between the versions.
commit=8ca6070ce2e58581b1aeec748513bbd33904b41d
wget "https://raw.githubusercontent.com/mina86/dot-files/${commit?}/bin/pcd.sh"
. pcd.sh

install='
if [ -e ~/.local/opt/pcd.sh ]; then
        . ~/.local/opt/pcd.sh
fi
# Bash interprets ‘-=…’ as a flag so ‘--’ is needed but
# BusyBox complains about it so silence the warning.
alias -- -="cd -" 2>/dev/null
'
# Add to Bash
echo "${install?}" >>~/.bashrc
echo "shopt -qs autocd" >>~/.bashrc
# Add to other shells
echo "${install?}" >>~/.shellrc
Limitations
Firstly, the enhanced command does not support any other switches the shell’s cd might offer, such as -e or -@. Anyone who relies on them should be able to add them to the script with relative ease.
Secondly, the command doesn’t fully integrate with CDPATH. While the basic functionality of CDPATH works, it cannot be combined with the alternative ways of specifying the target directory offered by the new cd.
Conclusion
There are commands a seasoned shell user may use without giving them a second thought. Certainly, cd is so obvious and straightforward that there’s nothing to change about it. However, accepting that even fundamental commands could be changed may lead to improvements in one’s workflow.
I’ve been using various forms of enhanced cd for over a decade. And with this post I hope I’ve inspired you, Dear Reader, to give it a shot as well. The exact set of features may not be to your liking, but nothing stops you from writing your own cd replacement.
† Note that the repository includes my dot-files, and with time I may update the functionality of the pcd.sh script to the point where the description in this article is no longer accurate. This post describes the version at commit 8ca6070c. Setup instructions in the Installation section are pinned to that version.
Friday, 31 May 2024
Xonsh + vterm in Emacs
I’ve been using Xonsh, a shell that combines a shell REPL with a Python REPL, for years now. I’ve also been using Emacs for years, but I was never able to marry the two in a satisfactory way. But finally, after being frustrated for long enough, I solved the puzzle. This article is written to help the one or two other people in this world who use both Emacs and Xonsh.
vterm, probably the best terminal emulator in Emacs, requires some shell-side configuration to make a shell integrate cleanly into Emacs: specifically, for an improved clear experience and for directory and prompt tracking. vterm can also do message passing, but I’m not very interested in running Elisp in my terminal emulator; I have the rest of Emacs for that.
The idea is to print some invisible/hidden strings to the terminal that vterm can subsequently read, but that do not bother the user. The code to achieve this in .xonshrc is:
# You can modify this however you want.
$PROMPT = "{env_name} {BOLD_GREEN}{user}{RESET} {BOLD_BLUE}{cwd_base}{RESET}{branch_color}{curr_branch: {}}{RESET} {BOLD_BLUE}{prompt_end}{RESET} "

def _vterm_printf(text):
    def _term_is(value):
        return $TERM.split("-")[0] == value

    if ${...}.get("TMUX") and (_term_is("tmux") or _term_is("screen")):
        return $(printf r"\ePtmux;\e\e]%s\007\e\\" @(text))
    elif _term_is("screen"):
        return $(printf r"\eP\e]%s\007\e\\" @(text))
    else:
        return $(printf r"\e]%s\e\\" @(text))

def _vterm_prompt_end():
    return _vterm_printf("51;A{user}@{hostname}:{cwd}")

if ${...}.get("INSIDE_EMACS"):
    $SHELL_TYPE = "readline"

    def _clear(args, stdin=None):
        print(_vterm_printf("51;Evterm-clear-scrollback"), end="")
        tput clear @(args)

    aliases["clear"] = _clear
    $PROMPT += _vterm_prompt_end()
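The escape sequences produced by _vterm_printf are easier to see when spelled out byte by byte. The function below is a sketch that mirrors the three branches of the xonsh code; vterm_escape is a made-up name, and its term and in_tmux parameters stand in for $TERM and a non-empty $TMUX respectively.

```rust
/// Wrap an OSC payload (e.g. "51;A...") so that it reaches vterm,
/// mirroring the three branches of `_vterm_printf` in the .xonshrc.
fn vterm_escape(text: &str, term: &str, in_tmux: bool) -> String {
    // Only the part of $TERM before the first '-' matters, e.g. "screen-256color".
    let family = term.split('-').next().unwrap_or("");
    if in_tmux && (family == "tmux" || family == "screen") {
        // tmux passthrough: DCS "tmux;" ... ST, with the inner ESC doubled.
        format!("\x1bPtmux;\x1b\x1b]{text}\x07\x1b\\")
    } else if family == "screen" {
        // GNU screen passthrough: DCS ... ST.
        format!("\x1bP\x1b]{text}\x07\x1b\\")
    } else {
        // Plain OSC sequence terminated by ST (ESC \).
        format!("\x1b]{text}\x1b\\")
    }
}

fn main() {
    let term = std::env::var("TERM").unwrap_or_default();
    let in_tmux = std::env::var_os("TMUX").map_or(false, |v| !v.is_empty());
    // Report a placeholder user, host and directory; the terminal shows nothing.
    print!("{}", vterm_escape("51;Auser@host:/tmp", &term, in_tmux));
}
```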
One important thing to note is that this only works in readline mode. prompt-toolkit is much fancier, but for reasons that are unknown to me, modifying $PROMPT as above does not produce the desired result. I’ve also considered monkey-patching print_color as a work-around, but there exists no xonsh.built_ins.XSH.shell inside of .xonshrc to monkey-patch.
After implementing the above code in .xonshrc, you can do the following things in vterm:
- C-c C-p and C-c C-n (vterm-[previous,next]-prompt) move back and forth between prompts.
- C-x C-f (find-file) starts in the CWD of the shell.
- When clearing, old data is removed from the buffer.
And that’s it. I’ll see about upstreaming some of this knowledge to vterm some day soon after some more hacking/testing.
Wednesday, 29 May 2024
REUSE alpha release: v3.1.0a1
Yesterday I released v3.1.0a1 of the REUSE tool. It is an alpha release for the soon-to-be-released REUSE Specification v3.2, whose current state can be found at https://reuse.software/spec-3.2/.
The biggest change is the introduction of REUSE.toml, a configuration file that replaces the soft-deprecated .reuse/dep5. This configuration file allows you to declare the copyright and licensing of files (and globs of files) relative to the file. The important distinctions from .reuse/dep5 are:
- you can place the REUSE.toml file anywhere in your project;
- you can declare the precedence of information in case REUSE.toml disagrees with the contents of the file;
- and, because REUSE.toml is just a TOML file, you can add any other metadata that you want.
Because this is an alpha release, the accompanying documentation is not yet easily discoverable, but it is (in the process of being) written. Below are some links:
- The new REUSE Specification v3.2 https://reuse.software/spec-3.2/
- An updated FAQ (under construction) https://github.com/carmenbianca/reuse-website/blob/3.2-improvements/site/content/en/faq.md
- The tool documentation https://reuse.readthedocs.io/en/v3.1.0a1/
- New man pages https://reuse.readthedocs.io/en/v3.1.0a1/man/index.html
- The change log https://reuse.readthedocs.io/en/v3.1.0a1/history.html
- The alpha release on PyPI https://pypi.org/project/reuse/3.1.0a1/
The purpose of the alpha release is to collect feedback on the newly implemented (and defined) REUSE.toml. If you have some spare time to take a look at this, you can convert your .reuse/dep5 file to REUSE.toml using reuse convert-dep5, and you can e-mail me at carmenbianca@fsfe.org, write to reuse@lists.fsfe.org, or create issues against the reuse-tool or reuse-website repositories. (Some day soon I’ll finally be able to move those repositories away from GitHub, inshallah.)
In the near future, after this is properly released, I want to look at creating a lint-file command for linting individual files instead of the entire repository (for better pre-commit integration), and I want to see if I can create a pre-commit hook that automagically adds REUSE information to touched files.
New blog theme
I recently changed up my blog’s theme. I previously used beautifulhugo, and now I use hugo-pure. The whole thing’s a touch more basic, but I’ve not lost any important features. Multi-language support works (although it has been ages since I posted in Esperanto), and posts display just fine.
The most important thing I changed from the hugo-pure theme is the text colour: my black text is #000 instead of some dark grey. I really dislike dark greys as text.
The rationale for the change is a decreased footprint. It’s a bit senseless to transfer an entire megabyte of data just to read some text. As an added benefit, this new theme has no JavaScript whatsoever. The bundled (minified) CSS is still a bit on the bulky side, but I’m not enough of a designer to dare tackle that problem.
Anyway, good stuff, new blog theme. I wish more of the web was just text.
Update (2024-05-30): I’ve changed the text colour to #111 after doing some research. It’s dark enough to satisfy my dislike for grey texts, and bright enough to satisfy all the UX people on the internet who say never to use black text. The original #434343 was a touch silly, though.
Sunday, 19 May 2024
Demystifying the jargon: free software vs open source
Some people struggle to understand the distinctions between ‘free software’ and ‘open source software.’ Let’s clear up the confusion with an analogy.
Imagine a world without vegetarianism. One day, someone proposes a new diet called ‘moral eating,’ which excludes meat for ethical reasons. Some people embrace it, and discover additional benefits like reduced environmental impact. However, advocates observe that implying people not adhering to the diet are immoral isn’t the best recruitment strategy. They coin the term ‘sustainable eating’ to focus on the environmental advantages.
But now people get bogged down in philosophical debates. If one uses the term ‘moral eating,’ some assume they don’t care about the environment; on the other hand, if one says ‘sustainable eating,’ some assume they don’t care about animals. To avoid this, an all-encompassing acronym MSE (Moral and Sustainable Eating) is created. It signifies the same thing (no meat) but avoids getting entangled in justifications.
And so we end up with three distinct terms (moral eating, sustainable eating and MSE) which all refer to the same diet: what we call vegetarianism.
This is how the terms free software, open source and FOSS (Free and Open Source Software) came to be. They all represent the same category of software, each with a different advocacy philosophy. Free software emphasises the four essential freedoms and open source uses the Open Source Definition. While the latter might be more explicit on some points (it overtly prohibits discrimination against any people or field of endeavour), the four freedoms implicitly cover them as well.
Source-available software
Here’s where things get tricky. Some companies try to capitalize on the positive associations of open source without truly adhering to its principles. They might ‘open their software’ but release source code under a license that restricts creating derivative works. This could be due to genuine misunderstanding or intentional manipulation. Whatever the reason, if the four essential freedoms aren’t granted, the code isn’t open source. This type of software is more accurately called source-available software.
Libre Software
Another point of confusion is the ambiguity of the term ‘free software.’ ‘Free’ can refer to price or freedom. The common saying ‘free as in freedom, not as in beer’ attempts to clarify this imprecision. To eliminate the ambiguity altogether, the terms libre software or libreware have emerged. To include them, the FOSS acronym is sometimes extended to FLOSS (Free, Libre and Open Source Software).
Proprietary software that one can acquire without paying is called freeware. It’s distinct from free software, which is only concerned with user freedoms and permits selling of the software.
Creative Commons and Free Software
Lastly, it’s worth mentioning the Creative Commons organisation. It aims to simplify copyright by allowing creators to share their work with specific permissions. While its goals align somewhat with free software, it’s important to note that not all Creative Commons licenses qualify. Any license that disallows derivative works (NoDerivatives) or commercial use (NonCommercial) doesn’t meet the criteria for free software.
There are three Creative Commons licenses which are considered free software:
- CC0, which is roughly equivalent to something being in the public domain,
- CC BY (Attribution), which is roughly equivalent to permissive free software licenses and
- CC BY-SA (Attribution-ShareAlike), which is roughly equivalent to copyleft free software licenses.
However, when licensing source code, it’s generally recommended to use licenses specifically designed for software, such as various GPL variants, the Mozilla Public License, the Apache license, or the MIT license.
Conclusion
Free software, open source software, libre software, libreware, FOSS and FLOSS all describe the same category of software: software with source code that users can freely run, modify, and redistribute. Source-available software has accessible code whose license prevents at least one of those activities.
Monday, 13 May 2024
KDE Goals April 2024 sprint
A few weeks ago I attended the KDE Goals April 2024 sprint.
I was there as part of the Automation & Systematization sprint, given my involvement in the release process, the "not very automatized" weekly emails about the status of CI for KDE Gear and KDE Frameworks, etc., but I think that maybe I was there more as the "person that has been around a long time, ask me if you have questions about things that are documented through oral tradition".
I didn't end up doing lots of work on sprint topics themselves (though I participated in various discussions, did a bit of pair-programming with Aleix on QML accessibility issues, inspired DavidR to do the QML-text-missing-i18n check that he describes in his blog); instead I cheated a bit and used the sprint to focus on some of the KDE stuff I had a bit on my backlog, creating the KDE Gear release/24.05 branches and lots of MR reviewing and more!
Thanks to KDE e.V. for sponsoring the trip. If you would like such events to continue, we need your continued donations.
And remember: the Akademy talk submission period ends in 10 days, so send your talk now!
Sunday, 12 May 2024
You’re implementing fmt::Display wrong
TL;DR: When implementing the Display trait for a wrapper type, use self.0.fmt(fmtr) rather than invoking the write! macro. See the section The proper way below.
Imagine a TimeOfDay type which represents time as shown on a 24-hour clock. It could look something like the following:
pub struct TimeOfDay {
    pub hour: u8,
    pub minute: u8,
}

impl core::fmt::Display for TimeOfDay {
    fn fmt(&self, fmtr: &mut core::fmt::Formatter) -> core::fmt::Result {
        write!(fmtr, "{:02}:{:02}", self.hour, self.minute)
    }
}

fn main() {
    let hour = 2;
    let minute = 5;
    assert_eq!("02:05", TimeOfDay { hour, minute }.to_string());
}
While it’s a serviceable solution, one might tremble at the lack of type safety. Nothing prevents the creation of nonsensical times such as ‘42:69’. In real life, hour rarely goes past 23 and minute sticks to values below 60. One possible approach to preventing invalid times is the newtype idiom, with structs imposing limits on the wrapped value, for example:
use core::fmt;

struct TimeOfDay {
    hour: Hour,
    minute: Minute,
}

struct Hour(u8);
struct Minute(u8);

impl Hour {
    fn new(val: u8) -> Option<Self> {
        (val < 24).then_some(Self(val))
    }
}

impl Minute {
    fn new(val: u8) -> Option<Self> {
        (val < 60).then_some(Self(val))
    }
}

impl fmt::Display for TimeOfDay {
    fn fmt(&self, fmtr: &mut fmt::Formatter) -> fmt::Result {
        write!(fmtr, "{:02}:{:02}", self.hour, self.minute)
    }
}

fn main() {
    let hour = Hour::new(2).unwrap();
    let minute = Minute::new(5).unwrap();
    assert_eq!("02:05", TimeOfDay { hour, minute }.to_string());
}
Alas, since the new types don’t implement the Display trait, the code won’t compile. Fortunately, the trait isn’t complicated and one might quickly whip up the following definitions:
impl fmt::Display for Hour {
    fn fmt(&self, fmtr: &mut fmt::Formatter) -> fmt::Result {
        write!(fmtr, "{}", self.0)
    }
}

impl fmt::Display for Minute {
    fn fmt(&self, fmtr: &mut fmt::Formatter) -> fmt::Result {
        write!(fmtr, "{}", self.0)
    }
}
Display, Debug, Octal etc. implementations which only call the write! macro are quite common. But while common, they are at times incorrect. While the above example will build with such definitions, the test in main will fail (playground), producing the following error:
thread 'main' panicked at src/main.rs:40:5:
assertion `left == right` failed
  left: "02:05"
 right: "2:5"
The issue is that invoking write! erases any formatting flags passed through the fmtr argument. Even though TimeOfDay::fmt used the {:02} format, the Display implementations disregard the width and padding options by calling write! with the {} format.
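The difference is easy to demonstrate side by side. In this sketch, Naive and Delegating are illustrative wrapper types that are not part of the original example: one formats its inner u8 via write!, the other hands the caller’s Formatter on to the inner value.

```rust
use std::fmt;

/// Formats its inner value with `write!`, which starts a fresh format spec.
struct Naive(u8);

/// Hands the caller's `Formatter` (with its width, fill, etc.) to the inner value.
struct Delegating(u8);

impl fmt::Display for Naive {
    fn fmt(&self, fmtr: &mut fmt::Formatter) -> fmt::Result {
        write!(fmtr, "{}", self.0)
    }
}

impl fmt::Display for Delegating {
    fn fmt(&self, fmtr: &mut fmt::Formatter) -> fmt::Result {
        fmt::Display::fmt(&self.0, fmtr)
    }
}

fn main() {
    // The `02` width/fill request is lost by `write!` but honoured by delegation.
    assert_eq!(format!("{:02}", Naive(5)), "5");
    assert_eq!(format!("{:02}", Delegating(5)), "05");
    assert_eq!(format!("{:>4}", Delegating(7)), "   7");
    println!("{:02}:{:02}", Delegating(2), Delegating(5));
}
```

The fully qualified fmt::Display::fmt(&self.0, fmtr) form is used here so that the call does not depend on which formatting traits happen to be imported.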
Fortunately, the solution is trivial and in fact even simpler than calling write!.
The proper way
In the majority of cases, the proper way to implement traits such as Display or Debug is to use delegation as follows:
impl fmt::Display for Hour {
    fn fmt(&self, fmtr: &mut fmt::Formatter) -> fmt::Result {
        self.0.fmt(fmtr)
    }
}

impl fmt::Display for Minute {
    fn fmt(&self, fmtr: &mut fmt::Formatter) -> fmt::Result {
        self.0.fmt(fmtr)
    }
}
Since the same Formatter is used, any configuration that the caller specified (such as width and fill) will be applied when formatting the inner type (playground).
In fact, there is a crate for that. derive_more offers derives for various traits, including Display. When used with no additional options on a newtype struct, the crate will generate a delegating implementation of the trait. In other words, the above impls can be replaced by the following derive annotations:
#[derive(derive_more::Display)]
struct Hour(u8);

#[derive(derive_more::Display)]
struct Minute(u8);
Display vs Debug
A related trick is delegating between the Display and Debug traits (or any other formatting traits). This is especially useful when the implementations of both traits are identical. A naïve approach would be to use something like write!(fmtr, "{self:?}") in Display, but this suffers from the aforementioned issues. Delegation is once again a better approach (playground):
use core::fmt;

#[derive(Debug)]
enum DayOfWeek {
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
}

impl fmt::Display for DayOfWeek {
    fn fmt(&self, fmtr: &mut fmt::Formatter) -> fmt::Result {
        fmt::Debug::fmt(self, fmtr)
    }
}

fn main() {
    let dow = DayOfWeek::Monday;
    println!("dbg={dow:?} disp={dow}");
}
Friday, 10 May 2024
Troubleshooting a Set Top Box
Back in March I was in the UK troubleshooting a Humax set-top box (STB) that was behaving erratically. Most of the time it would work as expected, but sometimes it would just display a green screen when switching on or changing channels. I approached this in a number of ways: searching online for other people's experiences with these boxes, trying to find any software updates that might have been needed, and looking into alternatives if these approaches should fail.
One option that was left open was buying a new digital video recorder (DVR), though I was a bit reluctant to rush into this given that there seem to be very few reasonably priced ones available these days. The market seems to have moved to smart TVs and streaming services.
In the end, the solution to the original problem was to switch from an HDMI cable to a SCART cable for connecting the set-top box to the television. The problem seemed to have been related to HDCP content protection.
The unused approach
One of the fallback options I looked at was buying a Raspberry Pi and a TV hat, and I figured that I might as well just do this to see if it was a viable replacement for a DVR. It wasn't, though it could be made to work with a fair amount more effort and a more powerful Pi than the Zero 2 W that I chose for the experiment.
Although various online stores have guides to help with setting up the hardware and software, it was quite frustrating to get the software configured to download program guides and receive broadcasts. There was a window of time when it worked, but it seemed quite unreliable otherwise.
Repurposing the Pi
While it's possible that the Pi I bought might be needed in its original role, I think it's more likely I'd try to buy a replacement for the original Humax box instead. In the meantime, I started looking into porting Inferno to it. Rather, that should be re-porting Inferno, because the original port was for ARMv6-based hardware, and the Pi Zero 2W is actually ARMv8-based.
Slow progress can be observed in the diary.
Categories: Inferno, Free Software
Saturday, 04 May 2024
Send your talks for Akademy NOW!
Akademy 2024 (the annual world summit for KDE) is happening in Würzburg, Saturday 7th – Thursday 12th September. (I hope you knew that)
First of all, if you're reading this and thinking, "Should I go to Akademy?"
The answer is [most probably] YES! Akademy has something for everyone, be it coders, translators, promoters, designers, enthusiasts, etc.
Now, with this out of the way, one of the many things that makes Akademy is the talks on the weekend, and you know who has something to say? *YOU*
Yes, *YOU*. I'm sure you've been working on something interesting, or have a great idea to share.
*YOU* may think that your idea is not that great or the things you work on are not interesting, but that's seldom the case: when someone explains to me the "boring" thing they've been working on, I always think "Wow, that's great".
Ok, so now that I've convinced you to send a talk proposal, when better than *TODAY* to send it?
Yes, I know the Call for Participation is open until the 24th of May, but by sending it today you make sure you don't forget to send it later and also [more important for me] you help those of us in the Program Committee not to worry when the final date starts approaching and we don't have lots of talks yet because you all prefer sending talks at the very last minute.
So stop reading and send your talk today ;-)
Tuesday, 30 April 2024
Digitalisation problems can be solved with free software
Digitalisation problems can be solved with free/Open Source software
– an open letter to the parties’ IT spokespersons
FSFE Danmark, represented by Allan Dukat, Øjvind Fritjof Arnfred and Carsten Agger –
About us in FSFE Danmark:
https://community.fsfe.org/t/handleplan-og-arbejdsgrundlag-for-fsfe-danmark/963
At the end of 2019, a father complained to Datatilsynet (the Danish Data Protection Agency) about Helsingør Municipality because his eight-year-old son had been issued a Chromebook computer with a Google account at his public school without the father’s consent. As he stated in the complaint, this meant that his son’s personal data was unlawfully passed on to Google.
Kommunernes Landsforening (KL, Local Government Denmark) took over the negotiations with Datatilsynet on behalf of the 53 municipalities that used Chromebooks and Google.
Throughout the process, KL failed to consider whether there were other ways of organising teaching, and instead unconditionally supported Helsingør Municipality’s right to pass the pupils’ personal data on to Google. Four years later, Datatilsynet established in a final decision that the schools’ use of Chromebook and Google Workspace is unlawful. KL’s position was, and is, that the law must be changed, as they write in their press release:
“If the government does not step in, tens of thousands of schoolchildren in more than half of the country’s municipalities will face a black screen after the summer holidays. That will hit the schools hard. Do we really want that? And there will be thousands of dyslexic children who cannot get the help they need. Will we allow that? Something else, and far more far-reaching, is that huge uncertainty has been created in a wide range of other welfare areas, where we do not know where we stand.”
KL is thus taking the pupils’ private data hostage instead of investigating lawful alternatives. It is probably also very optimistic of KL to believe that Folketinget (the Danish parliament) will or can change the law in a way that conflicts with the European data protection regulation, GDPR, to which Datatilsynet referred in its decision.
In October 2023, it emerged in the press that the City of Copenhagen (Københavns Kommune) had several unlawful contracts with its IT suppliers. The municipality’s own data protection adviser wrote in a memo from April 2023:
“The City of Copenhagen currently has several unlawful contracts with suppliers and will in the near future need to renew several of these contracts, because the supplier’s solution is not offered by others and because the system is necessary for the City of Copenhagen to deliver welfare services to citizens.”
Source: https://www.computerworld.dk/art/284811/koebenhavns-kommune-har-hundredevis-af-ulovlige-it-kontrakter
In November 2023, it turned out that the regions were facing unexpectedly high costs for software licences. Prices for key software solutions rose considerably more than inflation, but since the patient record systems simply could not function without them, there was nothing to do but pay the bill and find the money through savings elsewhere in the regions.
In February 2024, it was the municipalities’ turn to complain about paying for Microsoft’s office suite in the “cloud”.
The problem here is that both data and programs are on Microsoft’s servers, so one cannot, as in the “old days”, simply skip upgrading the programs and keep using the older versions, because if you don’t pay, you lose access to the systems. Here, too, casework depends heavily on the employees’ office programs.
Also in February 2024, a hacker leaked source code for programs that Netcompany has developed for several Danish public authorities. The source code reveals usernames and passwords, and experts see examples of a lack of separation between programs, configuration and runtime environment. Since Netcompany owns the code and does not share it with anyone, nobody knew about this until the code was made public.
Source: https://www.version2.dk/artikel/hackere-laekker-kildekode-og-passwords-fra-netcompany-truer-den-danske-stat and several subsequent articles.
How can these problems be solved?
We believe that the core of any digitalisation strategy must be that the public sector always retains ownership of, and sovereignty over, its IT systems.
This can be achieved by following the FSFE’s slogan “Public Money Public Code”, https://publiccode.eu/da/, which states that all software paid for with public funds must always be free software, also known as “open source”. This type of software is characterised by the recipient always having the right to study the software as they see fit; run it for any purpose without restriction; change and improve it themselves; and share it with others, also as they see fit, with or without their own improvements.
This principle reverses the balance of power between customer and producer and means that the customer can always take the product and have it further developed and operated by another supplier. If the public sector follows this principle, it also means that the authorities, and no one else, have full control over their solutions and can never again be pressured on price and terms by unreasonable suppliers.
Therefore, all public tenders should contain a clause stating that all code used in the public sector and developed with public funds must be free software/open source. This has a great many advantages:
- Free software/open source ensures that one is never at the mercy of a single supplier, since another supplier will always be able to continue working on the system.
- When the programs are free software/open source, there is no tie to particular suppliers’ data centres: they can be hosted anywhere, with another supplier or locally. This makes competition between suppliers freer, as it removes the artificial monopolies that today keep customers locked into solutions that are neither the best nor the cheapest on the market.
- Once a program has been paid for, every part of the public sector can use it without further costs or licences: if, for example, one region or one country has had a patient record system developed, all regions and countries can use it freely without paying for the functionality that has already been developed.
- Programs that are free software/open source are fundamentally more secure, because the developers know that others can see what they are doing, and there are more people to find the bugs.
In addition, it would be sensible to keep public data within the country’s borders.
We are well aware that getting there will be a long and expensive process, but we are convinced that in the long run it will be for the best, economically as well as in terms of security, both for data protection and for the country’s sovereignty, to introduce this clause requiring all public software to be free software/open source.
The process should therefore be started immediately, and OS2, the public sector’s digitalisation community (https://www.os2.eu/), would be a suitable place to look for inspiration on how this might be done.
For further inspiration, see these pages:
- What is free software, https://media.fsfe.org/w/3bc6b0e6-fee3-4239-9ec7-dd0c1669b841
- German state ditches Windows, Microsoft Office for Linux and LibreOffice, https://www.theregister.com/2024/04/04/germanys_northernmost_state_ditches_windows/
- Free Software Foundation Europe’s website, https://fsfe.org/
Contact
- Carsten Agger <agger at fsfe.org>, coordinator for Denmark and member of Free Software Foundation Europe’s board and European team, tel. 2086 5010
- Allan Dukat <allan at fsfe.org>, activist and supporting member of FSFE Danmark
- Øjvind Fritjof Arnfred <slartibartfas at fsfe.org>, activist and supporting member of FSFE Danmark
Friday, 26 April 2024
A fast fileserver with FreeBSD, NVMEs, ZFS and NFS
I have a small server running in my flat that serves files locally via NFS and remotely via Nextcloud. This post documents the slightly overpowered upgrade of the hardware and subsequent performance / efficiency optimisations.
TL;DR
- I can fully saturate a 10Gbit LAN connection, achieving more than 1100 MiB/s throughput.
- I can perform a zpool scrub with 11 GiB/s, completing a 6.8TiB scrub in 11min.
- Idle power usage can be brought down to 34W.
Old setup and requirements
What the server does:
- Serve files via NFS to
  - my workstation (high traffic)
  - a couple of laptops (low traffic)
  - the TV running Kodi (medium traffic)
- Host a Nextcloud which provides file storage, PIM etc. for a handful of people
Not a lot of compute is necessary, and I have tried to keep power usage low. The old hardware served me well for a really long time:
- AMD 630 CPU
- 16GiB RAM
- 2+1 * 4TiB spinning disk RAIDZ1 with SSD ZIL (“write-cache”)
The main pain point was slow disk access resulting in poor performance when large files were read by the Nextcloud. Browsing through my photo collection via NFS was also very slow, because thumbnail generation needed to pull all the images. Furthermore, low speed meant that I was not doing as much on the remote storage as I would have liked (e.g. storing games), resulting in my workstation’s storage always running out. And I was just reaching the limits of my ZFS pool anyway, so it was time for an upgrade!
New setup
To get better I/O, I thought about switching from HDDs to SSDs. Then I realised that SATA SSD performance is very low compared to NVME performance, although the price difference is not that large. Also, NFS+ZFS leads to quite a bit of I/O, which typically requires faster caching devices and further complicates the setup. Consequently, I decided to go for a pure NVME setup. Of course, the new server would also need 10GBit networking, so that I can use all that speed in the LAN!
This is the new hardware! I will discuss the details below.
Mainboard, CPU and RAM
The main requirement for the mainboard is to offer connectivity for four NVME disks. And to be prepared for the future, I would actually like 1-2 extra NVME slots. There are two ways to attach NVMEs to a motherboard:
- directly (“natively”)
- via an extension card that is plugged into a PCIexpress slot
Initially, I had assumed no mainboard would offer sufficient native slots, so I did a lot of research on option 2. The summary: it is quite messy. If you want to use a single extension card that hosts multiple NVMEs (which is required in this case), you need so-called “bifurcation support” on the mainboard. This lets you, e.g., put a PCIe x8 card with two x4 NVME disks into a PCIe x8 slot on the mainboard. However, this feature is really poorly documented,[1] and support varies between mainboards AND CPUs. Some support no bifurcation, some only x8 → x4x4, and some also x16 → x4x4x4x4. The different PCIe versions and speeds add further complications, as does the difference between the actually supported speed and the electrical interface.
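The bifurcation rules above can be made more concrete with a small Python model. It counts how many drives on a passive multi-NVME card a host would enumerate. This is a simplification under stated assumptions. The supported splits are whatever the board and CPU combination allows, which you have to dig out of the manual. Without bifurcation, typically only the first drive is seen. Cards with an on-board PCIe switch chip do not need bifurcation at all and are not modelled.

```python
def usable_drives(slot_lanes: int, drives_on_card: int,
                  supported_splits: list[tuple[int, ...]]) -> int:
    """Return how many x4 drives on a passive (switchless) card are usable.

    slot_lanes       -- electrical width of the slot, e.g. 8 or 16
    drives_on_card   -- number of NVME drives mounted on the card
    supported_splits -- lane splits the board/CPU combination supports,
                        e.g. [(8, 8), (4, 4, 4, 4)]; [] means no bifurcation
    """
    # Assumption: without bifurcation the slot trains as one link, so only
    # the drive wired to the first lanes is seen.
    links = 1
    for split in supported_splits:
        # A split only helps if it fits within the slot's electrical lanes.
        if sum(split) <= slot_lanes:
            links = max(links, len(split))
    return min(links, drives_on_card)


# An x16 card with four drives, depending on what the platform supports:
for splits in ([], [(8, 8)], [(8, 8), (4, 4, 4, 4)]):
    print(splits, "->", usable_drives(16, 4, splits))
```

With only x8 → x4x4 available, half of a four-drive card stays invisible. This is why the exact bifurcation modes matter so much.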
In the end, I decided not to do any experiments and to look for a board that natively supports a high number of NVME slots. For some reason, this feature is very rare on AMD mainboards, so I switched to Intel (although I am actually a bit of an AMD fanboy). I probably could have gone with a board that has 5 slots. But I use hardware for a long time and wanted to be safe, so I took a board that has 6 NVME slots (2 free slots):
None of the available boards had a proper[2] 10GBit network adapter, so having a usable PCIe slot for a dedicated card was also a requirement. It is important to check whether PCIe slots can still be used when all NVME slots are occupied; sometimes they share bandwidth internally. But for the above board this is not the case.
Important: To be able to boot FreeBSD on this board, you need to add the following to /boot/device.hints:
hint.uart.0.disabled="1"
hint.uart.1.disabled="1"
For the CPU, I just went with something on the low TDP end of the current Intel CPU range, the Intel Core i3-12100T. Four cores + four threads was exactly what I was looking for, and 35W TDP sounded good. I paired that with some off-the-shelf 32GiB RAM kit.
Case, power supply & cooling
Strictly speaking a 2U case would have been sufficient, but I thought a 3U case might offer better air circulation. I ended up with the Gembird 19CC-3U-01. For unknown reasons, I chose a 2U horizontal CPU fan, instead of a 3U one. The latter would definitely have provided better airflow, but since the fan barely runs at all, it doesn’t make much of a difference.
I was unsuccessful in finding a good PSU that is super efficient in the average case of around 40W power usage but also covers spikes well above 100W, so I just chose the cheapest 300W one I could get :)
The case with everything in place.
The built-in fans are very noisy. I replaced one of the intake fans with a spare one I had lying around and connected only one of the rear exhaust fans. But I added an extra fan next to the extension slots to divert some airflow around the NIC, which otherwise gets quite warm. This should also blow some air over the NVME heatsinks! All fans can be regulated and fine-tuned from the BIOS of the mainboard, which I totally recommend you do. At the current temperatures and average workloads, the whole setup is almost silent.
Storage
Now, the fun begins. Since I needed more space than before, I clearly wanted a 3+1 x 4TiB RAIDZ1.
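As a rough sanity check on capacity, a RAIDZ1 vdev gives up one disk's worth of space to parity. The sketch below assumes that all drives, old and new, are marketed in decimal units (4 TB = 4 * 10**12 bytes). It ignores ZFS metadata, padding and the slop reservation, so the real usable space will be somewhat smaller.

```python
TIB = 2**40  # bytes per tebibyte


def raidz_data_bytes(disks: int, parity: int, disk_bytes: float) -> float:
    """Approximate data capacity of one RAIDZ vdev, ignoring ZFS overheads."""
    if disks <= parity:
        raise ValueError("need more disks than the parity level")
    return (disks - parity) * disk_bytes


FOUR_TB = 4 * 10**12  # a drive sold as "4TB", roughly 3.64 TiB
old_pool = raidz_data_bytes(3, 1, FOUR_TB) / TIB  # 2+1 layout
new_pool = raidz_data_bytes(4, 1, FOUR_TB) / TIB  # 3+1 layout
print(f"old: {old_pool:.2f} TiB, new: {new_pool:.2f} TiB")
```

Going from 2+1 to 3+1 adds one full drive of data capacity, from about 7.3 TiB to about 10.9 TiB before overheads.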
My goal was to be able to saturate a 10GBit connection (so get around 1GiB/s throughput) and still have the server be able to serve the Nextcloud without slowing down significantly. Currently the WAN upload is quite slow, but I hope to have fibre in the future. In any case, I thought that any modern NVME should be fast enough, because they all advertise speeds of multiple GiB/s.
Choice of disks
Anyway, I got two Crucial P3 Plus 4TB (which were on sale at Amazon for ~190€), as well as two Lexar NM790 4TB (which were also a lot cheaper than they are now). My assumption that they were very comparable was very wrong:
Disk | IOPS rand-read | IOPS read | IOPS write | MB/s read | MB/s write | “cat speed” MB/s |
---|---|---|---|---|---|---|
Crucial | 53,500 | 794,000 | 455,000 | 2,600 | 4,983 | ~700 |
Lexar | 53,700 | 796,000 | 456,000 | 4,578 | 5,737 | ~2,700 |
I used this fellow’s fio-script to generate all columns except the last. The last column was generated by simply cat’ing a 10GiB file of random numbers to /dev/null, which roughly corresponds to the read portion of copying a 4k movie file.
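You may want to reproduce something like the “cat speed” column without fio. A minimal Python equivalent of cat’ing a file to /dev/null reads it sequentially in fixed-size chunks and discards the data. This is a sketch, not the method used for the tables. The 1 MiB block size and the small demo file are assumptions. A file that was just written will likely be served from the ARC or page cache, so meaningful numbers need a file that is not cached.

```python
import os
import tempfile
import time


def sequential_read(path: str, block_size: int = 1 << 20) -> tuple[int, float]:
    """Read a file front to back and discard the data, like `cat file > /dev/null`.

    Returns (bytes_read, MiB_per_second).
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 opens the file unbuffered, so every read() is one system call
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mib_per_s = total / (1 << 20) / elapsed if elapsed > 0 else float("inf")
    return total, mib_per_s


# Demo on a small temporary file of random bytes (8 MiB instead of 10 GiB).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(8 << 20))
try:
    nbytes, speed = sequential_read(tmp.name)
finally:
    os.remove(tmp.name)
print(f"read {nbytes} bytes at {speed:.0f} MiB/s")
```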
Since I had two disks of each model, I actually took the time to test all of them in different mainboard slots. The results were very consistent: in real-life tasks, the Crucial disks underperformed significantly, while the Lexar disks were super fast.
I decided to return the Crucial disks and get two more by Lexar 😎
Disk encryption
I always store my data encrypted at rest. FreeBSD offers GELI block-level encryption (similar to LUKS on Linux), but OpenZFS has also provided dataset/filesystem-level encryption for a while now. I previously used GELI, but I wanted to switch to ZFS native encryption, because it provides some advantages:
- Flexibility: I can choose later which datasets to encrypt; I can encrypt different datasets with different keys.
- Zero-knowledge backups: I can send incremental backups off-site that are received and fully integrated into the target pool without that server ever getting the decryption keys.
- Forward-compatibility: I can upgrade to better encryption algorithms later.
- Linux-compatibility: I can import the existing pool in a Linux environment for debugging or benchmarking.
However, I had also heard that ZFS native encryption was slower, so I decided to do some benchmarks:
Encryption | IOPS rand-read | IOPS read | IOPS write | MB/s read | MB/s write | “cat speed” MB/s |
---|---|---|---|---|---|---|
no encryption | 54,700 | 809,000 | 453,000 | 4,796 | 5,868 | 2,732 |
geli-aes-256-xts | 40,000 | 793,000 | 446,000 | 3,332 | 3,334 | 952 |
zfs-enc-aes-256-gcm | 26,100 | 513,000 | 285,000 | 3,871 | 4,648 | 2,638 |
zfs-enc-aes-128-gcm | 29,300 | 532,000 | 353,000 | 3,971 | 4,794 | 2,631 |
Interestingly, GELI[3] performs much better on IOPS, but much worse on throughput, especially in our real-life test case. Maybe some smart person knows the reason for this, but I took this benchmark as assurance that going with native encryption was the right choice.[4] One reason for the good performance of the native encryption seems to be that it makes use of the CPU’s AVX2 extensions.
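The trade-off is easier to see as ratios against the unencrypted baseline. This short Python snippet only re-expresses numbers from the encryption table above. It adds no new measurements.

```python
# Figures copied from the encryption benchmark table above.
cat_speed_mb_s = {
    "no encryption": 2732,
    "geli-aes-256-xts": 952,
    "zfs-enc-aes-256-gcm": 2638,
    "zfs-enc-aes-128-gcm": 2631,
}
rand_read_iops = {
    "no encryption": 54700,
    "geli-aes-256-xts": 40000,
    "zfs-enc-aes-256-gcm": 26100,
    "zfs-enc-aes-128-gcm": 29300,
}


def relative(table: dict[str, int], baseline: str = "no encryption") -> dict[str, float]:
    """Express every entry as a fraction of the baseline entry."""
    base = table[baseline]
    return {name: value / base for name, value in table.items()}


iops = relative(rand_read_iops)
for name, cat_ratio in relative(cat_speed_mb_s).items():
    print(f"{name:20} rand-read IOPS {iops[name]:6.1%}   cat speed {cat_ratio:6.1%}")
```

GELI keeps about 73% of the random-read IOPS but only about 35% of the cat speed. Both ZFS variants keep about 96% of the cat speed and roughly half of the random-read IOPS.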
At this point, I feel like I do need to warn people about some ZFS encryption related issues that I learned about later. Please read this. I have had no problems to date, but make up your own mind.
RAIDZ1
recordsize | compr. | encrypt | IOPS rand-read | IOPS read | IOPS write | MB/s read | MB/s write | “cat speed” MB/s |
---|---|---|---|---|---|---|---|---|
128 KiB | off | off | 50,000 | 869,000 | 418,000 | 3,964 | 5,745 | 2,019 |
128 KiB | on | off | 49,800 | 877,000 | 458,000 | 3,929 | 4,654 | 1,448 |
128 KiB | off | aes128 | 26,300 | 484,000 | 230,000 | 3,589 | 5,331 | 2,142 |
128 KiB | on | aes128 | 27,400 | 501,000 | 228,000 | 3,510 | 3,927 | 2,120 |
These are the numbers after creation of the RAIDZ1 based pool. They are quite similar to the numbers measured before.
The impact of encryption on IOPS is clearly visible, less so on sequential read/write throughput.
Compression seems to impact write throughput but not read throughput, which is expected for zstd. It is unclear why “cat speed” is lower here.
recordsize | compr. | encrypt | IOPS rand-read | IOPS read | IOPS write | MB/s read | MB/s write | “cat speed” MB/s |
---|---|---|---|---|---|---|---|---|
1 MiB | off | off | 7,235 | 730,000 | 404,000 | 3,686 | 3,548 | 2,142 |
1 MiB | on | off | 7,112 | 800,000 | 470,000 | 3,624 | 3,447 | 2,064 |
1 MiB | off | aes128 | 3,259 | 497,000 | 258,000 | 3,029 | 3,422 | 2,227 |
1 MiB | on | aes128 | 3,697 | 506,000 | 249,000 | 3,137 | 3,361 | 2,237 |
Many optimisation guides suggest setting the ZFS recordsize to 1 MiB for most use-cases, especially for storing media files. But this seems to drastically penalise random read IOPS while providing little to no benefit in the sequential read/write scenarios. This is actually a bit surprising, and I will need to investigate it further. Is it perhaps because NVMEs are good at parallel access and therefore suffer less from fragmentation anyway?
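One plausible factor behind the random-read penalty is read amplification. ZFS checksums, and optionally compresses and encrypts, whole records. A small read inside a record therefore has to fetch the entire record. The sketch below assumes that the fio random-read job uses 4 KiB requests (the script's parameters are not shown here, so this is an assumption). It is a back-of-the-envelope model, not a diagnosis.

```python
KIB = 1024


def read_amplification(recordsize: int, request_size: int = 4 * KIB) -> float:
    """Bytes read from disk per byte requested, for one small random read.

    Assumes the request falls inside a single record, which ZFS reads in full
    to verify its checksum. Requests at least one record long count as 1.0.
    """
    return max(recordsize, request_size) / request_size


small = read_amplification(128 * KIB)   # 32.0
large = read_amplification(1024 * KIB)  # 256.0
predicted_ratio = large / small          # 8.0
measured_ratio = 50_000 / 7_235          # unencrypted, uncompressed rows
print(f"predicted {predicted_ratio:.1f}x, measured {measured_ratio:.1f}x")
```

The model predicts an 8x drop, and the measured drop is about 6.9x. Sequential reads consume whole records anyway, so they are largely unaffected.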
In any case, the main takeaway is that overall read and write throughputs are over 3,000 MiB/s in the synthetic case and over 2,000 MiB/s in the manual case, which is great.
Other disk performance metrics
Operation | Speed [MiB/s] |
---|---|
Copying 382 GiB between two datasets (both enc+comp) | 1,564 |
Copying 505 GiB between two datasets (both enc+comp) | 800 |
zfs scrub of the full pool | 11,000 |
These numbers further illustrate some real world use-cases. It’s interesting to see the difference between the first two, but it’s also important to keep in mind that this is reading and writing at the same time. Maybe some internal caches are exhausted after a while? I didn’t debug these numbers further, but I think the speed is quite good after such a long read/write.
More interesting is the speed for scrubbing, and, yes, I have checked this a couple of times. A scrub of 6.84TiB takes 10 to 11 minutes. I think that is pretty amazing, considering that it is reading the data and calculating checksums. I am assuming that sequential reads are just very fast and that the different disks are accessed in parallel. The checksum implementation is apparently also AVX2-optimised.
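The scrub figure is easy to double-check with a little arithmetic. This sketch only converts the size and duration stated above into an average rate.

```python
def throughput_gib_s(size_tib: float, minutes: float) -> float:
    """Average rate in GiB/s for processing size_tib TiB in the given minutes."""
    return size_tib * 1024 / (minutes * 60)


for minutes in (10, 11):
    print(f"{minutes} min -> {throughput_gib_s(6.84, minutes):.1f} GiB/s")
```

6.84 TiB in 10 to 11 minutes works out to roughly 10.6 to 11.7 GiB/s, which matches the 11 GiB/s in the table.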
LAN
Network adapter
Based on recommendations, I decided to buy an Intel card. Cheaper 10GBit network cards are available from Marvell/Aquantia, but the driver support in FreeBSD is poor, and the performance is supposedly also not close to that of Intel.
Many people suggested I go for SFP+ (fibre) instead of 10GBase-T (copper), but I already have CAT7 cables in my flat. I could have used fibre purely for connecting the server to the switch, and this would likely save some power. But I would have had to buy a new switch, and the options were just not economical. I already have a switch with two 10GBase-T ports, which I had bought for exactly this setup.
The cheapest Intel 10GBase-T card out there is the X540, which is quite old and available on Amazon for around 80€. I bought two of those (one for the server and one for the workstation). More modern cards are supposedly more energy-efficient, but also a lot more expensive.[5]
NFS Performance
On server and client, I set:
- kern.ipc.maxsockbuf=4737024 in /etc/sysctl.conf
- mtu 9000 media 10gbase-t in /etc/rc.conf (as ifconfig options)
Only on the server:
- nfs_server_maxio="1048576" in /etc/rc.conf
Only on the client:
- nfsv4,nconnect=8,readahead=8 as the mount options for the NFS mount
- vfs.maxbcachebuf=1048576 in /boot/loader.conf (not sure any more if this makes a difference)
These settings allow larger buffers and increase the amount of readahead. This favours large sequential reads/writes over small random reads/writes.
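The bandwidth-delay product shows why the larger socket buffer matters. To keep a TCP connection busy, the window must hold at least the link rate times the round-trip time worth of data. The sketch below uses a few illustrative LAN round-trip times; these are assumptions, not measurements from this setup. It also ignores the kernel's own bookkeeping overhead, so in practice the usable window is somewhat smaller than kern.ipc.maxsockbuf.

```python
def bdp_bytes(link_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bits_per_s / 8 * rtt_s


def max_rtt_s(window_bytes: float, link_bits_per_s: float) -> float:
    """Largest round-trip time at which a window of this size can still fill the link."""
    return window_bytes / (link_bits_per_s / 8)


TEN_GBIT = 10e9
MAXSOCKBUF = 4737024  # kern.ipc.maxsockbuf from the settings above

# Illustrative LAN round-trip times (assumptions, not measurements).
for rtt_ms in (0.1, 0.5, 1.0):
    need = bdp_bytes(TEN_GBIT, rtt_ms / 1000)
    print(f"RTT {rtt_ms} ms: {need / 1024:.0f} KiB in flight")

print(f"maxsockbuf covers RTTs up to {max_rtt_s(MAXSOCKBUF, TEN_GBIT) * 1000:.1f} ms")
```

In principle, a buffer of 4,737,024 bytes lets a single connection fill 10 Gbit/s at round-trip times up to about 3.8 ms. That is well above what a typical LAN should see.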
The full options on the client end up being:
# nfsstat -m
X.Y.Z.W:/ on /mnt/server
nfsv4,minorversion=2,tcp,resvport,nconnect=8,hard,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=1048576,wsize=1048576,readdirsize=1048576,readahead=8,wcommitsize=16777216,timeout=120,retrans=2147483647
I use NFS4 for my workstation and NFS3 for everyone else. I have performed no benchmarks on NFS3, but I see no reason why it would be slower.
IOPS rand-read | IOPS read | IOPS write | MB/s read | MB/s write | “cat speed” MB/s |
---|---|---|---|---|---|
283 | 292,000 | 33,200 | 1,156 | 594 | 1,164 |
This benchmark was performed on a dataset with 1M recordsize, encryption, but no compression.
Random read IOPS are pretty bad, and I see a strong correlation with the rsize (if I halve it, I double the IOPS; not shown in table). It’s possible that every 4KiB read actually triggers a 1MiB read in NFS, which would explain this.
On the other hand, the sequential read and write performance is pretty good. Synthetic and real-world read speeds are very close to the theoretical maximum of the 10GBit connection.
One thing to keep in mind: the block size used for reading has a very strong impact on performance. This can be seen when using dd with different bs arguments. Of course, 1MiB is optimal if that is also used by NFS, and cat seems to do this. However, cp does not, which results in much slower performance than using dd if=.. of=.. bs=1M.
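Part of the block size effect is simply the number of requests. Each read() is a separate system call, and on an NFS mount it may lead to additional RPCs unless the client's readahead merges them. The sketch below counts the requests needed to stream the 10 GiB test file at a few block sizes. It is arithmetic only and does not model latency.

```python
def read_requests(file_bytes: int, block_size: int) -> int:
    """Number of read() calls needed to stream a file with a given block size."""
    # integer ceiling division, exact even for very large files
    return (file_bytes + block_size - 1) // block_size


FILE_BYTES = 10 * 2**30  # the 10 GiB file used for the "cat speed" test
for bs in (4 * 1024, 64 * 1024, 128 * 1024, 1024 * 1024):
    print(f"bs={bs // 1024:>5} KiB -> {read_requests(FILE_BYTES, bs):>9,} reads")
```

Going from 4 KiB to 1 MiB blocks cuts the request count by a factor of 256, from 2,621,440 to 10,240.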
I have also done measurements with plain nc over the wire (also reaching 1,160 MiB/s) and with iperf3, which achieves 1,233 MiB/s, just below the 1,250 MiB/s equivalent of 10Gbit.
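When comparing these figures, it helps to be explicit about units. 10 Gbit/s is exactly 1,250 MB/s in decimal megabytes, or about 1,192 MiB/s in binary mebibytes. Ethernet, IP and TCP headers take a further share of each frame. The sketch below assumes IPv4 with 20-byte IP and TCP headers, the 12-byte TCP timestamp option, and 38 bytes of Ethernet framing per frame (preamble, header, FCS and inter-frame gap). It also shows why the 9000-byte MTU set earlier helps.

```python
def gbit_to_units(gbit_per_s: float) -> tuple[float, float]:
    """Convert a line rate in Gbit/s to (MB/s, MiB/s)."""
    bytes_per_s = gbit_per_s * 1e9 / 8
    return bytes_per_s / 1e6, bytes_per_s / 2**20


# Assumed per-frame overheads in bytes: IPv4 (20) + TCP (20) + TCP timestamps (12)
# inside the MTU, and preamble/SFD (8) + Ethernet header (14) + FCS (4) +
# inter-frame gap (12) on the wire outside it.
L3_L4_OVERHEAD = 20 + 20 + 12
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12


def tcp_payload_efficiency(mtu: int) -> float:
    """Fraction of the raw line rate that can carry TCP payload."""
    return (mtu - L3_L4_OVERHEAD) / (mtu + ETHERNET_OVERHEAD)


mb, mib = gbit_to_units(10)
print(f"10 Gbit/s = {mb:,.0f} MB/s = {mib:,.0f} MiB/s")
for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_payload_efficiency(mtu):.1%} of the line rate is TCP payload")
```

With MTU 9000, roughly 99% of the line rate is TCP payload, compared with about 94% at the default 1500. It is therefore worth checking which unit a tool reports before reading a number as “just below” the line rate.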
Power consumption and thermals
For a computer running 24/7 in my flat, power consumption is of course important. I bought a device to measure power consumption at the outlet to get an accurate picture.
idle
Because the computer is idle most of the time, optimising idle power usage is most important.
Change | Idle power [W] |
---|---|
default | 50 |
*_cx_lowest="Cmax" | 45 |
disable WiFi and BT | 42 |
media 10gbase-t | 45 |
machdep.hwpstate_pkg_ctrl=0 | 41 |
turn on chassis fans | 42 |
ASPM modes to L0s+L1 / enabled | 34 |
I assume that the same setup on Linux would be slightly more efficient, but 34W in idle is acceptable.
Clearly, the most impactful changes were:
- Activating ASPM for the PCIe devices in the BIOS.
- Adding performance_cx_lowest="Cmax" and economy_cx_lowest="Cmax" to /etc/rc.conf.
- Adding machdep.hwpstate_pkg_ctrl=0 to /boot/loader.conf.
You can find online resources on what these options do. You might need to update the BIOS to be able to disable the WiFi and Bluetooth devices completely. You can also use hints in /boot/device.hints, but this doesn’t save as much power.
Using 10GBase-T speed on the network device (instead of 1000Base-T) unfortunately increases power usage notably, but there is nothing I could find to mitigate this.
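Since the server runs 24/7, the idle numbers translate directly into energy per year. This sketch uses the first and last rows of the idle table. The electricity price is a hypothetical placeholder, not a figure from this post.

```python
HOURS_PER_YEAR = 24 * 365


def kwh_per_year(watts: float) -> float:
    """Energy used by a constant load running 24/7 for a (non-leap) year."""
    return watts * HOURS_PER_YEAR / 1000


BEFORE_W, AFTER_W = 50, 34   # first and last rows of the idle table
PRICE_EUR_PER_KWH = 0.30     # hypothetical tariff, replace with your own

saved_kwh = kwh_per_year(BEFORE_W) - kwh_per_year(AFTER_W)
print(f"idle: {kwh_per_year(AFTER_W):.0f} kWh/year, "
      f"saved: {saved_kwh:.0f} kWh/year (~{saved_kwh * PRICE_EUR_PER_KWH:.0f} EUR)")
```

Every watt of constant idle draw costs about 8.8 kWh per year. The 16 W saved by the tuning adds up to roughly 140 kWh per year.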
Things that are often recommended but that did not help me (at least not in idle):
- NVME power states (more on this below)
- lower values for sysctl dev.hwpstate_intel.*.epp (more on this below)
- hw.pci.do_power_nodriver=3
idle temperatures | °C |
---|---|
CPU | 37-40 |
NVMEs | 52-55 |
The latter was particularly interesting, because I had heard that newer NVMEs, and especially those by Lexar, get very warm. It should be noted, though, that the mainboard comes with a large heatsink that covers all NVMEs.
under load
The only “load test” that I performed was a scrub of the pool. Since this puts stress on the NVMEs and also the CPUs, it should be at least indicative of how things are going.
during zpool scrub | °C |
---|---|
CPU | 55-59 |
NVMEs | 69-75 |
The power usage fluctuates between 85W and 98W. I think all of these values are acceptable.
NVME power state hint | Scrub speed [GiB/s] | Power [W] |
---|---|---|
0 (default) | 11 | < 100 |
1 | 8 | < 93 |
2 | 4 | < 70 |
You can use nvmecontrol to tell the NVME disks to save energy. More information on this here and here.
I was surprised that all of this works reliably on FreeBSD, but it does! The man page is not great, though. Simply call nvmecontrol power -p X nvmeYns1 to set the hint to X on device Y, if desired. Note that this needs to be repeated after every reboot.
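The power state hint does not survive a reboot, so it has to be reapplied at boot, for example from a local startup script such as /etc/rc.local. The Python sketch below only builds the exact command shown above for each drive, and by default it just prints them. The controller numbers 0 to 3 are an assumption for a four-disk pool. Actually running the commands (dry_run=False) requires root.

```python
import subprocess


def power_hint_commands(hint: int, controllers: list[int]) -> list[list[str]]:
    """Build one `nvmecontrol power -p HINT nvmeYns1` command per controller Y."""
    return [["nvmecontrol", "power", "-p", str(hint), f"nvme{y}ns1"]
            for y in controllers]


def run_commands(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False (needs root)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if nvmecontrol fails
            subprocess.run(cmd, check=True)


# Assumed device numbering for a four-disk pool; check `nvmecontrol devlist`.
run_commands(power_hint_commands(1, [0, 1, 2, 3]))
```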
dev.hwpstate_intel.*.epp | Scrub speed [GiB/s] | Power [W] |
---|---|---|
50 (default) | 11.0 | < 100 |
100 | 3.3 | < 60 |
You can use the dev.hwpstate_intel.*.epp sysctls for your cores to tune how eagerly each core scales up, with a higher number meaning less eagerness.
In the end, I decided not to apply any of these “under load optimisations”. It is just very difficult, because, as shown, all optimisations that reduce power draw also increase the time a task takes. I am not certain of any good ways to quantify this. But it feels like keeping the system at 70W for 30min instead of 100W for 10min is not really worth it. And I also kind of want the system to be fast; that’s why I spent so much money on it 🙃
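The trade-off can be quantified by comparing the energy used over a fixed window rather than the power draw. A fast scrub finishes early, and the machine then drops back to idle. The sketch below treats the “<” upper bounds from the tables as the actual draw during the scrub. It assumes the optimised 34 W idle for the rest of a one-hour window, so the result is only indicative.

```python
def scrub_energy_wh(data_gib: float, speed_gib_s: float, active_w: float,
                    idle_w: float, window_s: float) -> float:
    """Energy in Wh over a fixed time window: scrub at active_w, then idle."""
    busy_s = data_gib / speed_gib_s
    if busy_s > window_s:
        raise ValueError("window shorter than the scrub itself")
    return (active_w * busy_s + idle_w * (window_s - busy_s)) / 3600


DATA_GIB = 6.84 * 1024  # pool size from the scrub measurement
IDLE_W = 34             # optimised idle power
WINDOW_S = 60 * 60      # compare over one hour

settings = {  # (scrub speed GiB/s, assumed power W) from the tables above
    "power state 0 / epp 50": (11.0, 100),
    "power state 1": (8.0, 93),
    "power state 2": (4.0, 70),
    "epp 100": (3.3, 60),
}
for name, (speed, watts) in settings.items():
    energy = scrub_energy_wh(DATA_GIB, speed, watts, IDLE_W, WINDOW_S)
    print(f"{name:22} {energy:5.1f} Wh")
```

Under these assumptions, the default settings come out cheapest at about 45.7 Wh per hour-long window, against about 51.5 Wh for power state 2. This supports the feeling that the slower settings are not worth it.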
The CPU does have a cTDP mode that can be activated via the BIOS and which is “worth it”, according to some articles I have read. I might give this a try in the future.
Final remarks
What a ride! I spent a lot of time optimising and benchmarking this and I am quite happy with the outcome. I am able to exhaust the 10GBit LAN connection completely, and still have resources left on the server :)
Thanks to the people at www.bsdforen.de who had quite a few helpful suggestions.
If you see anything that I missed, or have suggestions on how to improve this setup, let me know in the comments!
Footnotes
[1] ASUS is the only exception.
[2] “Proper” in this context means well supported by FreeBSD and with good performance. Usually, that means an Intel NIC. Unfortunately, all the modern boards come with Marvell/Aquantia AQtion adapters, which are not well supported by FreeBSD.
[3] The geli device was created with: geli init -b -s4096 -l256
[4] I wanted to perform all these tests with Linux as well, but I ran out of time 🙈
[5] I did try a slightly more modern adapter with the Intel 82599EN chip. This is an SFP+ chip, but I found an adapter with built-in 10GBase-T for around 150€. It ended up having some driver issues (you needed to unplug and replug the CAT cable for the device to go UP), and it used more energy than the X540, so I sent it back.