Free Software, Free Society!
Thoughts of the FSFE Community (English)

Friday, 21 April 2023

Migrate docker images to another disk

There is some confusion on which is the correct way to migrate your current/local docker images to another disk. To reduce this confusion, I will share my personal notes on the subject.


I replaced a btrfs raid-1 1TB storage with another btrfs raid-1 4TB setup. So 2 disks out, 2 new disks in. I also use luks, so all my disks are encrypted with random 4k keys before btrfs on them. There is -for sure- a write-penalty with this setup, but I am for data resilience - not speed.


These are my local docker images

docker images -a
REPOSITORY        TAG           IMAGE ID         CREATED      SIZE
golang            1.19          b47c7dfaaa93     5 days   ago   993MB
archlinux         base-devel    a37dc5345d16     6 days   ago   764MB
archlinux         base          d4e07600b346    4 weeks  ago   418MB
ubuntu            22.04         58db3edaf2be    2 months ago   77.8MB
centos7           ruby          28f8bde8a757    3 months ago   532MB
ubuntu            20.04         d5447fc01ae6    4 months ago   72.8MB
ruby              latest        046e6d725a3c    4 months ago   893MB
alpine            latest        49176f190c7e    4 months ago   7.04MB
bash              latest        018f8f38ad92    5 months ago   12.3MB
ubuntu            18.04         71eaf13299f4    5 months ago   63.1MB
centos            6             5bf9684f4720   19 months ago   194MB
centos            7             eeb6ee3f44bd   19 months ago   204MB
centos            8             5d0da3dc9764   19 months ago   231MB
ubuntu            16.04         b6f507652425   19 months ago   135MB
3bal/centos6-eol  devtoolset-7  ff3fa1a19332    2 years  ago   693MB
3bal/centos6-eol  latest        aa2256d57c69    2 years  ago   194MB
centos6           ebal          d073310c1ec4    2 years  ago   3.62GB
3bal/arch         devel         76a20143aac1    2 years  ago   1.02GB
cern/slc6-base    latest        63453d0a9b55    3 years  ago   222MB

Yes, I am still using centos6! It’s stable!!

docker save - docker load

Reading docker’s documentation, the suggested way is docker save and docker load. Seems easy enough:

docker save --output busybox.tar busybox
docker load < busybox.tar.gz

which is a lie!

docker prune

before we do anything with the docker images, let us clean up the garbages

sudo docker system prune

docker save - the wrong way

so I used the ImageID as a reference:

docker images -a  | grep -v ^REPOSITORY | awk '{print "docker save -o "$3".tar "$3}'

piped out through a bash shell | bash -x
and got my images:

$ ls -1


docker daemon

I had my docker images on tape-archive (tar) format. Now it was time to switch to my new btrfs storage. In order to do that, the safest way is my tweaking the

and I added the data-root section

    "dns": [""],
    "data-root": "/mnt/WD40PURZ/var_lib_docker"

I will explain var_lib_docker in a bit, stay with me.
and restarted docker

sudo systemctl restart docker

docker load - the wrong way

It was time to restore aka load the docker images back to docker

ls -1 | awk '{print "docker load --input "$1".tar"}'

docker load --input 33a093dd9250.tar
docker load --input b47c7dfaaa93.tar
docker load --input 16eed3dc21a6.tar
docker load --input d4e07600b346.tar
docker load --input 58db3edaf2be.tar
docker load --input 28f8bde8a757.tar
docker load --input 382715ecff56.tar
docker load --input d5447fc01ae6.tar
docker load --input 046e6d725a3c.tar
docker load --input 49176f190c7e.tar
docker load --input 018f8f38ad92.tar
docker load --input 71eaf13299f4.tar
docker load --input 5bf9684f4720.tar
docker load --input eeb6ee3f44bd.tar
docker load --input 5d0da3dc9764.tar
docker load --input b6f507652425.tar
docker load --input ff3fa1a19332.tar
docker load --input aa2256d57c69.tar
docker load --input d073310c1ec4.tar
docker load --input 76a20143aac1.tar
docker load --input 63453d0a9b55.tar

I was really happy, till I saw the result:

# docker images -a

<none>       <none>    b47c7dfaaa93   5 days ago      993MB
<none>       <none>    a37dc5345d16   6 days ago      764MB
<none>       <none>    16eed3dc21a6   2 weeks ago     65.5MB
<none>       <none>    d4e07600b346   4 weeks ago     418MB
<none>       <none>    58db3edaf2be   2 months ago    77.8MB
<none>       <none>    28f8bde8a757   3 months ago    532MB
<none>       <none>    382715ecff56   3 months ago    705MB
<none>       <none>    d5447fc01ae6   4 months ago    72.8MB
<none>       <none>    046e6d725a3c   4 months ago    893MB
<none>       <none>    49176f190c7e   4 months ago    7.04MB
<none>       <none>    018f8f38ad92   5 months ago    12.3MB
<none>       <none>    71eaf13299f4   5 months ago    63.1MB
<none>       <none>    5bf9684f4720   19 months ago   194MB
<none>       <none>    eeb6ee3f44bd   19 months ago   204MB
<none>       <none>    5d0da3dc9764   19 months ago   231MB
<none>       <none>    b6f507652425   19 months ago   135MB
<none>       <none>    ff3fa1a19332   2 years ago     693MB
<none>       <none>    aa2256d57c69   2 years ago     194MB
<none>       <none>    d073310c1ec4   2 years ago     3.62GB
<none>       <none>    76a20143aac1   2 years ago     1.02GB
<none>       <none>    63453d0a9b55   3 years ago     222MB


then after a few minutes of internet search, I’ve realized that if you use the ImageID as a reference point in docker save, you will not get these values !!!!

and there is no reference here:

Removed everything , removed the data-root from /etc/docker/daemon.json and started again from the beginning

docker save - the correct way

docker images -a  | grep -v ^REPOSITORY | awk '{print "docker save -o "$3".tar "$1":"$2""}' | sh -x


+ docker save -o b47c7dfaaa93.tar golang:1.19
+ docker save -o a37dc5345d16.tar archlinux:base-devel
+ docker save -o d4e07600b346.tar archlinux:base
+ docker save -o 58db3edaf2be.tar ubuntu:22.04
+ docker save -o 28f8bde8a757.tar centos7:ruby
+ docker save -o 382715ecff56.tar gitlab/gitlab-runner:ubuntu
+ docker save -o d5447fc01ae6.tar ubuntu:20.04
+ docker save -o 046e6d725a3c.tar ruby:latest
+ docker save -o 49176f190c7e.tar alpine:latest
+ docker save -o 018f8f38ad92.tar bash:latest
+ docker save -o 71eaf13299f4.tar ubuntu:18.04
+ docker save -o 5bf9684f4720.tar centos:6
+ docker save -o eeb6ee3f44bd.tar centos:7
+ docker save -o 5d0da3dc9764.tar centos:8
+ docker save -o b6f507652425.tar ubuntu:16.04
+ docker save -o ff3fa1a19332.tar 3bal/centos6-eol:devtoolset-7
+ docker save -o aa2256d57c69.tar 3bal/centos6-eol:latest
+ docker save -o d073310c1ec4.tar centos6:ebal
+ docker save -o 76a20143aac1.tar 3bal/arch:devel
+ docker save -o 63453d0a9b55.tar cern/slc6-base:latest

docker daemon with new data point

    "dns": [""],
    "data-root": "/mnt/WD40PURZ/var_lib_docker"

restart docker

sudo systemctl restart docker

docker load - the correct way

ls -1 | awk '{print "docker load --input "$1}'

and verify -moment of truth-

$ docker images -a
REPOSITORY        TAG           IMAGE         ID  CREATED  SIZE
archlinux         base-devel    33a093dd9250  3   days     ago   764MB
golang            1.19          b47c7dfaaa93  8   days     ago   993MB
archlinux         base          d4e07600b346  4   weeks    ago   418MB
ubuntu            22.04         58db3edaf2be  2   months   ago   77.8MB
centos7           ruby          28f8bde8a757  3   months   ago   532MB
gitlab/gitlab-runner ubuntu     382715ecff56  4   months   ago   705MB
ubuntu            20.04         d5447fc01ae6  4   months   ago   72.8MB
ruby              latest        046e6d725a3c  4   months   ago   893MB
alpine            latest        49176f190c7e  4   months   ago   7.04MB
bash              latest        018f8f38ad92  5   months   ago   12.3MB
ubuntu            18.04         71eaf13299f4  5   months   ago   63.1MB
centos            6             5bf9684f4720  19  months   ago   194MB
centos            7             eeb6ee3f44bd  19  months   ago   204MB
centos            8             5d0da3dc9764  19  months   ago   231MB
ubuntu            16.04         b6f507652425  19  months   ago   135MB
3bal/centos6-eol  devtoolset-7  ff3fa1a19332  2   years    ago   693MB
3bal/centos6-eol  latest        aa2256d57c69  2   years    ago   194MB
centos6           ebal          d073310c1ec4  2   years    ago   3.62GB
3bal/arch         devel         76a20143aac1  2   years    ago   1.02GB
cern/slc6-base    latest        63453d0a9b55  3   years    ago   222MB

success !

btrfs mount point

Now it is time to explain the var_lib_docker

but first , let’s verify ST1000DX002 mount point with WD40PURZ

$ sudo ls -l /mnt/ST1000DX002/var_lib_docker/

total 4
drwx--x--- 1 root root  20 Nov 24  2020 btrfs
drwx------ 1 root root  20 Nov 24  2020 builder
drwx--x--x 1 root root 154 Dec 18  2020 buildkit
drwx--x--x 1 root root  12 Dec 18  2020 containerd
drwx--x--- 1 root root   0 Apr 14 19:52 containers
-rw------- 1 root root  59 Feb 13 10:45 engine-id
drwx------ 1 root root  10 Nov 24  2020 image
drwxr-x--- 1 root root  10 Nov 24  2020 network
drwx------ 1 root root  20 Nov 24  2020 plugins
drwx------ 1 root root   0 Apr 18 18:19 runtimes
drwx------ 1 root root   0 Nov 24  2020 swarm
drwx------ 1 root root   0 Apr 18 18:32 tmp
drwx------ 1 root root   0 Nov 24  2020 trust
drwx-----x 1 root root 568 Apr 18 18:19 volumes
$ sudo ls -l /mnt/WD40PURZ/var_lib_docker/

total 4
drwx--x--- 1 root root  20 Apr 18 16:51 btrfs
drwxr-xr-x 1 root root  14 Apr 18 17:46 builder
drwxr-xr-x 1 root root 148 Apr 18 17:48 buildkit
drwxr-xr-x 1 root root  20 Apr 18 17:47 containerd
drwx--x--- 1 root root   0 Apr 14 19:52 containers
-rw------- 1 root root  59 Feb 13 10:45 engine-id
drwxr-xr-x 1 root root  20 Apr 18 17:48 image
drwxr-xr-x 1 root root  24 Apr 18 17:48 network
drwxr-xr-x 1 root root  34 Apr 18 17:48 plugins
drwx------ 1 root root   0 Apr 18 18:36 runtimes
drwx------ 1 root root   0 Nov 24  2020 swarm
drwx------ 1 root root  48 Apr 18 18:42 tmp
drwx------ 1 root root   0 Nov 24  2020 trust
drwx-----x 1 root root  70 Apr 18 18:36 volumes

var_lib_docker is actually a btrfs subvolume that we can mount it on our system

$ sudo btrfs subvolume show /mnt/WD40PURZ/var_lib_docker/

        Name:                   var_lib_docker
        UUID:                   5552de11-f37c-4143-855f-50d02f0a9836
        Parent UUID:            -
        Received UUID:          -
        Creation time:          2023-04-18 16:25:54 +0300
        Subvolume ID:           4774
        Generation:             219588
        Gen at creation:        215579
        Parent ID:              5
        Top level ID:           5
        Flags:                  -
        Send transid:           0
        Send time:              2023-04-18 16:25:54 +0300
        Receive transid:        0
        Receive time:           -

We can use the subvolume id for that:

mount -o subvolid=4774 LABEL="WD40PURZ" /var/lib/docker/

So /var/lib/docker/ path on our rootfs, is now a mount point for our BTRFS raid-1 4TB storage and we can remove the data-root declaration from /etc/docker/daemon.json and restart our docker service.

That’s it !

Tag(s): docker, btrfs

Saturday, 08 April 2023

Continuing Explorations into Filesystems and Paging with L4Re

Towards the end of last year, I spent a fair amount of time trying to tidy up and document the work I had been doing on integrating a conventional filesystem into the L4 Runtime Environment (or L4Re Operating System Framework, as it now seems to be called). Some of that effort was purely administrative, such as giving the work a more meaningful name and changing references to the naming in various places, whereas other aspects were concerned with documenting mundane things like how the software might be obtained, built and used. My focus had shifted somewhat towards sharing the work and making it slightly more accessible to anyone who might be interested (even if this is probably a very small audience).

Previously, in seeking to demonstrate various mechanisms such as the way programs might be loaded and run, with their payloads paged into memory on demand, I had deferred other work that I felt was needed to make the software framework more usable. For example, I was not entirely happy with the way that my “client” library for filesystem access hid the underlying errors, making troubleshooting less convenient than it could be. Instead of perpetuating the classic Unix “errno” practice, I decided to give file data structures their own error member to retain any underlying error, meaning that a global variable would not be involved in any error reporting.

Other matters needed attending to, as well. Since acquiring a new computer in 2020 based on the x86-64 architecture, the primary testing environment for this effort has been a KVM/QEMU instance invoked by the L4Re build process. When employing the same x86-64 architecture for the instance as the host system, the instance should in theory be very efficient, but for some reason the startup time of such x86-64 instances is currently rather long. This was not the case at some point in the past, but having adopted the Git-based L4Re distribution, this performance regression made an appearance. Maybe at some stage in the future I will discover why it sits there for half a minute spinning up at the “Booting ROM” stage, but for now a reasonable workaround is to favour QEMU instances for other architectures when testing my development efforts.

Preserving Portability

Having long been aware of the necessity of software portability, I have therefore been testing the software in QEMU instances emulating the classic 32-bit x86 architecture as well as MIPS32, in which I have had a personal interest for several years. Surprisingly, testing on x86 revealed a few failures that were not easily explained, but I eventually tracked them down to interoperability problems with the L4Re IPC library, where that library was leaving parts of IPC message values uninitialised and causing my own IPC library to misinterpret the values being sent. This investigation also led me to discover that the x86 Application Binary Interface is rather different in character to the ABI for other architectures. On those other architectures, the alignment of members in structures (and of parameters in parameter lists) needs to be done more carefully due to the way values in memory are accessed. On x86, meanwhile, it seems that values of different sizes can be more readily packed together.

In any case, I came to believe that the L4Re IPC library is not following the x86 ABI specification in the way IPC messages are prepared. I did wonder whether this was deliberate, but I think that it is actually inadvertent. One of my helpful correspondents confirmed that there was indeed a discrepancy between the L4Re code and the ABI, but nothing came of any enquiries into the matter, so I imagine that in any L4Re systems deployed on x86 (although I doubt that there can be many), the use of the L4Re code on both sides of any given IPC transaction manages to conceal this apparent deficiency. The consequence for me was that I had to introduce a workaround in the cases where my code needs to interact with various existing L4Re components.

Several other portability changes were made to resolve a degree of ambiguity around the sizes of various types. This is where the C language family and various related standards and technologies can be infuriating, with care required when choosing data types and then using these in conjunction with libraries that might have their own ideas about which types should be used. Although there probably are good reasons for some types to be equivalent to a “machine word” in size, such types sit uncomfortably with types of other, machine-independent sizes. I am sure I will have to revisit these choices over and over again in future.

Enhancing Component Interface Descriptions

One thing I had been meaning to return to was the matter of my interface description language (IDL) tool and its lack of support for composing interfaces. For example, a component providing file content might expose several different interfaces for file operations, dataspace operations, and so on. These compound interfaces had been defined by specifying arguments for each invocation of the IDL tool that indicate all the interfaces involved, and thus the knowledge of each compound interface ended up being encoded as definitions within Makefiles like this:

mapped_file_object_INTERFACES = dataspace file flush mapped_file notification

A more natural approach involved defining these interfaces in the interface description language itself, but this was going to require putting in the effort to extend the tool, which would not be particularly pleasant, being written in C using Flex and Bison.

Eventually, I decided to just get on with remedying the situation, adding the necessary tool support, and thus tidying up and simplifying the Makefiles in my L4Re build system package. This did raise the complexity level in the special Makefiles provided to support the IDL tool – nothing in the realm of Makefiles is ever truly easy – but it hopefully confines such complexity out of sight and keeps the main project Makefiles as concise as can reasonably be expected. For reference, here is how a file component interface looks with this new tool support added:

interface MappedFileObject composes Dataspace, File, Flush, MappedFile, Notification;

And for reference, here is what one of the constituent interfaces looks like:

interface Flush
  /* Flush data and update the size, if appropriate. */

  [opcode(5)] void flush(in offset_t populated_size, out offset_t size);

I decided to diverge from previous languages of this kind and to use “composes” instead of language like “inherits”. These compound interface descriptions deliberately do not seek to combine interfaces in a way that entirely resembles inheritance as supported by various commonly used programming languages, and an interface composing other interfaces cannot also add operations of its own: it can merely combine other interfaces. The main reason for such limitations is the deliberate simplicity or lack of capability of the tool: it only really transcribes the input descriptions to equivalent forms in C or C++ and neglects to impose many restrictions of its own. One day, maybe I will revisit this and at least formalise these limitations instead of allowing them to emerge from the current state of the implementation.

A New Year

I had hoped to deliver something for broader perusal late last year, but the end of the year arrived and with it some intriguing but increasingly time-consuming distractions. Having written up the effective conclusion of those efforts, I was able to turn my attention to this work again. To start with, that involved reminding myself where I had got to with it, which underscores the need for some level of documentation, because documentation not only communicates the nature of a work to others but it also communicates it to one’s future self. So, I had to spend some time rediscovering the finer detail and reminding myself what the next steps were meant to be.

My previous efforts had demonstrated the ability to launch new programs from my own programs, reproducing some of what L4Re already provides but in a form more amenable to integrating with my own framework. If the existing L4Re code had been more obviously adaptable in a number of different ways throughout my long process of investigation and development for it, I might have been able to take some significant shortcuts and save myself a lot of effort. I suppose, however, that I am somewhat wiser about the technologies and techniques involved, which might be beneficial in its own way. The next step, then, was to figure out how to detect and handle the termination of programs that I had managed to launch.

In the existing L4Re framework, a component called Ned is capable of launching programs, although not being able to see quite how I might use it for my own purposes – that being to provide a capable enough shell environment for testing – had led me along my current path of development. It so happens that Ned supports an interface for “parent” tasks that is used by created or “child” tasks, and when a program terminates, the general support code for the program that is brought along by the C library includes the invocation of an operation on this parent interface before the program goes into a “wait forever” state. Handling this operation and providing this interface seemed to be the most suitable approach for replicating this functionality in my own code.

Consolidation and Modularisation

Before going any further, I wanted to consolidate my existing work which had demonstrated program launching in a program written specifically for that purpose, bringing along some accompanying abstractions that were more general in nature. First of all, I decided to try and make a library from the logic of the demonstration program I had written, so that the work involved in setting up the environment and resources for a new program could be packaged up and re-used. I also wanted the functionality to be available through a separate server component, so that programs wanting to start other programs would not need to incorporate this functionality but could instead make a request to this separate “process server” component to do the work, obtaining a reference to the new program in response.

One might wonder why one might bother introducing a separate component to start programs on another program’s behalf. As always when considering the division of functionality between components in a microkernel-based system, it is important to remember that components can have different configurations that afford them different levels of privilege within a system. We might want to start programs with one level of privilege from other programs with a different level of privilege. Another benefit of localising program launching in one particular component is that it might provide an overview of such activities across a number of programs, thus facilitating support for things like job and process control.

Naturally, an operating system does not need to consolidate all knowledge about running programs or processes in one place, and in a modular microkernel-based system, there need not even be a single process server. In fact, it seems likely that if we preserve the notion of a user of the system, each user might have their own process server, and maybe even more than one of them. Such a server would be configured to launch new programs in a particular way, having access only to resources available to a particular user. One interesting possibility is that of being able to run programs provided by one filesystem that then operate on data provided by another filesystem. A program would not be able to see the filesystem from which it came, but it would be able to see the contents of a separate, designated filesystem.

Region Mapper Deficiencies

A few things conspired to make the path of progress rather less direct than it might have been. Having demonstrated the launching of trivial programs, I had decided to take a welcome break from the effort. Returning to the effort, I decided to test access to files served up by my filesystem infrastructure, and this caused programs to fail. In order to support notification events when accessing files, I employ a notification thread to receive such events from other components, but the initialisation of threading in the C library was failing. This turned out to be due to the use of a region mapper operation that I had not yet supported, so I had to undertake a detour to implement an appropriate data structure in the region mapper, which in C++ is not a particularly pleasant experience.

Later on, the region mapper caused me some other problems. I had neglected to implement the detach operation, which I rely on quite heavily for my file access library. Attempting to remedy these problems involved reacquainting myself with the region mapper interface description which is buried in one of the L4Re packages, not to be confused with plenty of other region mapper abstractions which do not describe the actual interface employed by the IPC mechanism. The way that L4Re has abandoned properly documented interface descriptions is very annoying, requiring developers to sift through pages of barely commented code and to be fully aware of the role of that code. I implemented something that seemed to work, quite sure that I still did not have all the details correct in my implementation, and this suspicion would prove correct later on.

Local and Non-Local Capabilities

Another thing that I had not fully understood, when trying to put together a library handling IPC that I could tolerate working with, was the way that capabilities may be transferred in IPC messages within tasks. Capabilities are references to components in the system, and when transferred between tasks, the receiving task is meant to allocate a “slot” for each received capability. By choosing a slot denoted by an index, the task (or the program running in it) can tell the kernel where to record the capability in its own registry for the task, and by employing this index in its own registry, the program will be able to maintain a record of available capabilities consistent with that of the kernel.

The practice of allocating capability slots for received capabilities is necessary for transfers between tasks, but when the transfer occurs within a task, there is no need to allocate a new slot: the received capability is already recorded within the task, and so the item describing the capability in the message will actually encode the capability index known to the task. Previously, I was not generally sending capabilities in messages within tasks, and so I had not knowingly encountered any issues with my simplistic “general case” support for capability transfers, but having implemented a region mapper that resides in the same task as a program being run, it became necessary to handle the capabilities presented to the region mapper from within the same task.

One counterintuitive consequence of the capability management scheme arises from the general, inter-task transfer case. When a task receives a capability from another task, it will assign a new index to the capability ahead of time, since the kernel needs to process this transfer as it propagates the message. This leaves the task with a new capability without any apparent notion of whether it has seen that capability before. Maybe there is a way of asking the kernel if two capabilities refer to the same object, but it might be worthwhile just not relying on such facilities and designing frameworks around such restrictions instead.

Starting and Stopping

So, back to the exercise of stopping programs that I had been able to start! It turned out that receiving the notification that a program had finished was only the start; what then needed to happen was something of a mystery. Intuitively, I knew that the task hosting the program’s threads would need to be discarded, but I envisaged that the threads themselves probably needed to be discarded first, since they are assigned to the task and probably cannot have that task removed from under them, even if they are suspended in some sense.

But what about everything else referenced by the task? After all, the task will have capabilities for things like dataspaces that provide access to regions of files and to the program stack, for things like the filesystem for opening other files, for semaphore and IRQ objects, and so on. I cannot honestly say that I have the definitive solution, and I could not easily find much in the way of existing guidance, so I decided in the end to just try and tidy all the resources up as best I could, hopefully doing enough to make it possible to release the task and have the kernel dispose of it. This entailed a fairly long endeavour that also encouraged me to evolve the way that the monitoring of the process termination is performed.

When the started program eventually reaches the end and sends a message to its “parent” component, that component needs to record any termination state communicated in the message so that it may be passed on to the program’s creator or initiator, and then it also needs to commence the work of wrapping up the program. Here, I decided on a distinct component separate from one responsible for any paging activities to act as the contact point for the creating or initiating program. When receiving a termination message or signal, this component disconnects the terminating program from its internal pager by freeing the capability, and this then causes the internal pager to terminate, itself sending a signal to its own parent.

One important aspect of starting and terminating processes is that of notifying the party that sought to start a process in the first place. For filesystem operations, I had already implemented support for certain notification events related to opening, modifying and closing files and pipes, with these being particularly important for pipes. I wanted to extend this support to processes so that it might be possible to monitor files, pipes and processes together using a kind of select or poll operation. This led to a substantial detour where I became dissatisfied with the existing support, modified it, had to debug it, and remain somewhat concerned that it might need more work in the future.

Testing on the different architectures under QEMU also revealed that I would need to handle the possibility that a program might be started and run to completion before its initiator had even received a reference to the program for notification purposes. Fortunately, a similar kind of vanishing resource problem arose when I was developing the file paging system, and so I had a technique available to communicate the reference to the process monitor component to the initiator of the program, ensuring that the process monitor becomes established in the kernel’s own records, before the program itself gets started, runs and completes, avoiding the process monitor being tidied up before its existence becomes known to the wider system.

Wrapping Up Again

A few concerns remain with the state of the work so far. I experienced problems with filesystem access that I traced to the activity of repeatedly attaching and detaching dataspaces, which is something my filesystem access library does deliberately, but the error suggested that the L4Re region mapper had somehow failed to attach the appropriate region. This may well be caused by issues within my own code, and my initial investigation did indeed uncover a problem in my own code where the size of the attached region of a file would gradually increase over time. With this mistake fixed, the situation was improved, but the underlying problem was not completely eliminated, judging from occasional errors. A workaround has been adopted for now.

Various other problems arose and were hopefully resolved. I would say that some of them were due to oversights when getting things done takes precedence over a more complete consideration of all the issues, particularly when working in a language like C++ where lower-level chores like manual memory management enter the picture. The differing performance when emulating various architectures under QEMU also revealed a deficiency with my region mapper implementation. It turned out that detach operations were not returning successfully, leading the L4Re library function to return without invalidating memory pages, and so my file access operations were returning pages of incorrect content instead of the expected file content for the first few accesses until the correct pages had been paged in and were almost continuously resident.

Here, yet more digging around in the L4Re code revealed an apparent misunderstanding about the return value associated with one of the parameters to the detach operation, that of the detached dataspace. I had concluded that a genuine capability was meant to be returned, but it seems that a simple index value is returned in a message word instead of a message item, and so there is no actual capability transferred to the caller, not even a local one. The L4Re IPC framework does not really make the typing semantics very clear, or at least not to me, and the code involved is quite unfathomable. Again, a formal interface specification written in a clearly expressed language would have helped substantially.

Next Steps

I suppose progress of sorts has been made in the last month or so, for which I can be thankful. Although tidying up the detritus of my efforts will remain an ongoing task, I can now initiate programs and wait for them to finish, meaning that I can start building up test suites within the environment, combining programs with differing functionality in a Unix-like fashion to hopefully validate the behaviour of the underlying frameworks and mechanisms.

Now, I might have tried much of this with L4Re’s Lua-based scripting, but it is not as straightforward as a more familiar shell environment, appearing rather more low-level in some ways, and it is employed in a way that seems to favour parallel execution instead of the sequential execution that I might desire when composing tests: I want tests to involve programs whose results feed into subsequent programs, as opposed to just running a load of programs at once. Also, without more extensive documentation, the Lua-based scripting support remains a less attractive choice than just building something where I get to control the semantics. Besides, I also need to introduce things like interprocess pipes, standard input and output, and such things familiar from traditional software platforms. Doing that for a simple shell-like environment would be generally beneficial, anyway.

Should I continue to make progress, I would like to explore some of the possibilities hinted at above. The modular architecture of a microkernel-based system should allow a more flexible approach in partitioning the activities of different users, along with the configuration of their programs. These days, so much effort is spent in “orchestration” and the management of containers, with a veritable telephone directory of different technologies and solutions competing for the time and attention of developers who are now also obliged to do the work of deployment specialists and systems administrators. Arguably, much of that involves working around the constraints of traditional systems instead of adapting to those systems, with those systems themselves slowly adapting in not entirely convincing or satisfactory ways.

I also think back to my bachelor’s degree dissertation about mobile software agents where the idea was that untrusted code might be transmitted between systems to carry out work in a safe and harmless fashion. Reducing the execution environment of such agent programs to a minimum and providing decent support for monitoring and interacting with them would be something that might be more approachable using the techniques explored in this endeavour. Pervasive, high-speed, inexpensively-accessed networks undermined the envisaged use-cases for mobile agents in general, although the practice of issuing SQL queries to database servers or having your browser run JavaScript programs deployed in Web pages demonstrates that the general paradigm is far from obsolete.

In any case, my “to do” list for this project will undoubtedly remain worryingly long for the foreseeable future, but I will hopefully be able to remedy shortcomings, expand the scope and ambition of the effort, and continue to communicate my progress. Thank you to those who have made it to the end of this rather dry article!

The KDE Qt5 Patch Collection has been rebased on top of Qt 5.15.9



Commercial release announcement:

OpenSource release announcement:


As usual I want to personally extend my gratitude to the Commercial users of Qt for beta testing Qt 5.15.9 for the rest of us.


The Commercial Qt 5.15.9 release introduced one bug that have later been fixed. Thanks to that, our Patchset Collection has been able to incorporate the fix for the issue [1] and the Free Software users will never be affected by it! 


P.S: Special shout-out to Andreas Sturmlechner for identifying the fix of the issue, since I usually only pay attention to "Revert XYZ" commits and this one is not a revert but subsequent improvement

Saturday, 01 April 2023

RAII: Tragedy in three acts

In a recent Computerphile video, Ian Knight talked about RAII idiom and it’s application in C++ and Rust. While the video described the general concepts, I felt different examples could be more clearly convey essence of the topic.

I’ve decided to give my own explanation to hopefully better illustrate what RAII is and how it relates to Rust’s ownership. Then again, for whatever reason I’ve decided to write it as a play with dialogue in faux Old English so it may very well be even more confusing instead.

Cast of characters

(In the order of appearance)
GregoryA software engineer and Putuel’s employee number #1
SampsonA software engineer and a self-proclaimed 10× developer
ParisAn apprentice returning to Putuel two summers in a row
CTOPuteal’s Chief Technical Officer spending most of his time in meetings
AdminAdministrative assistant working in Puteal Corporation’s headquarters in Novear

Act I

Scene I

Novear. A public place.
Enter Sampson and Gregory, two senior engineers of the Puteal Corporation, carrying laptops and phones
GregoryPray tell, what doth the function’s purpose?
SampsonIt doth readeth a number from a file. A task as trivial as can be and yet QA reports memory leak after my change. Hence, I come to thee for help.
Both look at a laptop showing code Sampson has written [error handling omitted for brevity from all source code listings]:
double read_double(FILE *fd) {
	char *buffer = malloc(1024);       /* allocate temporary buffer */
	fgets(buffer, 1024, fd);           /* read first line of the file */
	return atof(buffer);               /* parse and return the number */
GregoryThine mistake is apparent. Thou didst allocate memory but ne’er freed it. Verily, in C thou needs’t to explicitly free any memory thou dost allocate. Submit this fix and thy code shall surely pass.
double read_double(FILE *fd) {
	char *buffer = malloc(1024);       /* allocate temporary buffer */
	fgets(buffer, 1024, fd);           /* read first line of the file */
	double result = atoi(buffer);      /* parse the line */
	free(buffer);                      /* free the temporary buffer */
	return result;                     /* return parsed number */

Scene II

A hall.
Enter Puteal CTO, an apprentice called Paris and an Admin
ParisI’ve done as Sampson beseeched of me. I’ve taken the read_double function and changed it so that it doth taketh file path as an argument. He hath warned me about managing memory and so I’ve made sure all temporary buffers are freed. Nonetheless, tests fail.
double read_double(const char *path) {
	FILE *fd = fopen(path, "r");       /* open file */
	char *buffer = malloc(1024);
	fgets(buffer, 1024, fd);
	double result = atof(buffer);
	return result;
CTOThou didst well managing memory, but memory isn’t the only resource that needs to be freed. Just like allocations, if thou dost open a file, thou must close it anon once thou art done with it.
Exit CTO and Admin towards sounds of a starting meeting
ParisManaging resources is no easy task but I think I’m starting to get the hang of it.
double read_double(const char *path) {
	FILE *fd = fopen(path, "r");       /* open file */
	char *buffer = malloc(1024);
	fgets(buffer, 1024, fd);
	fclose(fd);                        /* close the file */
	double result = atof(buffer);
	return result;

Scene III

Novear. A room in Puteal’s office.
Enter Paris and Sampson they set them down on two low stools, and debug
ParisThe end of my apprenticeship is upon me and yet my code barely doth work. It canst update the sum once but as soon as I try doing it for the second time, nothing happens.
double update_sum_from_file(mtx_t *lock,
                            double *sum,
                            const char *path)  {
	double value = read_double(path);  /* read next number from file */
	mtx_lock(lock);                    /* reserve access to `sum` */
	value += sum->value;               /* calculate sum */
	sum->value = value;                /* update the sum */
	return value;                      /* return new sum */
SampsonThou hast learned well that resources need to be acquired and released. But what thou art missing is that not only system memory or a file descriptor are resources.
ParisSo just like memory needs to be freed, files need to be closed and locks needs to be unlocked!
double update_sum_from_file(mtx_t *lock,
                            double *sum,
                            const char *path)  {
	double value = read_double(path);  /* read next number from file */
	mtx_lock(lock);                    /* reserve access to `sum` */
	value += *sum;                     /* calculate sum */
	*sum = value;                      /* update the sum */
	mtx_unlock(lock);                  /* release `sum` */
	return value;                      /* return new sum */
ParisI’m gladdened I partook the apprenticeship. Verily, I’ve learned that resources need to be freed once they art no longer used. But also that many things can be modelled like a resource.
I just don’t comprehend why it all needs to be done manually?
Exit Sampson while Paris monologues leaving him puzzled

Act II

Scene I

Court of Puteal headquarters.
Enter Sampson and Paris bearing a laptop before him
ParisMine last year’s apprenticeship project looks naught like mine own handiwork.
SampsonThou seest, in the year we migrated our code base to C++.
ParisAye, I understandeth. But I spent so much time learning about managing resources and yet the new code doth not close its file.
Enter Gregory and an Admin with a laptop. They all look at code on Paris’ computer:
double read_double(const char *path) {
	std::fstream file{path};           /* open file */
	double result;                     /* declare variable to hold result */
	file >> result;                    /* read the number */
	return result;                     /* return the result */
SampsonOh, that’s just RAII.  Resource Acquisition Is Initialisation idiom. C++ usetht it commonly.
GregoryResource is acquired when object is initialised and released when it’s destroyed. The compiler tracks lifetimes of local variables and thusly handles resources for us.
By this method, all manner of resources can be managed. And forsooth, for more abstract concepts without a concrete object representing them, such as the concept of exclusive access to a variable, a guard class can be fashioned. Gaze upon this other function:
double update_sum_from_file(std::mutex &lock,
                            double *sum,
                            const char *path)  {
	double value = read_double(path);  /* read next number from file */
	std::lock_guard<std::mutex> lock{mutex}; /* reserve access to `sum` */
	value += *sum;                     /* calculate sum */
	*sum = value;                      /* update the sum */
	return value;                      /* return new sum */
ParisI perceive it well. When the lock goes out of scope, the compiler shall run its destructor, which shall release the mutex. Such was my inquiry yesteryear. Thus, compilers can render managing resources more automatic.

Scene II

Novear. Sampson’s office.
Enter Gregory and Sampson
SampsonVerily, this bug doth drive me mad! To make use of the RAII idiom, I’ve writ an nptr template to automatically manage memory.
template<class T>
struct nptr {
	nptr(T *ptr) : ptr(ptr) {}         /* take ownership of the memory */
	~nptr() { delete ptr; }            /* free memory when destructed */
	T *operator->() { return ptr; }
	T &operator*() { return *ptr; }
	T *ptr;
GregoryI perceive…    And what of the code that bears the bug?
Sampson'Tis naught but a simple code which calculates sum of numbers in a file:
std::optional<double> try_read_double(nptr<std::istream> file) {
	double result;
	return *file >> result ? std::optional{result} : std::nullopt;

double sum_doubles(const char *path) {
	nptr<std::istream> file{new std::fstream{path}};
	std::optional<double> number;
	double result = 0.0;
	while ((number = try_read_double(file))) {
		result += *number;
	return result;
Enter Paris with an inquiry for Sampson; seeing them talk he pauses and listens in
GregoryThe bug lies in improper ownership tracking. When ye call the try_read_double function, a copy of thy nptr is made pointing to the file stream. When that function doth finish, it frees that very stream, for it believes that it doth own it. Alas, then you try to use it again in next loop iteration.
Why hast thou not made use of std::unique_ptr?
SampsonAh! I prefer my own class, good sir.
GregoryThine predicament would have been easier to discern if thou hadst used standard classes. In truth, if thou wert to switch to the usage of std::unique_ptr, the compiler would verily find the issue and correctly refuse to compile the code.
std::optional<double> try_read_double(std::unique_ptr<std::istream> file) {
	double result;
	return *file >> result ? std::optional{result} : std::nullopt;

double sum_doubles(const char *path) {
	auto file = std::make_unique<std::fstream>(path);
	std::optional<double> number;
	double result = 0.0;
	while ((number = try_read_double(file))) {  /* compile error */
		result += *number;
	return result;
Exit Gregory, exit Paris moment later

Scene III

Before Sampson’s office.
Enter Gregory and Paris, meeting
ParisI’m yet again vexed. I had imagined that with RAII, the compiler would handle all resource management for us?
GregoryVerily, for RAII to function, each resource must be owned by a solitary object. If the ownership may be duplicated then problems shall arise. Ownership may only be moved.
ParisCouldn’t compiler enforce that just like it can automatically manage resources?
GregoryMayhap the compiler can enforce it, but it’s not a trivial matter. Alas, if thou art willing to spend time to model ownership in a way that the compiler understands, it can prevent some of the issues. However, thou wilt still require an escape hatch, for in the general case, the compiler cannot prove the correctness of the code.
Exit Gregory and Paris, still talking


Scene I

A field near Novear.
Enter Gregory and Paris
GregoryGreetings, good fellow! How hast thou been since thy apprenticeship?
ParisI’ve done as thou hast instructed and looked into Rust. It is as thou hast said. I’ve recreated Sampson’s code and the compiler wouldn’t let me run it:
fn try_read_double(rd: Box<dyn std::io::Read>) -> Option<f64> {

fn sum_doubles(path: &std::path::Path) -> f64 {
	let file = std::fs::File::open(path).unwrap();
	let file: Box<dyn std::io::Read> = Box::new(file);
	let mut result = 0.0;
	while let Some(number) = try_read_double(file) {
		result += number;
GregoryVerily, the compiler hath the vision to behold the migration of file’s ownership into the realm of try_read_double function during the first iteration and lo, it is not obtainable any longer by sum_doubles.
error[E0382]: use of moved value: `file`

    let file: Box<dyn std::io::Read> = Box::new(file);
        ---- move occurs because `file` has type `Box<dyn std::io::Read>`,
             which does not implement the `Copy` trait
    let mut result = 0.0;
    while let Some(number) = try_read_double(file) {
                                             ^^^^ value moved here, in previous
                                                  iteration of loop
ParisAlas, I see not what thou hast forewarned me of. The syntax present doth not exceed that which wouldst be used had this been writ in C++:
fn try_read_double(rd: &dyn std::io::Read) -> Option<f64> {

fn sum_doubles(path: &std::path::Path) -> f64 {
	let file = std::fs::File::open(path).unwrap();
	let file: Box<dyn std::io::Read> = Box::new(file);
	let mut result = 0.0;
	while let Some(number) = try_read_double(&*file) {
		result += number;
GregoryVerily, the Rust compiler is of great wit and often elides lifetimes. Nonetheless, other cases may prove more intricate.
struct Folder<T, F>(T, F);

impl<T, F: for <'a, 'b> Fn(&'a mut T, &'b T)> Folder<T, F> {
    fn push(&mut self, element: &T) {
        (self.1)(&mut self.0, element)
ParisSurely though, albeit this code is more wordy, it is advantageous if I cannot commit an error in ownership.
GregoryVerily, there be manifold factors in the selection of a programming tongue. And there may be aspects which may render other choices not imprudent.


A thing to keep in mind is that the examples are somewhat contrived. For example, the buffer and file object present in read_double function can easily live on the stack. A real-life code wouldn’t bother allocating them on the heap. Then again, I could see a beginner make a mistake of trying to bypass std::unique_ptr not having a copy constructor by creating objects on heap and passing pointers around.

In the end, is this better explanation than the one in aforementioned Computerphile video? I’d argue code examples represent discussed concepts better though to be honest form of presentation hinders clarity of explanation. Yet, I had too much fun messing around with this post so here it is in this form.

Lastly, I don’t know Old English so the dialogue is probably incorrect. I’m happy to accept corrections but otherwise I don’t care that much. One shouldn’t take this post too seriously.

Thursday, 30 March 2023

Configuring algorithms in Modern C++

When designing library code, one often wonders: “Are these all the parameters this function will ever need?” and “How can a user conveniently change one parameter without specifying the rest?” This post introduces some Modern C++ techniques you can use to make passing configuration options easy for your users while allowing you to add more options later on.


Most people who have programmed in C++ before should have no problems understanding this article, although you will likely appreciate it more, if you are a library developer or have worried about the forward-compatibility of your code.

Some of the features introduced in this post do not yet work with Clang. Most should work with MSVC, but I only double-checked the code with GCC12 (any version >= 10 should work).


Let’s say you are writing an algorithm with the following signature:

auto algo(auto data, size_t threads = 4ull);

It takes some kind of data input, does lots of magic computation on it and returns some other data. An actual algorithm should of course clearly state what kind of input data it expects (specific type or constrained template parameter), but we want to focus on the other parameters in this post. The only configuration option that you want to expose is the number of threads it shall use. It defaults to 4, because you know that the algorithm scales well at four threads and also you assume that most of your users have at least four cores on their system.

Now, a bit later, you have added an optimisation to the algorithm called heuristic42. It improves the results in almost all cases, but there are a few corner cases where users might want to switch it off. The interface now looks like this:

auto algo(auto data, size_t threads = 4ull, bool heuristic42 = true);

This is not too bad, you might think, but there are already two ugly things about this:

  1. To overwrite the second “config option”, the user needs to also specify the first, i.e. algo("data", 4, false);. This means that they need to look up (and enter correctly) the first config option’s default value. Also, if you change that default in a future release of the code, this change will not be reflected in the user’s invocation who unknowingly enforces the old default.
  2. Since passing arguments to the function does not involve the parameter’s name, it is very easy to confuse the order of the parameters. Implicit conversions make this problem even worse, so invoking the above interface with algo("data", false, 4); instead of algo("data", 4, false); generates no warning, even with -Wall -Wextra -pedantic!

Wow, what a usability nightmare, and we only have two config options! Imagine adding a few more…

Dedicated config object

As previously mentioned, the parameter name cannot be used when passing arguments. However, C++20 did add designated initialisers for certain class types. So you can use the name of a member variable when initialising an object. We can use that!

struct algo_config
    bool heuristic42 = true;
    size_t threads   = 4ull;

auto algo(auto data, algo_config const & cfg)
    /* implementation */

int main()
    /* create the config object beforehand (e.g. after argument parsing) */
    algo_config cfg{.heuristic42 = false, .threads = 8};
    algo("data", cfg);

    /* create the config object ad-hoc */
    algo("data", algo_config{.heuristic42 = false, .threads = 8});   // set both paramaters
    algo("data", algo_config{.threads = 8});                         // set only one parameter

    /* providing the config type's name is optional */
    algo("data", {.threads = 8});

Compile and edit online (via godbolt)

As you can see, this solves both of the problems mentioned previously! We refer to the config elements by name to avoid mixups, and we can choose to overwrite only those parameters that we actually want to change; other configuration elements will be whatever they are set to by default. Conveniently, this allows changing the default later on, and all invocations that don’t overwrite it will pick up the new default.

Another great feature is that the API maintainer of the algorithm can easily add more members to the configuration object without invalidating any existing invocations. This allows users of the API to gradually adopt new opt-in features.

As the name of the config type can even be omitted (see last invocation), the syntactic overhead for the “ad-hoc” initialisation is very low, almost like providing the arguments directly to the function-call.

There is an important catch: The order in which the designated initialisers are given has to correspond to the order in which the respective members are defined in the type of the config. It is okay to omit initialisers at the beginning, middle or end (as long as defaults are given), but the relative order of all the initialisers that you do provide has to be correct. This might sound like a nuisance, but in contrast to the problem discussed initially (mixed up order of function arguments), you will actually get a compiler-error that tells you that you got the order wrong; so the problem is easily detected and fixed. And there is a nice rule that you can follow for such config objects: always sort the members alphabetically! That way users intuitively know the order and don’t have to look it up 💡

Types as config elements

Now, sometimes you want to pass a type as kind of parameter to an algorithm. Imagine that the algorithm internally juggles a lot of integers. Maybe it even does SIMD with them. In those cases, the size of the integers could affect performance noticeably.

Some algorithms might be able to infer from the data input’s type which integers to use for computation, but in other cases you want the user to be able to override this. Thus we need the ability to pass the desired type to the algorithm. The canonical way of doing this is via template arguments:

template <typename int_t>
auto algo(auto data, size_t threads = 4ull);

But this is has the same problems that we discussed initially: as soon as multiple types are passed, it is possible confuse the order (and not be notified); to set a later parameters, you need to also set previous ones; et cetera. There might also be weird interactions with the type of the data parameter, in case that is a template parameter.

Let’s add the “type parameter” to our config object instead:

/* We define a "type tag" so we can pass types as values */
template <typename T>
inline constinit std::type_identity<T> ttag{};

/* The config now becomes a template */
template <typename Tint_type = decltype(ttag<uint64_t>)>
struct algo_config
    bool heuristic42    = true;
    Tint_type int_type  = ttag<uint64_t>;
    size_t threads      = 4ull;

/* And also the algorithm */
template <typename ...Ts>
auto algo(auto data, algo_config<Ts...> const & cfg)
    /* implementation */

int main()
    /* Setting just "value parameters" still works with and without "algo_config" */
    algo("data", algo_config{.heuristic42 = false, .threads = 8});
    algo("data",            {.heuristic42 = false, .threads = 8});

    /* When setting a "type parameter", we need to add "algo_config" */
    algo("data", algo_config{.int_type = ttag<uint32_t>, .threads = 8});
Compile and edit online (via godbolt)

There are a few things happening here. In the beginning, we use variable templates to define an object that “stores” a type. This can later be used to initialise members of our config object.

Next, we need to make algo_config a template. Unfortunately, we need to default the template parameter as well as giving the member a default value. Finally, algo() needs template parameters for the config, as well. It is handy to just use a parameter pack here, because it means we don’t need to change it if we add more template parameters the config type. This is all a bit more verbose than before, after all we are still writing C++ 😅 But most of this will be hidden from the user anyway.

The invocation of the algorithm is almost unchanged from before, we just use ttag<uint32_t> to initialise the “type parameter” of the config. There is one caveat: when passing such “type parameters”, it is now necessary to add algo_config, although, fortunately, you do not need to spell out the template arguments. In general, this may be a bit surprising, so I recommend always including the config-name in examples to teach your users a single syntax.

Constants as config elements

Using a similar technique to the one above, we can also pass compile-time constants to the config object. This allows the algorithm to conveniently use if constexpr to choose between different codepaths, e.g. between a SIMD-based codepath and a regular one.

/* We define a "value tag" type so we can pass values as types...*/
template <auto v>
struct vtag_t
    static constexpr auto value = v;

/* ...and then we define a variable template to pass the type as value again! */
template <auto v>
inline constinit vtag_t<v> vtag{};

/* The config is a template */
template <typename Tuse_simd = vtag_t<false>>
struct algo_config
    bool heuristic42    = true;
    size_t threads      = 4ull;
    Tuse_simd use_simd  = vtag<false>;

/* The algorithm */
template <typename ...Ts>
auto algo(auto data, algo_config<Ts...> const & cfg)
    /* implementation */

int main()
    /* Setting just "value parameters" still works with and without "algo_config" */
    algo("data", algo_config{.heuristic42 = false, .threads = 8});
    algo("data",            {.heuristic42 = false, .threads = 8});

    /* When setting a "constant parameter", we need to add "algo_config" */
    algo("data", algo_config{.threads = 8, .use_simd = vtag<true>});
Compile and edit online (via godbolt)

As you can see, this is very similar to the previous example. The only difference is, that we need another initial step to encode the value as a type. It is even possible to have parameters that are (run-time) values by default, but can be configured as (compile-time) constants in the way shown above. And, of course, all kinds of config options can be combined.

Note that the definitions of the “tagging” features would happen in your utility code. Users only need to know that they can pass constants via vtag<42> and types via ttag<int32_t>.

Post scriptum

I hope this post was helpful to some of you. I think this is a big step forward for usability, and I hope Clang catches up with the required features as soon as possible!

There are two things here that could be improved:

  1. If a template parameter can be deduced from member initialisers, it should be. This would allow us to omit the default template arguments for algo_config, i.e. = decltype(ttag<uint64_t>) and = vtag_t<false>.
  2. When a brace-enclosed initialiser list is passed to a function template to initialise a parameter of deduced type, consider the contents of that initialiser list. This would allow us to omit align_config also when passing “type parameters” or constants.

I have the feeling that 1. might not be too difficult and also not too controversial. But I suspect that 2. would be more complicated as it interacts with function overloading and I can imagine situations were this change would break existing code.

But I’d love to here other people’s opinion on the matter!


The ISO WG21 papers that added these features to C++:

Monday, 20 March 2023

Flow-Based Programming, a way for AI and humans to develop together

I think by now everybody reading this will have seen how the new generation of Large Language Models like ChatGPT are able to produce somewhat useful code. Like any advance in software development—from IDEs to high-level languages—this has generated some discussion on the future employment prospects in our field.

This made me think about how these new tools could fit the world of Flow-Based Programming, a software development technique I’ve been involved with for quite a while. In Flow-Based Programming these is a very strict boundary between reusable “library code” (called Components) and the “application logic” (called the Graph).

Here’s what the late J. Paul Morrison wrote on the subject in his seminal work, Flow-Based Programming: A New Approach to Application Development (2010):

Just as in the preparation and consumption of food there are the two roles of cook and diner, in FBP application development there are two distinct roles: the component builder and the component user or application designer.

…The application designer builds applications using already existing components, or where satisfactory ones do not exist s/he will specify a new component, and then see about getting it built.

Remembering that passage made me wonder, could I get one of the LLMs to produce useful NoFlo components? Armed with New Bing, I set out to explore.

AI and humans working together

The first attempt was specifying a pretty simple component:

New Bing writing a component

That actually looks quite reasonable! I also tried asking New Bing to make the component less verbose, as well as generating TypeScript and CoffeeScript variants of the same. All seemed to produce workable things! Sure, there might be some tidying to do, but this could remove a lot of the tedium of component creation.

In addition to this trivial math component I was able to generate some that to call external REST APIs etc. Bing was even able to switch between HTTP libraries as requested.

What was even cooler was that it actually suggested to ask it how to test the component. Doing as I was told, the result was quite astonishing:

New Bing writing fbp-spec tests

That’s fbp-spec! The declarative testing tool we came up with! Definitely the nicest way to test NoFlo (or any other FBP framework) components.

Based on my results, you’ll definitely want to check the generated components and tests before running them. But what you get out is not bad at all.

I of course also tried to get Bing to produce NoFlo graphs for me. This is where it stumbled quite a bit. Interestingly the results were better in the fbp language than in the JSON graph format. But maybe that even more enforces that the sweet spot would be AI writing components and a human creating the graphs that run those.

AI and humans working together

As I’m not working at the moment, I don’t have a current use case for this way of collaborating. But I believe this could be a huge productivity booster for any (and especially Flow-Based) application development, and expect to try it in whatever my next gig ends up being.

Illustrations: MidJourney, from prompt Robot software developer working with a software architect. Floating flowcharts in the background

Sunday, 19 March 2023

Monospace considered harmful

No, I haven’t gone completely mad yet and still, I write this as an appeal to stop using monospaced fonts for code (conditions may apply). While fixed-width fonts have undeniable benefits when authoring software, their use is excessive and even detrimental in certain contexts. Specifically, when displaying inline code within a paragraph of text, proportional fonts are a better choice.

The downsides

4′30″5′5′30″TahomaTimes New RomanVerdanaArialComic SansGeorgiaCourier New
Fig. 1. Comparison of time needed to read text set with different fonts. Reading fixed-width Courier New is 13% slower than reading Tahoma.

Fixed-width fonts for inline code have a handful of downsides. Firstly, text set in such font takes up more space and, depending on the font pairing, individual letters may appear larger. This creates unbalanced look and opportunities for awkward line wrapping.

Moreover, a fixed-width typeface has been shown to be slower to read. Even disregarding the speed differences, switching between two drastically different types of font isn’t comfortable.

To make matters worse, many websites apply too many styles to inline code fragments. For example GitHub and GitLab (i) change the font, (ii) decrease its size, (iii) add background and (iv) add padding. This overemphasis detracts from the content rather than enhancing it.

A better way

A better approach is using serif (sans-serif) font for the main text and a sans-serif (serif) font for inline code. Or if serif’s aren’t one’s cup of tea, even within the same font group a pairing allowing for clear differentiation between the main text and the code is possible. For example a humanist font paired with a complementary geometric font.

Another option is to format code with a different colour. To avoid using it as the only mean of conveying information, a subtle colour change may be used in conjunction with font change. This is the approach I’ve taken on this blog.

It’s also worth considering whether inline code even needs any kind of style change. For example, the sentence ‘Execution of a C program starts from the main function’ is perfectly understandable whether or not ‘main’ is styled differently.


What about code blocks? Using proportional typefaces for them can be done with some care. Indentation isn’t a major problem but some alignment may need adjustments. Depending on the type of code listings, it may be an option. Having said that, I don’t claim this as the only correct option for web publishing.

As an aside, what’s the deal with parenthesise after a function name? To demonstrate, lets reuse an earlier example: ‘Execution of a C program starts from the main() function’. The brackets aren’t part of the function name and unless they are used to disambiguate between multiple overloaded functions, there’s really no need for them.

To conclude, while fixed-width fonts have their place when writing code, their use in displaying inline code is often unnecessary. Using a complementary pairing of proportional typefaces is a better options that can enhance readability. Changing background of inline code is virtually never a good idea.

Using serif faces on websites used to carry risk of aliasing reducing legibility. Thankfully, the rise of high DPI displays largely alleviated those concerns.

Combining colour change and typeface change breaks principle of using small style changes. Nonetheless, I believe some leniency for websites is in order. It’s not always guaranteed that readers will see fonts author has chosen making colour change kind of a backup. Furthermore, compared to books, change in colour isn’t as focus-grabbing on the Internet.

Friday, 10 March 2023

KDE Gear 23.04 branches created

Make sure you commit anything you want to end up in the KDE Gear 23.04 releases to them

We're already past the dependency freeze.

The Feature Freeze and Beta is next week Thursday 16 of March.

More interesting dates  
  March 30: 23.04 RC (23.03.90) Tagging and Release
  April 13: 23.04 Tagging
  April 20: 23.04 Release

Wednesday, 08 March 2023

Send you talks for Akademy 2023 *now*!

Call for proposal ends Thursday the 30th of March

There's still a few weeks, but time is really running out.


I'm sure there's lots of interesting things you have to talk about Qt, KDE, C++, Community Management or other million things so head over to or over to if you want to skip the nicely worded page that encourages you to submit a talk :)

Monday, 06 March 2023

Keeping a semi-automatic electronic ship's logbook

Maintaining a proper ship’s logbook is something that most boats should do, for practical, as well as legal and traditional reasons. The logbook can serve as a record of proper maintenance and operation of the vessel, which is potentially useful when selling the boat or handling an insurance claim. It can be a fun record of journeys made to look back to. And it can be a crucial aid for getting home if the ship’s electronics or GNSS get disrupted.

Like probably most operators of a small boat, on Lille Ø our logbook practices have been quite varying. We’ve been good at recording engine maintenance, as well as keeping the traditional navigation log while offshore. But in the more hectic pace of coastal cruising or daysailing this has often fallen on the wayside. And as such, a lot of the events and history of the boat is unavailable.

To redeem this I’ve developed signalk-logbook, a semi-automatic electronic logbook for vessels running the Signal K marine data server.

This allows logbook entries to be produced both manually and automatically. The can be viewed and edited using any web-capable device on board, meaning that you can write a log entry on your phone, and maybe later analyse and print them on your laptop.

Why Signal K

Signal K is a marine data server that has integrations with almost any relevant marine electronics system. If you have an older NMEA0183 or Seatalk system, Signal K can communicate with it. Same with NMEA2000. If you already have your navigational data on the boat WiFi, Signal K can use and enrich it.

This means that by making the logbook a Signal K plugin, I didn’t have to do any work to make it work with existing boat systems. Signal K even provides a user interface framework.

This means that to make the electronic logbook happen, I only had to produce some plugin JavaScript, and then build a user interface. As I don’t do front-end development that frequently, this gave me a chance to dive into modern React with hooks for the first time. What better to do after being laid off?

Signal K also has very good integration with Influx and Grafana. These can record vessel telemetry in a high resolution. So why bother with a logbook on the side? In my view, a separate logbook is still valuable for storing the comments and observations not available in a marine sensor network. It can also be a lot more durable and archivable than a time series database. On Lille Ø we run both.

User interface

The signalk-logbook comes with a reasonably simple web-based user interface that is integrated in the Signal K administration UI. You can find it in Web appsLogbook.

The primary view is a timeline. Sort of “Twitter for your boat” kind of view that allows quick browsing of entries on both desktop and mobile.

Logbook timeline view

There is also the more traditional tabular view, best utilized on bigger screens:

Logbook timeline view

While the system can produce a lot of the entries automatically, it is also easy to create manual entries:

Adding an entry

These entries can also include weather observations. Those using celestial navigation can also record manual fixes with these entries! Entries can be categorized to separate things like navigational entries from radio or maintenance logs.

If you have the sailsconfiguration plugin installed, you can also log sail changes in a machine-readable format:

Sail changes editor

Since the log format is machine readable, the map view allows browsing entries spatially:

Log entries on a map

Electronic vs. paper

The big benefits of an electronic logbook are automation and availability. The logbook can create entries by itself based on what’s happening with the vessel telemetry. You can read and create log entries anywhere on the boat, using the electronic devices you carry with you. Off-vessel backups are also both possible, and quite easy, assuming that the vessel has a reasonably constant Internet connection.

With paper logbooks, the main benefit is that they’re fully independent of the vessel’s electronic system. In case of power failure, you can still see the last recoded position, heading, etc. They are also a lot more durable in the sense that paper logbooks from centuries ago are still fully readable. Though obviously that carries a strong survivorship bias. I would guess the vast majority of logbooks, especially on smaller non-commercial vessels, don’t survive more than a couple of years.

So, how to benefit from the positive aspects of electronic logbooks, while reducing the negatives when compared to paper? Here are some ideas:

  • Mark your position on a paper chart. Even though most boats navigate with only electronic charts, it is a good idea to have at least a planning chart available on paper. When offshore, plot your hourly or daily position on it. This will produce the navigation aid of last resort if all electronics fail. And marked charts are pretty!
  • Have an off-vessel backup of your electronic logs. The signalk-logbook uses a very simple plain text format for its entries exactly for this reason. The logs are easy to back up, and can also be utilized without the software itself. This means that with a bit of care your log entries shouls stay readable for many many years to come. On Lille Ø we store them on GitHub
  • Print your logs. While this is not something I’m planning to do personally, it would be possible to print your log entries periodically, maybe daily or after each trip. Then you can have an archival copy that doesn’t rely on electronics


In addition to providing a web-based user interface, signalk-logbook provides a REST API. This allows software developers to create new integrations with the logbook. For example, these could include:

  • Automations to generate log entries for some events via node-red or NoFlo
  • Copying the log entries to a cloud service
  • Exporting the logs to another format, like GPX or a spreadsheet
  • Other, maybe non-web-based user interfaces for browsing and creating log entries

Getting started

To utilize this electronic logbook, you need a working installation of Signal K on your boat. The common way to do this is by having a Raspberry Pi powered by the boat’s electrical system and connected to the various on-board instruments.

There are some nice solutions for this:

  • Sailor Hat for Raspberry Pi allows powering a Raspberry Pi from the boat’s 12V system. It also handles shutdowns in a clean way, protecting the memory card from data corruption
  • Pican-M both connects a Raspberry Pi to a NMEA2000 bus, and powers it through that

You can of course also do a more custom setup, like we did on our old boat, Curiosity.

For the actual software setup, marinepi-provisioning gives a nice Ansible playbook for getting everything going. Bareboat Necessities is a “Marine OS for Raspberry Pi” that comes with everything included.

If you have a Victron GX device (for example Cerbo GX), you can also install Signal K on that.

Once Signal K is running, just look up signalk-logbook in the Signal K app store. You’ll also want to install the signalk-autostate and sailsconfiguration plugins to enable some of the automations.

Signal K appstore

Then just restart Signal K, log in, and start logging!

Sunday, 26 February 2023

Inferno on Microcontrollers

Last year I looked at microcontrollers a fair amount, though you probably wouldn't see much activity related to that if you follow these updates. If there was any obvious public activity at all, it was happening in the diary I occasionally update about Inferno-related things. Things went off-track when some initial explorations of Inferno on a Thumb-2 microcontroller ended in frustration, leading me back to writing bare metal code instead, and this indirectly resulted in an organiser application written in a simple language for the Ben NanoNote. As the year drew to a close, I picked up Inferno again and started to make some progress, resulting in something more concrete to show.

Meanwhile, long-time Inferno explorer, Caerwyn, has been investigating a port to the Raspberry Pi Pico. Last year also saw the publication of a thesis about porting Inferno OS to a Cortex-M7 device. Hopefully there are other ports in progress, or there will be new ones that develop once people find out about these efforts.

While it's a challenge to get a useful system running on these resource-constrained devices, it's a rewarding experience to be able to show something that works on at least a basic level. It's also interesting to see a Unix-like system running on something that might be expected to only run low-level, bare metal code.

Categories: Inferno, Limbo, Free Software

Wednesday, 15 February 2023

Send you talks for Linux App Summit 2023 *now*!

Call for proposal ends this Saturday 18th of February.


I'm sure there's lots of interesting things you have to talk about so head over to and press the "Submit your talk" button :)

Tuesday, 14 February 2023

I love Free Software – and Free Hardware Designs

For many years I have been using free software. I remember that one of my first GNU programs that I used was a chess game, ported to 16bit Windows. Many years later I switched to GNU/Linux and started programming myself, and also releasing my software under strong copyleft licences. I also discovered that many popular distros of GNU/Linux include non-free firmware. So I began contributing to GNU Guix, a fully free distro of the GNU System that excludes nonfree firmware blobs, nonfree games, and any other nonfree software.

Unfortunately many hardware vendors, including AMD, NVIDIA and Intel starting making their hardware Defective By Design, by implementing HDCP, a kind of hardware-level Digital Restrictions Management. Even if you never watch Netflix, you will be restricted by the non-free firmware, required to use their CPUs and GPUs. If we want to eliminate that form of hardware-level DRM, we will have to design our own Freedom-Respecting hardware. A few years after I baught my Talos II, I began contributing to the Libre-SOC project.

After switching to the POWER9, it was clear that I would not be able to play the nonfree DRM’d games that Valve distributes on their platform Steam. And I didn’d want to either. So I started porting existing free software games to the ppc64el architecture, including VR games such as V-Sekai and BeepSaber. I discovered that there was a libre-licensed SteamVR clone called libsurvive that implements libre licensed lighthouse-based tracking. So I baught my Valve Index, installed libsurvive and started playing with Godot4.

Today is 愛 ♥ Free Software Day 2023, which aims at raising awareness to Free Software and the passionate, hard-working people behind it. So I want to thank Luke Kenneth Casson Leighton who started the Libre-SOC project and Charles Lohr for their work on libsurvive. Last year the FSFE had an event dedicated to Free Software games, where we played Veloren, a libre licenced voxel game. The game was really fun, so I want to show my appreciation for their work. The same is true for SlimeVR/monado and Yosys/nextpnr. ���🌈���⚧�💛�💜🖤

Friday, 10 February 2023

Considering Unexplored Products of the Past: Formulating a Product

Previously, I described exploring the matter of developing emulation of a serial port, along with the necessary circuitry, for Elkulator, an emulator for the Acorn Electron microcomputer, motivated by a need to provide a way of transferring files into and out of the emulated computer. During this exploration, I had discovered some existing software that had been developed to provide some level of serial “filing system” support on the BBC Microcomputer – the higher-specification sibling of the Electron – with the development of this software having been motivated by an unforeseen need to transfer software to a computer without any attached storage devices.

This existing serial filing system software was a good indication that serial communications could provide the basis of a storage medium. But instead of starting from a predicament involving computers without usable storage facilities, where an unforeseen need motivates the development of a clever workaround, I wanted to consider what such a system might have been like if there had been a deliberate plan from the very beginning to deploy computers that would rely on a serial connection for all their storage needs. Instead of having an implementation of the filing system in RAM, one could have the luxury of putting it into a ROM chip that would be fitted in the computer or in an expansion, and a richer set of features might then be contemplated.

A Smarter Terminal

Once again, my interest in the historical aspects of the technology provided some guidance and some inspiration. When microcomputers started to become popular and businesses and institutions had to decide whether these new products had any relevance to their operations, there was some uncertainty about whether such products were capable enough to be useful or whether they were a distraction from the facilities already available in such organisations. It seems like a lifetime ago now, but having a computer on every desk was not necessarily seen as a guarantee of enhanced productivity, particularly if they did not link up to existing facilities or did not coordinate the work of a number of individuals.

At the start of the 1980s, equipping an office with a computer on every desk and equipping every computer with a storage solution was an expensive exercise. Even disk drives offering only a hundred kilobytes of storage on each removable floppy disk were expensive, and hard disk drives were an especially expensive and precious luxury that were best shared between many users. Some microcomputers were marketed as multi-user systems, encouraging purchasers to connect terminals to them and to share those precious resources: precisely the kind of thing that had been done with minicomputers and mainframes. Such trends continued into the mid-1980s, manifested by products promoted by companies with mainframe origins, such companies perpetuating entrenched tendencies to frame computing solutions in certain ways.

Terminals themselves were really just microcomputers designed for the sole purpose of interacting with a “host” computer, and institutions already operating mainframes and minicomputers would have experienced the need to purchase several of them. Until competition intensified in the terminal industry, such products were not particularly cheap, with the DEC VT220 introduced in 1983 costing $1295 at its introduction. Meanwhile, interest in microcomputers and the possibility of distributing some kinds of computing activity to these new products, led to experimentation in some organisations. Some terminal manufacturers responded by offering terminals that also ran microcomputer software.

Much of the popular history of microcomputing, familiar to anyone who follows such topics online, particularly through YouTube videos, focuses on adoption of such technology in the home, with an inevitable near-obsession with gaming. The popular history of institutional adoption often focuses on the upgrade parade from one generation of computer to the next. But there is a lesser told history involving the experimentation that took place at the intersection of microcomputing and minicomputing or mainframe computing. In universities, computers like the BBC Micro were apparently informally introduced as terminals for other systems, terminal ROMs were developed and shared between institutions. However, there seems to have been relatively little mainstream interest in such software as fully promoted commercial products, although Acornsoft – Acorn’s software outlet – did adopt such a ROM to sell as their Termulator product.

The Acorn Electron, introduced at £199, had a “proper” keyboard and the ability to display 80 columns of text, unlike various other popular microcomputers. Indeed, it may have been the lowest-priced computer to be able to display 80 columns of relatively high definition text as standard, such capabilities requiring extra cards for machines like the Apple II and the Commodore 64. Considering the much lower price of such a computer, the ongoing experimentation underway at the time with its sibling machine on alternative terminal solutions, and the generally favourable capabilities of both these machines, it seems slightly baffling that more was not done to pursue opportunities to introduce a form of “intelligent terminal” or “hybrid terminal” product to certain markets.

VIEW in 80 columns on the Acorn Electron.

VIEW in 80 columns on the Acorn Electron.

None of this is to say that institutional users would have been especially enthusiastic. In some institutions, budgets were evidently generous enough that considerable sums of money would be spent acquiring workstations that were sometimes of questionable value. But in others, the opportunity to make savings, to explore other ways of working, and perhaps also to explicitly introduce microcomputing topics such as software development for lower-specification hardware would have been worthy of some consideration. An Electron with a decent monochrome monitor, like the one provided with the M2105, plus some serial hardware, could have comprised a product sold for perhaps as little as £300.

The Hybrid Terminal

How would a “hybrid terminal” solution work, how might it have been adopted, and what might it have been used for? Through emulation and by taking advantage of the technological continuity in multi-user systems from the 1980s to the present day, we can attempt to answer such questions. Starting with communications technologies familiar in the world of the terminal, we might speculate that a serial connection would be the most appropriate and least disruptive way of interfacing a microcomputer to a multi-user system.

Although multi-user systems, like those produced by Digital Equipment Corporation (DEC), might have offered network connectivity, it is likely that such connectivity was proprietary, expensive in terms of the hardware required, and possibly beyond the interfacing capabilities of most microcomputers. Meanwhile, Acorn’s own low-cost networking solution, Econet, would not have been directly compatible with these much higher-end machines. Acorn’s involvement in network technologies is also more complicated than often portrayed, but as far as Econet is concerned, only much later machines would more conveniently bridge the different realms of Econet and standards-based higher-performance networks.

Moreover, it remains unlikely that operators and suppliers of various multi-user systems would have been enthusiastic about fitting dedicated hardware and installing dedicated software for the purpose of having such systems communicate with third-party computers using a third-party network technology. I did find it interesting that someone had also adapted Acorn’s network filing system that usually runs over Econet to work instead over a serial connection, which presumably serves files out of a particular user account. Another discovery I made was a serial filing system approach by someone who had worked at Acorn who wanted to transfer files between a BBC Micro system and a Unix machine, confirming that such functionality was worth pursuing. (And there is also a rather more complicated approach involving more exotic Acorn technology.)

Indeed, to be successful, a hybrid terminal approach would have to accommodate existing practices and conventions as far as might be feasible in order to not burden or disturb the operators of these existing systems. One motivation from an individual user’s perspective might be to justify introducing a computer on their desk, to be able to have it take advantage of the existing facilities, and to augment those facilities where it might be felt that they are not flexible or agile enough. Such users might request help from the operators, but the aim would be to avoid introducing more support hassles, which would easily arise if introducing a new kind of network to the mix. Those operators would want to be able to deploy something and have it perform a role without too much extra thought.

I considered how a serial link solution might achieve this. An existing terminal would be connected to, say, a Unix machine and be expected to behave like a normal client, allowing the user to log into their account. The microcomputer would send some characters down the serial line to the Unix “host”, causing it to present the usual login prompt, and the user would then log in as normal. They would then have the option of conducting an interactive session, making their computer like a conventional terminal, but there would also be the option of having the Unix system sit in the background, providing other facilities on request.

Logging into a remote service via a serial connection.

Logging into a remote service via a serial connection.

The principal candidates for these other facilities would be file storage and printing. Both of these things were centrally managed in institutions, often available via the main computing service, and the extensible operating system of the Electron and related microcomputers invites the development of software to integrate the core support for these facilities with such existing infrastructure. Files would be loaded from the user’s account on the multi-user system and saved back there again. Printing would spool the printed data to files somewhere in the user’s home directory for queuing to centralised printing services.

Attempting an Implementation

I wanted to see how such a “serial computing environment” would work in practice, how it would behave, what kinds of applications might benefit, and what kind of annoyances it might have. After all, it might be an interesting idea or a fun idea, but it need not be a particularly good one. The first obstacle was that of understanding how the software elements would work, primarily on the Electron itself, from the tasks that I would want the software to perform down to the way the functionality would be implemented. On the host or remote system, I was rather more convinced that something could be implemented since it would mostly be yet another server program communicating over a stream, with plenty of modern Unix conveniences to assist me along the way.

As it turned out, my investigations began with a trip away from home and the use of a different, and much more constrained, development environment involving an ARM-based netbook. Fortunately, Elkulator and the different compilers and tools worked well enough on that development hardware to make the exercise approachable. Another unusual element was that I was going to mostly rely on the original documentation in the form of the actual paper version of the Acorn Electron Advanced User Guide for information on how to write the software for the Electron. It was enlightening coming back to this book after a few decades for assistance on a specific exercise, even though I have perused the book many times in its revised forms online, because returning to it with a focus on a particular task led me to find that the documentation in the book was often vague or incomplete.

Although the authors were working in a different era and presumably under a degree of time pressure, I feel that the book in some ways exhibits various traits familiar to those of us working in the software industry, these indicating a lack of rigour and of sufficient investment in systems documentation. For this, I mostly blame the company who commissioned the work and then presumably handed over some notes and told the authors to fill in the gaps. As if to strengthen such perceptions of hurriedness and lack of review, it also does not help that “system” is mis-spelled “sysem” in a number of places in the book!

Nevertheless, certain aspects of the book were helpful. The examples, although focusing on one particular use-case, did provide helpful detail in deducing the correct way of using certain mechanisms, even if they elected to avoid the correct way of performing other tasks. Acorn’s documentation had a habit of being “preachy” about proper practices, only to see its closest developers ignore those practices, anyway. Eventually, on returning from my time away, I was able to fill in some of the gaps, although by this time I had a working prototype that was able to do basic things like initiate a session on the host system and to perform some file-related operations.

There were, and still are, a lot of things that needed, and still need, improvement with my implementation. The way that the operating system needs to be extended to provide extra filing system functionality involves plenty of programming interfaces, plenty of things to support, and also plenty of opportunities for things to go wrong. The VIEW word processor makes use of interfaces for both whole-file loading and saving as well as random-access file operations. Missing out support for one or the other will probably not yield the desired level of functionality.

There are also intricacies with regard to switching printing on and off – this typically being done using control characters sent through the output stream – and of “spool” files which capture character output. And filing system ROMs need to be initialised through a series of “service calls”, these being largely documented, but the overall mechanism is left largely undescribed in the documentation. It is difficult enough deciphering the behaviour of the Electron’s operating system today, with all the online guidance available in many forms, so I cannot imagine how difficult it would have been as a third party to effectively develop applications back in the day.

Levels of Simulation

To support the activities of the ROM software in the emulated Electron, I had to develop a server program running on my host computer. As noted above, this was not onerous, especially since I had already written a program to exercise the serial communications and to interact with the emulated serial port. I developed this program further to respond to commands issued by my ROM, performing host operations and returning results. For example, the CAT command produces a “catalogue” of files in a host directory, and so my server program performs a directory listing operation, collects the names of the files, and then sends them over the virtual serial link to the ROM for it to display to the user.

To make the experience somewhat authentic and to approximate to an actual deployment environment, I included a simulation of the login prompt so that the user of the emulated Electron would have to log in first, with the software also having to deal with a logged out (or not yet logged in) condition in a fairly graceful way. To ensure that they are logged in, a user selects the Serial Computing Environment using the *SCE command, this explicitly selecting the serial filing system, and the login dialogue is then presented if the user has not yet logged into the remote host. Once logged in, the ROM software should be able to test for the presence of the command processor that responds to issued commands, only issuing commands if the command processor has signalled its presence.

Although this models a likely deployment environment, I wanted to go a bit further in terms of authenticity, and so I decided to make the command processor a separate program that would be installed in a user account on a Unix machine. The user’s profile script would be set up to run the command processor, so that when they logged in, this program would automatically run and be ready for commands. I was first introduced to such practices in my first workplace where a menu-driven, curses-based program I had written was deployed so that people doing first-line technical support could query the database of an administrative system without needing to be comfortable with the Unix shell environment.

For complete authenticity I would actually want to have the emulated Electron contact a Unix-based system over a physical serial connection, but for now I have settled for an arrangement whereby a pseudoterminal is created to run the login program, with the terminal output presented to the emulator. Instead of seeing a simulated login dialogue, the user now interacts with the host system’s login program, allowing them to log into a real account. At that point, the command processor is invoked by the shell and the user gets back control.

Obtaining a genuine login dialogue from a Unix system.

Obtaining a genuine login dialogue from a Unix system.

To prevent problems with certain characters, the command processor configures the terminal to operate in raw mode. Apart from that, it operates mostly as it did when run together with the login simulation which did not have to concern itself with such things as terminals and login programs.

Some Applications

This effort was motivated by the need or desire to be able to access files from within Elkulator, particularly from applications such as VIEW. Naturally, VIEW is really just one example from the many applications available for the Electron, but since it interacts with a range of functionality that this serial computing environment provides, it serves to showcase such functionality fairly well. Indeed, some of the screenshots featured in this and the previous article show VIEW operating on text that was saved and loaded over the serial connection.

Accessing files involves some existing operating system commands, such as *CAT (often abbreviated to *.) to list the catalogue of a storage medium. Since a Unix host supports hierarchical storage, whereas the Electron’s built-in command set only really addresses the needs of a flat storage medium (as provided by various floppy disk filing systems for Electron and BBC Micro), the *DIR command has been introduced from Acorn’s hierarchical filing systems (such as ADFS) to navigate between directories, which is perhaps confusing to anyone familiar with other operating systems, such as the different variants of DOS and their successors.

Using catalogue and directory traversal commands.

Using catalogue and directory traversal commands.

VIEW allows documents to be loaded and saved in a number of ways, but as a word processor it also needs to be able to print these documents. This might be done using a printer connected to a parallel port, but it makes a bit more sense to instead allow the serial printer to be selected and for printing to occur over the serial connection. However, it is not sufficient to merely allow the operating system to take over the serial link and to send the printed document, if only because the other side of this link is not a printer! Indeed, the command processor is likely to be waiting for commands and to see the incoming data as ill-formed input.

The chosen solution was to intercept attempts to send characters to a serial printer, buffering them and then sending the buffered data in special commands to the command processor. This in turn would write the printed characters to a “spool” file for each printing session. From there, these files could be sent to an appropriate printer. This would give the user rather more control over printing, allowing them to process the printout with Unix tools, or to select one particular physical printer out of the many potentially available in an organisation. In the VIEW environment, and in the MOS environment generally, there is no built-in list of printers or printer selection dialogue.

Since the kinds of printers anticipated for use with VIEW might well have been rather different from the kinds connected to multi-user systems, it is likely that some processing would be desirable where different text styles and fonts have been employed. Today, projects like PrinterToPDF exist to work with old-style printouts, but it is conceivable that either the “printer driver generator” in the View suite or some postprocessing tool might have been used to produce directly printable output. With unstyled text, however, the printouts are generally readable and usable, as the following excerpt illustrates.

               A  brief report on the experience
               of using VIEW as a word processor
               four decades on.

Using VIEW on the Acorn  Electron  is  an  interesting  experience  and  a
glimpse  into  the  way  word  processing  was  once done. Although I am a
dedicated user of Vim, I am under no  illusions  of  that  program's  word
processing  capabilities: it is deliberately a screen editor based on line
editor  heritage,  and  much  of  its  operations  are  line-oriented.  In
contrast, VIEW is intended to provide printed output: it presents the user
with a  ruler  showing  the  page margins and tab stops, and it even saves
additional   rulers   into  the  stored  document   in   their   on-screen
representations. Together with its default typewriter-style  behaviour  of
allowing  the  cursor  to  be moved into empty space and of overwriting or
replacing text, there is a quaint feel to it.

Since VIEW is purely text-based, I can easily imagine converting its formatting codes to work with troff. That would then broaden the output options. Interestingly, the Advanced User Guide was written in VIEW and then sent to a company for typesetting, so perhaps a workflow like this would have been useful for the authors back then.

A major selling point of the Electron was its provision of BBC BASIC as the built-in language. As the BBC Micro had started to become relatively widely adopted in schools across the United Kingdom, a less expensive computer offering this particular dialect of BASIC was attractive to purchasers looking for compatibility with school computers at home. Obviously, there is a need to be able to load and save BASIC programs, and this can be done using the serial connection.

Loading a BASIC program from the Unix host.

Loading a BASIC program from the Unix host.

Beyond straightforward operations like these, BASIC also provides random-access file operations through various keywords and constructs, utilising the underlying operating system interfaces that invoke filing system operations to perform such work. VIEW also appears to use these operations, so it seems sensible not to ignore them, even if many programmers might have preferred to use bulk transfer operations – the standard load and save – to get data in and out of memory quickly.

A BASIC program reading and showing a file.

A BASIC program reading and showing a file.

Interactions between printing, the operating system’s own spooling support, outputting characters and reading and writing data are tricky. A degree of experimentation was required to make these things work together. In principle, it should be possible to print and spool at the same time, even with output generated by the remote host that has been sent over the serial line for display on the Electron!

Of course, as a hybrid terminal, the exercise would not be complete without terminal functionality. Here, I wanted to avoid going down another rabbit hole and implementing a full terminal emulator, but I still wanted to demonstrate the invocation of a shell on the Unix host and the ability to run commands. To show just another shell session transcript would be rather dull, so here I present the perusal of a Python program to generate control codes that change the text colour on the Electron, along with the program’s effects:

Interaction with the shell featuring multiple text colours.

Interaction with the shell featuring multiple text colours.

As a bitmapped terminal, the Electron is capable of much more than this. Although limited to moderate resolutions by the standards of the fanciest graphics terminals even of that era, there are interesting possibilities for Unix programs and scripts to generate graphical output.

A chart generated by a Python program showing workstation performance results.

A chart generated by a Python program showing workstation performance results.

Sending arbitrary character codes requires a bit of terminal configuration magic so that line feeds do not get translated into other things (the termios manual page is helpful, here, suggesting the ONLCR flag as the culprit), but the challenge, as always, is to discover the piece of the stack of technologies that is working against you. Similar things can be said on the Electron as well, with its own awkward confluence of character codes for output and output control, requiring the character output state to be tracked so that certain values do not get misinterpreted in the wrong context.

Others have investigated terminal connectivity on Acorn’s 8-bit microcomputers and demonstrated other interesting ways of producing graphical output from Unix programs. Acornsoft’s Termulator could even emulate a Tektronix 4010 graphical terminal. Curiously, Termulator also supported file transfer between a BBC Micro and the host machine, although only as a dedicated mode and limited to ASCII-only text files, leaving the hybrid terminal concept unexplored.

Reflections and Remarks

I embarked on this exercise with some cautiousness, knowing that plenty of uncertainties lay ahead in implementing a functional piece of software, and there were plenty of frustrating moments as some of the different elements of the rather underdocumented software stack conspired to produce undesirable behaviour. In addition, the behaviour of my serial emulation code had a confounding influence, requiring some low-level debugging (tracing execution within the emulator instruction by instruction, noting the state of the emulated CPU), some slowly dawning realisations, and some adjustments to hopefully make it work in a more cooperative fashion.

There are several areas of potential improvement. I first programmed in 6502 assembly language maybe thirty-five years ago, and although I managed to get some sprite and scrolling routines working, I never wrote any large programs, nor had to interact with the operating system frameworks. I personally find the 6502 primitive, rigid, and not particularly conducive to higher-level programming techniques, and I found myself writing some macros to take away the tedium of shuffling values between registers and the stack, constantly aware of various pitfalls with regard to corrupting registers.

My routines extending the operating system framework possibly do not do things the right way or misunderstand some details. That, I will blame on the vague documentation as well as any mistakes made micromanaging the registers. Particularly frustrating was the way that my ROM code would be called with interrupts disabled in certain cases. This made implementation challenging when my routines needed to communicate over the serial connection, when such communication itself requires interrupts to be enabled. Quite what the intention of the MOS designers was in such circumstances remains something of a mystery. While writing this article, I realised that I could have implemented the printing functionality in a different way, and this might have simplified things, right up to the point where I saw, thanks to the debugger provided by Elkulator, that the routines involved are called – surprise! – with interrupts disabled.

Performance could be a lot better, with this partly due to my own code undoubtedly requiring optimisation. The existing software stack is probably optimised to a reasonable extent, but there are various persistent background activities that probably steal CPU cycles unnecessarily. One unfortunate contributor to performance limitations is the hardware architecture of the Electron. Indeed, I discovered while testing in one of the 80-column display modes that serial transfers were not reliable at the default transfer rate of 9600 baud, instead needing to be slowed down to only 2400 baud. Some diagnosis confirmed that the software was not reading the data from the serial chip quickly enough, causing an overflow condition and data being lost.

Motivated by cost reduction and product positioning considerations – the desire to avoid introducing a product that might negatively affect BBC Micro sales – the Electron was deliberately designed to use a narrow data bus to fewer RAM chips than otherwise would have been used, with a seemingly clever technique being employed to allow the video circuitry to get the data at the desired rate to produce a high-resolution or high-bandwidth display. Unfortunately, the adoption of the narrow data bus, facilitated by the adoption of this particular technique, meant that the CPU could only ever access RAM at half its rated speed. And with the narrow data bus, the video circuitry effectively halts the CPU altogether for a substantial portion of its time in high-bandwidth display modes. Since serial communications handling relies on the delivery and handling of interrupts, if the CPU is effectively blocked from responding quickly enough, it can quickly fall behind if the data is arriving and the interrupts are occurring too often.

That does raise the issue of reliability and of error correction techniques. Admittedly, this work relies on a reliable connection between the emulated Electron and the host. Some measures are taken to improve the robustness of the communication when messages are interrupted so that the host in particular is not left trying to send or receive large volumes of data that are no longer welcome or available, and other measures are taken to prevent misinterpretation of stray data received in a different and thus inappropriate context. I imagine that I may have reinvented the wheel badly here, but these frustrations did provide a level of appreciation of the challenges involved.

Some Broader Thoughts

It is possible that Acorn, having engineered the Electron too aggressively for cost, made the machine less than ideal for the broader range of applications for which it was envisaged. That said, it should have been possible to revise the design and produce a more performant machine. Experiments suggest that a wider data path to RAM would have helped with the general performance of the Electron, but to avoid most of the interrupt handling problems experienced with the kind of application being demonstrated here, the video system would have needed to employ its existing “clever” memory access technique in conjunction with that wider data path so as to be able to share the bandwidth more readily with the CPU.

Contingency plans should have been made to change or upgrade the machine, if that had eventually been deemed necessary, starting at the point in time when the original design compromises were introduced. Such flexibility and forethought would also have made a product with a longer appeal to potential purchasers, as opposed to a product that risked being commercially viable for only a limited period of time. However, it seems that the lessons accompanying such reflections on strategy and product design were rarely learned by Acorn. If lessons were learned, they appear to have reinforced a particular mindset and design culture.

Virtue is often made of the Acorn design philosophy and the sometimes rudely expressed and dismissive views of competing technologies that led the company to develop the ARM processor. This approach enabled comparatively fast and low-cost systems to be delivered by introducing a powerful CPU to do everything in a system from running applications to servicing interrupts for data transfers, striving for maximal utilisation of the available memory bandwidth by keeping the CPU busy. That formula worked well enough at the low end of the market, but when the company tried to move upmarket once again, its products were unable to compete with those of other companies. Ultimately, this sealed the company’s fate, even if more fortuitous developments occurred to keep ARM in the running.

(In the chart shown earlier demonstating graphical terminal output and illustrating workstation performance, circa 1990, Acorn’s R260 workstation is depicted as almost looking competitive until one learns that the other workstations depicted arrived a year earlier and that the red bar showing floating-point performance only applies to Acorn’s machine three years after its launch. It would not be flattering to show the competitors at that point in history, nor would it necessarily be flattering to compare whole-system performance, either, if any publication sufficiently interested in such figures had bothered to do so. There is probably an interesting story to be told about these topics, particularly how Acorn’s floating-point hardware arrived so late, but I doubt that there is the same willingness to tell it as there is to re-tell the usual celebratory story of ARM for the nth time.)

Acorn went on to make the Communicator as a computer that would operate in a kind of network computing environment, relying on network file servers to provide persistent storage. It reused some of the technology in the Electron and the BT Merlin M2105, particularly the same display generator and its narrow data bus to RAM, but ostensibly confining that aspect of the Electron’s architecture to a specialised role, and providing other facilities for applications and, as in the M2105, for interaction with peripherals. Sadly, the group responsible in Acorn had already been marginalised and eventually departed, apparently looking to pursue the concept elsewhere.

As for this particular application of an old computer and a product that was largely left uncontemplated, I think there probably was some mileage in deploying microcomputers in this way, even outside companies like Acorn where such computers were being developed and used, together with software development companies with their own sophisticated needs, where minicomputers like the DEC VAX would have been available for certain corporate or technical functions. Public (or semi-public) access terminals were fairly common in universities, and later microcomputers were also adopted in academia due to their low cost and apparently sufficient capabilities.

Although such adoption appears to have focused on terminal applications, it cannot have been beyond the wit of those involved to consider closer integration between the microcomputing and multi-user environments. In further and higher education, students will have had microcomputing experience and would have been able to leverage their existing skills whilst learning new ones. They might have brought their microcomputers along with them, giving them the opportunity to transfer or migrate their existing content – their notes, essays, programs – to the bright and emerging new world of Unix, as well as updating their expertise.

As for updating my own expertise, it has been an enlightening experience in some ways, and I may well continue to augment the implemented functionality, fix and improve things, and investigate the possibilities this work brings. I hope that this rather lengthy presentation of the effort has provided insights into experiences of the past that was and the past that might have been.

Tuesday, 07 February 2023

Considering Unexplored Products of the Past: Emulating an Expansion

In the last couple of years, possibly in common with quite a few other people, certainly people of my vintage, and undoubtedly those also interested in retrocomputing, I have found myself revisiting certain aspects of my technological past. Fortunately, sites like the Internet Archive make this very easy indeed, allowing us to dive into publications from earlier eras and to dredge up familiar and not so familiar magazine titles and other documentation. And having pursued my retrocomputing interest for a while, participating in forums, watching online videos, even contributing to new software and hardware developments, I have found myself wanting to review some of the beliefs and perceptions that I and other people have had of the companies and products we grew up with.

One of the products of personal interest to me is the computer that got me and my brother started with writing programs (as well as playing games): the Acorn Electron, a product of Acorn Computers of Cambridge in the United Kingdom. Much can be said about the perceived chronology of this product’s development and introduction, the actual chronology, and its impact on its originator and on wider society, but that surely deserves a separate treatment. What I can say is that reviewing the archives and other knowledge available to us now can give a deeper understanding of the processes involved in the development of the Electron, the technological compromises made, and the corporate strategy that led to its creation and eventually its discontinuation.

By Bilby - Own work, CC BY 3.0,

The Acorn Electron
(Picture attribution: By BilbyOwn work, CC BY 3.0, Link)

It has been popular to tell simplistic narratives about Acorn Computers, to reduce its history to a few choice moments as the originator of the BBC Microcomputer and the ARM processor, but to do so is to neglect a richer and far more interesting story, even if the fallibility of some of the heroic and generally successful characters involved may be exposed by telling some of that story. And for those who wonder how differently some aspects of computing history might have turned out, exploring that story and the products involved can be an adventure in itself, filling in the gaps of our prior experiences with new insights, realisations and maybe even glimpses into opportunities missed and what might have been if things had played out differently.

At the Rabbit Hole

Reading about computing history is one thing, but this tale is about actually doing things with old software, emulation, and writing new software. It started off with a discussion about the keyboard shortcuts for a word processor and the differences between the keyboards on the Acorn Electron and its higher-specification predecessor, the BBC Microcomputer. Having acquainted myself with the circuitry of the Electron, how its keyboard is wired up, and how the software accesses it, I was obviously intrigued by these apparent differences, but I was also intrigued by the operation of the word processor in question, Acornsoft’s VIEW.

Back in the day, as people like to refer to the time when these products were first made available, such office or productivity applications were just beyond my experience. Although it was slightly fascinating to read about them, most of my productive time was spent writing programs, mostly trying to write games. I had actually seen an office suite written by Psion on the ACT Sirius 1 in the early 1980s, but word processors were the kind of thing that people used in offices or, at the very least, by people who had a printer so that they could print the inevitable letters that everyone would be needing to write.

Firing up an Acorn Electron emulator, specifically Elkulator, I discovered that one of the participants in the discussion was describing keyboard shortcuts that didn’t match up to those that were described in a magazine article from the era, these appearing correct as I tried them out for myself. It turned out that the discussion participant in question was using the BBC Micro version of VIEW on the Electron and was working around the mismatch in keyboard layouts. Although all of this was much ado about virtually nothing, it did two things. Firstly, it made me finally go in and fix Elkulator’s keyboard configuration dialogue, and secondly, it made me wonder how convenient it would be to explore old software in a productive way in an emulator.

Reconciling Keyboards

Having moved to Norway many years ago now, I use a Norwegian keyboard layout, and this has previously been slightly problematic when using emulators for older machines. Many years ago, I used and even contributed some minor things to another emulator, ElectrEm, which had a nice keyboard configuration dialogue. The Electron’s keyboard corresponds to certain modern keyboards pretty well, at least as far as the alphanumeric keys are concerned. More challenging are the symbols and control-related keys, in particular the Electron’s special Caps Lock/Function key which sits where many people now have their Tab key.

Obviously, there is a need to be able to tell an emulator which keys on a modern keyboard are going to correspond to the keys on the emulated machine. Being derived from an emulator for the BBC Micro, however, Elkulator’s keyboard configuration dialogue merely presented a BBC Micro keyboard on the screen and required the user to guess which “Beeb” key might correspond to an Electron one. Having put up with this situation for some time, I finally decided to fix this once and for all. The process of doing so is not particularly interesting, so I will spare you the details of doing things with the Allegro toolkit and the Elkulator source code, but I was mildly pleased with the result:

The revised keyboard configuration dialogue in Elkulator.

The revised keyboard configuration dialogue in Elkulator.

By also adding support for redefining the Break key in a sensible way, I was also finally able to choose a key that desktop environments don’t want to interfere with: F12 might work for Break, but Ctrl-F12 makes KDE/Plasma do something I don’t want, and yet Ctrl-Break is quite an important key combination when using an Electron or BBC Micro. Why Break isn’t a normal key on these machines is another story in itself, but here is an example of redefining it and even allowing multiple keys on a modern keyboard to act as Break on the emulated computer:

Redefining the Break key in Elkulator.

Redefining the Break key in Elkulator.

Being able to confidently choose and use keys made it possible to try out VIEW in a more natural way. But this then led to another issue: how might I experiment with such software productively? It would be good to write documents and to be able to extract them from the emulator, rather than see them disappear when the emulator is closed.

Real and Virtual Machines

One way to get text out of a system, whether it is a virtual system like the emulated Electron or a real machine, is to print it. I vaguely remembered some support for printing from Elkulator and was reminded by my brother that he had implemented such support himself a while ago as a quick way of getting data out of the emulated system. But I also wanted to be able to get data into the emulated system as well, and the parallel interface typically used by the printer is not bidirectional on the Electron. So, I would need to look further for a solution.

It is actually the case that Elkulator supports reading from and writing to disk (or disc) images. The unexpanded Electron supports read/write access to cassettes (or tapes), but Elkulator does not support writing to tapes, probably because the usability considerations are rather complicated: one would need to allow the user to control the current position on a tape, and all this would do is to remind everyone how inconvenient tapes are. Meanwhile, writing to disk images would be fairly convenient within the emulator, but then one would need to use tools to access the files within the images outside the emulator.

Some emulators for various systems also support the notion of a host filesystem (or filing system) where some special support has been added to make the emulated machine see another peripheral and to communicate with it, this peripheral really being a program on the host machine (the machine that is running the emulator). I could have just written such support, although it would also have needed some software support written for the emulated machine as well, but this approach would have led me down a path of doing something specific to emulation. And I have a principle of sorts which is that if I am going to change the way an emulated machine behaves, it has to be rooted in some kind of reality and not just enhance the emulated machine in a way that the original, “real” machine could not have been.

Building on Old Foundations

As noted earlier, I have an interest in the way that old products were conceived and the roles for which those products were intended by their originators. The Electron was largely sold as an unexpanded product, offering only power, display and cassette ports, with a general-purpose expansion connector being the gateway to anything else that might have been added to the system later. This was perceived somewhat negatively when the machine was launched because it was anticipated that buyers would probably, at the very least, want to plug joysticks into the Electron to play games. Instead, Acorn offered an expansion unit, the Plus 1, that cost another £60 which provided joystick, printer and cartridge connectors.

But this flexibility in expanding the machine meant that it could have been used as the basis for a fairly diverse range of specialised products. In fact, one of the Acorn founders, Chris Curry, enthused about the Electron as a platform for such products, and one such product did actually make it to market, in a way: the BT Merlin M2105 messaging terminal. This terminal combined the Electron with an expansion unit containing circuitry for communicating over a telephone line, a generic serial communications port, a printer port, as well as speech synthesis circuitry and a substantial amount of read-only memory (ROM) for communications software.

Back in the mid-1980s, telecommunications (or “telecoms”) was the next big thing, and enthusiasm for getting a modem and dialling up some “online” service or other (like Prestel) was prevalent in the computing press. For businesses and institutions, there were some good arguments for adopting such technologies, but for individuals the supposed benefits were rather dulled by the considerable costs of acquiring the hardware, buying subscriptions, and the notoriously high telephone call rates of the era. Only the relatively wealthy or the dedicated few pursued this side of data communications.

The M2105 reportedly did some service in the healthcare sector before being repositioned for commercial applications. Along with its successor product, the Acorn Communicator, it enjoyed a somewhat longer lifespan in certain enterprises. For the standard Electron and its accompanying expansions, support for basic communications capabilities was evidently considered important enough to be incorporated into the software of the Plus 1 expansion unit, even though the Plus 1 did not provide any of the specific hardware capabilities for communication over a serial link or a telephone line.

It was this apparently superfluous software capability that I revisited when I started to think about getting files in and out of the emulator. When emulating an Electron with Plus 1, this serial-capable software is run by the emulator, just as it is by a real Electron. On a real system of this kind, a cartridge could be added that provides a serial port and the necessary accompanying circuitry, and the system would be able to drive that hardware. Indeed, such cartridges were produced decades ago. So, if I could replicate the functionality of a cartridge within the emulator, making some code that pretends to be a serial communications chip (or UART) that has been interfaced to the Electron, then I would in principle be able to set up a virtual serial connection between the emulated Electron and my modern host computer.

Emulated Expansions

Modifying Elkulator to add support for serial communications hardware was fairly straightforward, with only a few complications. Expansion hardware on the Electron is generally accessible via a range of memory addresses that actually signal peripherals as opposed to reading and writing memory. The software provided by the Plus 1 expansion unit is written to expect the serial chip to be accessible via a range of memory locations, with the serial chip accepting values sent to those locations and producing values from those locations on request. The “memory map” through which the chip is exposed in the Electron corresponds directly to the locations or registers in the serial chip – the SCN2681 dual asynchronous receiver/transmitter (DUART) – as described by its datasheet.

In principle, all that is needed is to replicate the functionality described by the datasheet. With this done, the software will drive the chip, the emulated chip will do what is needed, and the illusion will be complete. In practice, a certain level of experimentation is needed to fill in the gaps left by the datasheet and any lack of understanding on the part of the implementer. It did help that the Plus 1 software has been disassembled – some kind of source code regenerated from the binary – so that the details of its operation and its expectations of the serial chip’s operation can be established.

Moreover, it is possible to save a bit of effort by seeing which features of the chip have been left unused. However, some unused features can be provided with barely any extra effort: the software only drives one serial port, but the chip supports two in largely the same way, so we can keep support for two just in case there is a need in future for such capabilities. Maybe someone might make a real serial cartridge with two ports and want to adapt the existing software, and they could at least test that software under emulation before moving to real hardware.

It has to be mentioned that the Electron’s operating system, known as the Machine Operating System or MOS, is effectively extended by the software provided in the Plus 1 unit. Even the unexpanded machine provides the foundations for adding serial communications and printing capabilities in different ways, and the Plus 1 software merely plugs into that framework. A different kind of serial chip would be driven by different software but it would plug into the same framework. At no point does anyone have to replace the MOS with a patched version, which seems to be the kind of thing that happens with some microcomputers from the same era.

Ultimately, what all of this means is that having implemented the emulated serial hardware, useful things can already be done with it within the bare computing environment provided by the MOS. One can set the output stream to use the serial port and have all the text produced by the system and programs sent over the serial connection. One can select the serial port for the input stream and send text to the computer instead of using the keyboard. And printing over the serial connection is also possible by selecting the appropriate printer type using a built-in system command.

In Elkulator, I chose to expose the serial port via a socket connection, with the emulator binding to a Unix domain socket on start-up. I then wrote a simple Python program to monitor the socket, to show any data being sent from the emulator and to send any input from the terminal to the emulator. This permitted the emulated machine to be operated from a kind of remote console and for the emulated machine to be able to print to this console. At last, remote logins are possible on the Electron! Of course, such connectivity was contemplated and incorporated from the earliest days of these products.

Filing Options

If the goal of all of this had been to facilitate transfers to and from the emulated machine, this might have been enough, but a simple serial connection is not especially convenient to use. Although a method of squirting a file into the serial link at the Electron could be made convenient for the host computer, at the other end one has to have a program to do something with that file. And once the data has arrived, would it not be most convenient to be able to save that data as a file? We just end up right back where we started: having some data inside the Electron and nowhere to put it! Of course, we could enable disk emulation and store a file on a virtual disk, but then it might just have been easier to make disk image handling outside the emulator more convenient instead.

It seemed to me that the most elegant solution would be to make the serial link act as the means through which the Electron accesses files. That instead of doing ad-hoc transfers of data, such data would be transferred as part of operations that are deliberately accessing files. Such ambitions are not unrealistic, and here I could draw on my experience with the platform, having acquired the Acorn Electron Advanced User Guide many, many years ago, in which there are details of implementing filing system ROMs. Again, the operating system had been designed to be extended in order to cover future needs, and this was one of them.

In fact, I had not been the only one to consider a serial filing system, and I had been somewhat aware of another project to make software available via a serial link to the BBC Micro. That project had been motivated by the desire to be able to get software onto that computer where no storage devices were otherwise available, even performing some ingenious tricks to transfer the filing system software to the machine and to have that software operate from RAM. It might have been tempting merely to use this existing software with my emulated serial port, to get it working, and then to get back to trying out applications, loading and saving, and to consider my work done. But I had other ideas in mind…

Sunday, 29 January 2023

Artificial intelligence is not willing to be correct

As deep learning models get better at representing human language, telling whether a text was written by a human being or a deep learning model becomes harder and harder. And because language models reproduce text found online (often without attribution); the risk of considering their output as if they were written by a human changes the reading experience for the reader.

The last year has been incredible for natural (and programming) language processing. GitHub’s Copilot has been out of technical preview since June, and ChatGPT was released in November. Copilot is based on OpenAI Codex and acts as a source code generator (which raises several issues of its own). ChatGPT is a language model built for dialogue, where a user can chat with the AI, ask questions and have them answered. Both are trained with data from web scrapping, with source code for Copilot and webpages for ChatGPT. Those models work particularly well for their respective purposes, and can thus be used to generate seemingly convincing source code or prose.

Because AI-generated texts are convincing, the fact that they were generated by an AI is not obvious to the careless reader. This is problematic, as there is no guarantee that the text is factually correct and that the human leveraging the AI checked it for mistakes. When reading, this may create a discomfort, as the reader has to determine whether a text was generated by an AI, and if so, if the publisher made sure that it is correct. Companies already started to use AI generated text for articles without clearly visible disclaimers and riddled with errors. The fact that text generated by ChatGPT may contain inaccuracies was acknowledged by OpenAI’s CEO. One might argue that humans make mistakes, too, and that prose or source code written by a human being can therefore also be wrong. This is true. However, the intent behind the text differs. In most cases, the author of a text tries their best to make it correct. But the language model does not understand the concept of correctness and will happily generate text containing wrong facts, which changes the tacitly assumed rules of writing and reading content.

Gaining trust in the text generated by an AI is thus a worthwhile objective. Here are partial solutions to this:

  • Watermarking texts generated by GPT models is a work in progress. Among the possible ones, the words chosen by the AI would embed a proof (using asymmetric cryptography) in their probability distribution. While this does not alleviate the concern stated above, this allows the reader to avoid AI-generated text if he or she wants to.

  • Connecting the text generated by the AI back to what lead it to generate the text might be another may offer a partial solution. If the readers can verify the trustworthiness of the sources, they might feel more confident about the AI-generated text they are reading.

  • If citing the source is too involved computationally, weighting the learning process of the AI in such a way that would give more importance to the authoritative sources on a subject would be a good workaround. Counting the number of backreferences of a page would be a good indicator of whether the text it contains is authoritative (just like page rank).

Considering this perspective, using large language models raises trust issues. A few technical solutions are listed above. However, it would be too reductive to consider this only as a technical problem. AI generated text then looks akin to search engines, without the comfort of knowing that they merely redirect to a source website, whose content is presumably written by a human being who tried to make it correct.

PS: This article was not written by an AI.

Friday, 13 January 2023

Use Any SOP Binary With SOP-Java and External-SOP

The Stateless OpenPGP Protocol specification describes a shared, standardized command line interface for OpenPGP applications. There is a bunch of such binaries available already, among them PGPainless’ pgpainless-cli, Sequoia-PGP’s sqop, as well as ProtonMails gosop. These tools make it easy to use OpenPGP from the command line, as well as from within bash scripts (all of those are available in Debian testing or in the main repo) and the standardized interface allows users to switch from one backend to the other without the need to rewrite their scripts.

The Java library sop-java provides a set of interface definitions that define a java API that closely mimics the command line interface. These interfaces can be implemented by anyone, such that developers could create a drop-in for sop-java using the OpenPGP library of their choice. One such backend is pgpainless-sop, which implements sop-java using the PGPainless library.

I just released another library named external-sop, which implements sop-java and allows the user to use any SOP CLI application of their choice from within their Java / Kotlin application!

Let’s assume we have a SOP command line application called example-sop and we want to make use of it within our Java application. external-sop makes the integration a one-liner:

SOP sop = new ExternalSOP("/usr/bin/example-sop");

Now we can use the resulting sop object the same way we would use for example a SOPImpl instance:

// generate key
byte[] keyBytes = sop.generateKey()
        .userId("John Doe <>")

// extract certificate
byte[] certificateBytes = sop.extractCert()

byte[] plaintext = "Hello, World!\n".getBytes(); // plaintext

// encrypt and sign a message
byte[] ciphertext = sop.encrypt()
        // encrypt for each recipient
        // Optionally: Sign the message
        .withKeyPassword("f00b4r") // if signing key is protected
        // provide the plaintext

// decrypt and verify a message
ByteArrayAndResult<DecryptionResult> bytesAndResult = sop.decrypt()
        .withKeyPassword("f00b4r") // if decryption key is protected

DecryptionResult result = bytesAndResult.getResult();
byte[] plaintext = bytesAndResult.getBytes();

The external-sop module will be available on Maven Central in a few hours for you to test.

Happy Hacking!

Wednesday, 04 January 2023

The KDE Qt5 Patch Collection has been rebased on top of Qt 5.15.8


Commercial release announcement:

OpenSource release announcement:


As usual I want to personally extend my gratitude to the Commercial users of Qt for beta testing Qt 5.15.8 for the rest of us.


The Commercial Qt 5.15.8 release introduced two bugs that have later been fixed. Thanks to that, our Patchset Collection has been able to incorporate the the fix for one of the issues [1] and revert for the other [2]  and the Free Software users will never be affected by it! 


P.S: Special shout-out to Andreas Sturmlechner for identifying the fix of the issue, since I usually only pay attention to "Revert XYZ" commits and this one is not a revert but subsequent improvement

Saturday, 31 December 2022

Phosh 2022 in retrospect

I wanted to look back at what changed in phosh in 2022 and figured I could share it with you. I'll be focusing on things very close to the mobile shell, for a broader overview see Evangelos upcoming FOSDEM talk.

Some numbers

We're usually aiming for a phosh release at the end of each month. In 2022 We did 10 releases like that, 7 major releases (bumping the middle version number) and three betas. We skipped the April and November releases. We also did one bug fix relesae out of line (bumping the last bit of the version number). I hope we can keep that cadence in 2023 as it allows us to get changes to users in a timely fashion (thus closing usability gaps as early as possible) as well as giving distributions a way to plan ahead. Ideally we'd not skip any release but sometimes real life just interferes.

Those releases contain code contributions from about 20 different people and translations from about 30 translators. These numbers are roughly the same as 2021 which is great. Thanks everyone!

In phosh's git repository we had a bit over 730 non-merge commits (roughly 2 per day), which is about 10% less than in 2021. Looking closer this is easily compensated by commits to phoc (which needed quite some work for the gestures) and phosh-mobile-settings which didn't exist in 2021.

User visible features

Most notable new features are likely the swipe gestures for top and bottom bar, the possibility to use the quick settings on the lock screen as well as the style refresh driven by Sam Hewitt that e.g. touched the modal dialogs (but also sliders, dialpads, etc):

Style Refresh Swipe up gesture

We also added the possibility to have custom widgets via loadable plugins on the lock screen so the user can decide which information should be available. We currently ship plugins to show

  • information on upcoming calendar events
  • emergency contact information
  • PDFs like bus or train tickets
  • the current month (as hello world like plugin to get started)

These are maintained within phosh's source tree although out of tree plugins should be doable too.

There's a settings application (the above mentioned phosh-mobile-settings) to enable these. It also allows those plugins to have individual preferences:

TODO A Plugin Plugin Preferenes

Speaking of configurability: Scale-to-fit settings (to work around applications that don't fit the screen) and haptic/led feedback are now configurable without resorting to the command line:

Scale to fit Feedbackd settings

We can also have device specific settings which helps to temporarily accumulate special workaround without affecting other phones.

Other user visible features include the ability to shuffle the digits on the lockscreen's keypad, a VPN quick settings, improved screenshot support and automatic high contrast theme switching when in bright sunlight (based on ambient sensor readings) as shown here.

As mentioned above Evangelos will talk at FOSDEM 2023 about the broader ecosystem improvements including GNOME, GTK, wlroots, phoc, feedbackd, ModemManager, mmsd, NetworkManager and many others without phosh wouldn't be possible.

What else

As I wanted a T-shirt for Debconf 2022 in Prizren so I created a logo heavily inspired by those cute tiger images you often see in Southeast Asia. Based on that I also made a first batch of stickers mostly distributed at FrOSCon 2022:

Phosh stickers

That's it for 2022. If you want to get involved into phosh testing, development or documentation then just drop by in the matrix room.

Friday, 23 December 2022

Donate to KDE with a 10% power up! (1-week-offer)

Hopefully by now, you know that in KDE we are running an End of Year Fundraising campaign.

If you didn't, now you know :)

The campaign has already raised around 16 thousand euros, but there's still a bit to go to the minimum goal of 20 thousand.

So let's spice things up a little, I will donate 10% of every donation you make, you donate 1€, I will donate 0.1€, you donate 100€ I will donate 10€, etc.

I'm placing my maximum total donation amount at (20000-16211.98)/11 = 344.37, that is if you all donate 3443.7€ (or more), I will donate 344.37 and we'll reach the 20K goal.

How is this going to work? 

I will make my donation on the 31st of December (just one donation, to save up on fees).

For your donation to be included in my matching donation you need to send me an email to with your name/email and put in copy (CC) the KDE e.V. board so they can confirm your donation.

Only donations between now (23rd of December around 18:00 CET) and 31st of December will be considered.

Sunday, 11 December 2022

Chronological order of The Witcher

Ever since Witcher games took off the franchise skyrocketed in popularity. Books, comics, TV show, games, more comics, another TV show… The story of Geralt and his marry company has been told in so many ways that it’s becoming a wee bit hard to keep track of chronology of all the events; especially across different forms of media.

In this article I’ve collected all official Witcher works ordering them in chronological order. To avoid any confusion, let me state up front that if you’re new to the franchise or haven’t read the books yet this list might not be for you. If you’re looking for the order to read the books in, I’ve prepared a separate article which describes that.

Skip right to the chronology


This compilation includes the following works:


Regarding Netflix show. The first season presents split timelines. Each episode can have up to three arcs for each of the main characters: Geralt, Ciri and Yennefer. Stories for each person are presented in chronological order throughout the season, however events between the timelines don’t line up. For example, the second episode shows Yennefer in the year 1206, Geralt in 1240 and Ciri in 1263.

The episodes of the main series are given using SnnEmm notation (indicating episode mm of season nn) rather than using their titles. Furthermore, episodes of the first season have name of the character in parenthesise indicating whose arc the entry refers to. For example, the second episode has three separate entries S01E02 (Yen), S01E02 (Geralt) and S01E02 (Ciri).

Dates of the events in Netflix show are taken from its official website.


It’s important to note that dates and chronology aren’t always consistent even within a single canon. A common example is Ciri’s date of birth which can be interpreted differently based on different parts of the books.

Furthermore, because of episodic nature of some of the stories, it’s not always possible to order them in relation to other events. For example, A Grain of Truth could take place pretty much at any point in Geralt’s life.

In some cases of ambiguity regarding books and CDPR timelines, I’ve resorted to looking at Witcher fandom wiki.

Lastly, dates between canons aren’t consistent. For example, Geralt and Yennefer meet in 1248 in the books but in 1256 in the Netflix show. The compilation orders entries by ‘events’ rather than by dates.

The chronology

To distinguish different forms of media (books, games, shows etc) the following icons are used next to the titles to identify what they refer to:

  • 📜 — short story / 📕 — novel / 🖌 — comic book
  • 🕹 — video game / 📱 — mobile game / 🎲 — board game or TTRPG
  • 📺 — television series / ã‚¢ — animated film / 🎥 — cinematic trailer

Clicking on canon name in the table’s header allows filtering rows by canon.

✓The Witcher: Blood Origin 📺
✓GWENT: Rogue Mage 🕹950sTakes place ‘hundreds of years before […] witchers were roaming the Continent’. There’s also accompanying (official?) Alzur’s Story.
✓The Witcher: A Witcher's Journal 🎲
✓Witcher: Old World 🎲
✓Nightmare of the Wolf ア1107–1109Beginning of the film takes place decades before Geralt’s birth.
?A Road with No Return 📜It’s arguably non-canon with the main connection being one of the main character having the same name as Geralt’s mother.
✓Droga bez powrotu 🖌Adaptation of ‘A Road with No Return’ with a slightly different title.
✓Nightmare of the Wolf ア1165By the end of the film Geralt is five years old.
✓E01: Dzieciństwo 📺Depicts Geralt at seven years old.
✓Zdrada 🖌The story happens with Geralt still training at Kaer Morhen.
✓E02: Nauka 📺Depicts Geralt’s graduation from Kaer Morhen.
✓E03: Człowiek – pierwsze spotkanie 📺Takes place immediately after Geralt’s graduation. Contains minor elements of ‘The Lesser Evil’ and ‘The Voice of Reason’.
✓The Witcher: Crimson Trail 📱The game takes place soon after Geralt finishes his training.
✓S01E02 (Yen) 📺1206
✓S01E03 (Yen) 📺1210
✓✓A Grain of Truth 📜🖌
✓✓The Lesser Evil 📜🖌
✓House of Glass 🖌Geralt speaks of lack of emotions, friends or love which makes me things this happens before The Edge of the World.
✓S01E01 (Geralt) 📺1231Based on ‘The Lesser Evil’.
?The Price of Neutrality 🕹1232Part of ‘The Witcher: Enhanced Edition’. Based on ‘The Lesser Evil’. The story is told by Dandelion which may be considered unreliable narrator.
✓The Edge of the World 📜1248Geralt and Jaskier meet for the first time.
✓S01E02 (Geralt) 📺1240Based on ‘The Edge of the World’.
✓S01E04 (Yen) 📺1240
✓S01E03 (Geralt) 📺1243Based on ‘The Witcher’.
✓S01E04 (Geralt) 📺1249Based on ‘A Question of Price’.
✓✓The Last Wish 📜🖌1248Geralt and Yennefer meet for the first time.
✓S01E05 (Geralt & Yen) 📺1256Based on ‘The Last Wish’.
✓Season of Storms 📕ca. 12501245 according to the date in the book but that’s inconsistent in relation to other books.
✓Fox Children 🖌Based on chapters 14–15 of Season of Storms.
✓A Question of Price 📜1251
✓✓The Witcher 📜🎥1252
✓Geralt 🖌1252Adaptation of ‘The Witcher’ under a different title.
✓The Voice of Reason 📜1252
?Side Effects 🕹1253Part of ‘The Witcher: Enhanced Edition’ set one year after ‘The Witcher’ short story. The story is told by Dandelion which may be considered unreliable narrator.
✓✓The Bounds of Reason 📜🖌1253
✓S01E06 (Geralt & Yen) 📺1262Based on ‘The Bounds of Reason’.
✓E04: Smok 📺Based on ‘The Bounds of Reason’.
✓A Shard of Ice 📜
✓E05: Okruch lodu 📺Based on ‘A Shard of Ice’.
✓E06: Calanthe 📺Based on ‘A Question of Price’.
✓The Eternal Fire 📜
✓E07: Dolina Kwiatów 📺Based on ‘The Eternal Fire’ and ‘The Edge of the World’.
✓A Little Sacrifice 📜
✓The Sword of Destiny 📜1262
✓Reasons of State 🖌1262Story happens just after Geralt saves Ciri from the Brokilon dryads.
✓Something More 📜1264
✓S01E07 (Geralt) 📺1263
✓S01E07 (Yen) 📺1263
✓S01E01-07 (Ciri) 📺1263Episode 4 based on ‘The Sword of Destiny’. Episode 7 based on ‘Something More’.
✓S01E08 📺1263Based on ‘Something More’. In this episode all timelines converge.
✓E08: Rozdroże 📺Based on ‘The Witcher’ with elements from ‘The Voice of Reason’ and ‘Something More’.
✓E09: Świątynia Melitele 📺Contains elements of ‘The Voice of Reason’.
✓E10: Mniejsze zło 📺Based on ‘The Lesser Evil’.
✓E11: Jaskier 📺Contains elements of ‘The Lesser Evil’.
✓E12: Falwick 📺
✓E13: Ciri 📺Contains elements of ‘Something More’.
✓S02E01 📺1263–1264Based on ‘A Grain of Truth’.
✓Blood of Elves 📕1265–1267
✓S02E02-08 📺1263–1264Based on ‘Blood of Elves’ and parts of ‘Time of Contempt’.
✓Time of Contempt 📕1267
✓Baptism of Fire 📕1267
✓The Tower of the Swallow 📕1267–1268
✓Lady of the Lake 📕1268
✓Thronebreaker: Witcher Stories 🕹1267–1268The game takes place concurrently to the saga. Chapters 1–2 happen before chapter 5 of ‘Time of Contempt’. 4 happens concurrently with 7 of ‘Baptism of Fire’. 5 happens before ‘Lady of the Lake’ and 6 happens before 9 of that book.
✗Something Ends, Something Begins 📜Non-canon even though written by Sapkowski.
✓The Witcher 🕹1270Dates as given in the games even though it is supposed to take place five years after the saga. To be aligned with the books, add three years.
✓The Witcher 2: Assassins of Kings 🕹1271
✓The Witcher Adventure Game 🕹🎲
✓The Witcher Tabletop RPG 🎲
✓Matters of Conscience 🖌1271Happens after Geralt deals with Letho. Assumes Iorveth path and dragon being spared.
✓Killing Monsters 🖌Happens as Geralt travels with Vesemir just before Witcher 3.
✓Killing Monsters 🎥
✓The Witcher 3: Wild Hunt 🕹1272
✓Hearts of Stone 🕹
✓Fading Memories 🖌This really could be place at any point in the timeline.
✓Curse of Crows 🖌Assumes Ciri becoming a witcher and romance with Yennefer.
✓Of Flesh and Flame 🖌Directly references events of Hearts of Stone.
✓Blood and Wine 🕹1275
✓A Night to Remember 🎥Shows the same character as one in ‘Blood and Wine’ expansion.citation
✓Witch’s Lament 🖌
✗The Witcher: Ronin 🖌Alternate universe inspired by Edo-period Japan.


Thanks to acbagel for pointing out Rogue Mage and Witcher's Journal.

Saturday, 10 December 2022

Running Cockpit inside ALP

  • Losca
  • 13:28, Saturday, 10 December 2022

(quoted from my other blog at since a new OS might be interesting for many and this is published in separate planets)

ALP - The Adaptable Linux Platform – is a new operating system from SUSE to run containerized and virtualized workloads. It is in early prototype phase, but the development is done completely openly so it’s easy to jump in to try it.

For this trying out, I used the latest encrypted build – as of the writing, 22.1 – from ALP images. I imported it in virt-manager as a Generic Linux 2022 image, using UEFI instead of BIOS, added a TPM device (which I’m interested in otherwise) and referring to an Ignition JSON file in the XML config in virt-manager.

The Ignition part is pretty much fully thanks to Paolo Stivanin who studied the secrets of it before me. But here it goes - and this is required for password login in Cockpit to work in addition to SSH key based login to the VM from host - first, create config.ign file:

  "ignition": { "version": "3.3.0" },
  "passwd": {
    "users": [
        "name": "root",
        "passwordHash": "YOURHASH",
        "sshAuthorizedKeys": [
          "ssh-... YOURKEY"
  "systemd": {
    "units": [{
      "name": "sshd.service",
      "enabled": true
  "storage": {
    "files": [
        "overwrite": true,
        "path": "/etc/ssh/sshd_config.d/20-enable-passwords.conf",
        "contents": {
          "source": "data:,PasswordAuthentication%20yes%0APermitRootLogin%20yes%0A"
        "mode": 420

…where password SHA512 hash can be obtained using openssl passwd -6 and the ssh key is your public ssh key.

That file is put to eg /tmp and referred in the virt-manager’s XML like follows:

  <sysinfo type="fwcfg">
    <entry name="opt/com.coreos/config" file="/tmp/config.ign"/>

Now we can boot up the VM and ssh in - or you could log in directly too but it’s easier to copy-paste commands when using ssh.

Inside the VM, we can follow the ALP documentation to install and start Cockpit:

podman container runlabel install
podman container runlabel --name cockpit-ws run
systemctl enable --now cockpit.service

Check your host’s IP address with ip -a, and open IP:9090 in your host’s browser:

Cockpit login screen

Login with root / your password and you shall get the front page:

Cockpit front page

…and many other pages where you can manage your ALP deployment via browser:

Cockpit podman page

All in all, ALP is in early phases but I’m really happy there’s up-to-date documentation provided and people can start experimenting it whenever they want. The images from the linked directory should be fairly good, and test automation with openQA has been started upon as well.

You can try out the other example workloads that are available just as well.

Sunday, 04 December 2022

URLs with // at the beginning

Just a quick reminder that relative URLs can start with a double slash and that this means something different than a single slash at the beginning. Specifically, such relative addresses are resolved by taking the schema (and only the schema) of the website they are on.

For example, the code for the link to my repositories in the site’s header is <a href="//">Code</a>. Since this page uses https schema, browsers will navigate to if the link is activated.

This little trick can save you some typing, but more importantly, if you’re developing a URL parsing code or a crawler, make sure that it handles this case correctly. It may seem like a small detail, but it can have a big impact on the functionality of your code.

Sunday, 20 November 2022

BTRFS Snapshot Cron Script

I’ve been using btrfs for a decade now (yes, than means 10y) on my setup (btw I use ArchLinux). I am using subvolumes and read-only snapshots with btrfs, but I have never created a script to automate my backups.


A few days ago, a dear friend asked me something about btrfs snapshots, and that question gave me the nudge to think about my btrfs subvolume snapshots and more specific how to automate them. A day later, I wrote a simple (I think so) script to do automate my backups.

The script as a gist

The script is online as a gist here: BTRFS: Automatic Snapshots Script . In this blog post, I’ll try to describe the requirements and what is my thinking. I waited a couple weeks so the cron (or systemd timer) script run itself and verify that everything works fine. Seems that it does (at least for now) and the behaviour is as expected. I will keep a static copy of my script in this blog post but any future changes should be done in the above gist.


The script can be improved by many,many ways (check available space before run, measure the time of running, remove sudo, check if root is running the script, verify the partitions are on btrfs, better debugging, better reporting, etc etc). These are some of the ways of improving the script, I am sure you can think a million more - feel free to sent me your proposals. If I see something I like, I will incorporate them and attribute of-course. But be reminded that I am not driven by smart code, I prefer to have clear and simple code, something that everybody can easily read and understand.

Mount Points

To be completely transparent, I encrypt all my disks (usually with a random keyfile). I use btrfs raid1 on the disks and create many subvolumes on them. Everything exists outside of my primary ssd rootfs disk. So I use a small but fast ssd for my operating system and btrfs-raid1 for my “spinning rust” disks.

BTRFS subvolumes can be mounted as normal partitions and that is exactly what I’ve done with my home and opt. I keep everything that I’ve install outside of my distribution under opt.

This setup is very flexible as I can easy replace the disks when the storage is full by removing one by one of the disks from btrfs-raid1, remove-add the new larger disk, repair-restore raid, then remove the other disk, install the second and (re)balance the entire raid1 on them!

Although this is out of scope, I use a stub archlinux UEFI kernel so I do not have grub and my entire rootfs is also encrypted and btrfs!

mount -o subvolid=10701 LABEL="ST1000DX002" /home
mount -o subvolid=10657 LABEL="ST1000DX002" /opt

Declare variables

# paths MUST end with '/'
btrfs_paths=("/" "/home/" "/opt/")
timestamp=$(date +%Y%m%d_%H%M%S)
yymmdd="$(date +%Y/%m/%d)"

The first variable in the script is actually a bash array

btrfs_paths=("/" "/home/" "/opt/")

and all three (3) paths (rootfs, home & opt) are different mount points on different encrypted disks.

MUST end with / (forward slash), either-wise something catastrophic will occur to your system. Be very careful. Please, be very careful!

Next variable is the timestamp we will use, that will create something like


After that is how many snapshots we would like to keep to our system. You can increase it to whatever you like. But be careful of the storage.


I like using shortcuts in shell scripts to reduce the long one-liners that some people think that it is alright. I dont, so

yymmdd="$(date +%Y/%m/%d)"

is one of these shortcuts !

Last, I like to have a logfile to review at a later time and see what happened.


Log Directory

for older dudes -like me- you know that you can not have all your logs under one directory but you need to structure them. The above yymmdd shortcut can help here. As I am too lazy to check if the directory already exist, I just (re)create the log directory that the script will use.

sudo mkdir -p "/var/log/btrfsSnapshot/${yymmdd}/"

For - Loop

We enter to the crucial part of the script. We are going to iterate our btrfs commands in a bash for-loop structure so we can run the same commands for all our partitions (variable: btrfs_paths)

for btrfs_path in "${btrfs_paths[@]}"; do
    <some commands>

Snapshot Directory

We need to have our snapshots in a specific location. So I chose .Snapshot/ under each partition. And I am silently (re)creating this directory -again I am lazy, someone should check if the directory/path already exist- just to be sure that the directory exist.

sudo mkdir -p "${btrfs_path}".Snapshot/

I am also using very frequently mlocate (updatedb) so to avoid having multiple (duplicates) in your index, do not forget to update updatedb.conf to exclude the snapshot directories.

PRUNENAMES = ".Snapshot"

How many snapshots are there?

Yes, how many ?

In order to learn this, we need to count them. I will try to skip every other subvolume that exist under the path and count only the read-only, snapshots under each partition.

sudo btrfs subvolume list -o -r -s "${btrfs_path}" | grep -c ".Snapshot/"

Delete Previous snapshots

At this point in the script, we are ready to delete all previous snapshots and only keep the latest or to be exact whatever the keep_snapshots variables says we should keep.

To do that, we are going to iterate via a while-loop (this is a nested loop inside the above for-loop)

while [ "${keep_snapshots}" -le "${list_btrfs_snap}" ]
  <some commands>

considering that the keep_snapshots is an integer, we iterate the delete command less or equal from the list of already btrfs existing snapshots.

Delete Command

To avoid mistakes, we delete by subvolume id and not by the name of the snapshot, under the btrfs path we listed above.

btrfs subvolume delete --subvolid "${prev_btrfs_snap}" "${btrfs_path}"

and we log the output of the command into our log

Delete subvolume (no-commit): '//.Snapshot/20221107_091028'

Create a new subvolume snapshot

And now we are going to create a new read-only snapshot under our btrfs subvolume.

btrfs subvolume snapshot -r "${btrfs_path}" "${btrfs_path}.Snapshot/${timestamp}"

the log entry will have something like:

Create a readonly snapshot of '/' in '/.Snapshot/20221111_000001'

That’s it !


Log Directory Structure and output

sudo tree /var/log/btrfsSnapshot/2022/11/

├── 07
│   └── btrfsSnapshot.log
├── 10
│   └── btrfsSnapshot.log
├── 11
│   └── btrfsSnapshot.log
└── 18
    └── btrfsSnapshot.log

4 directories, 4 files

sudo cat /var/log/btrfsSnapshot/2022/11/18/btrfsSnapshot.log

######## Fri, 18 Nov 2022 00:00:01 +0200 ########

Delete subvolume (no-commit): '//.Snapshot/20221107_091040'
Create a readonly snapshot of '/' in '/.Snapshot/20221118_000001'

Delete subvolume (no-commit): '/home//home/.Snapshot/20221107_091040'
Create a readonly snapshot of '/home/' in '/home/.Snapshot/20221118_000001'

Delete subvolume (no-commit): '/opt//opt/.Snapshot/20221107_091040'
Create a readonly snapshot of '/opt/' in '/opt/.Snapshot/20221118_000001'

Mount a read-only subvolume

As something extra for this article, I will mount a read-only subvolume, so you can see how it is done.

$ sudo btrfs subvolume list -o -r -s /

ID 462 gen 5809766 cgen 5809765 top level 5 otime 2022-11-10 18:11:20 path .Snapshot/20221110_181120
ID 463 gen 5810106 cgen 5810105 top level 5 otime 2022-11-11 00:00:01 path .Snapshot/20221111_000001
ID 464 gen 5819886 cgen 5819885 top level 5 otime 2022-11-18 00:00:01 path .Snapshot/20221118_000001

$ sudo mount -o subvolid=462 /media/
mount: /media/: can't find in /etc/fstab.

$ sudo mount -o subvolid=462 LABEL=rootfs /media/

$ df -HP /media/
Filesystem       Size  Used Avail Use% Mounted on
/dev/mapper/ssd  112G  9.1G  102G   9% /media

$ sudo touch /media/etc/ebal
touch: cannot touch '/media/etc/ebal': Read-only file system

$ sudo diff /etc/pacman.d/mirrorlist /media/etc/pacman.d/mirrorlist

< Server =$repo/os/$arch
> #Server =$repo/os/$arch

$ sudo umount /media

The Script

Last, but not least, the full script as was the date of this article.

set -e

# ebal, Mon, 07 Nov 2022 08:49:37 +0200

## 0 0 * * Fri /usr/local/bin/

# paths MUST end with '/'
btrfs_paths=("/" "/home/" "/opt/")
timestamp=$(date +%Y%m%d_%H%M%S)
yymmdd="$(date +%Y/%m/%d)"

sudo mkdir -p "/var/log/btrfsSnapshot/${yymmdd}/"

echo "######## $(date -R) ########" | sudo tee -a "${logfile}"
echo "" | sudo tee -a "${logfile}"

for btrfs_path in "${btrfs_paths[@]}"; do

    ## Create Snapshot directory
    sudo mkdir -p "${btrfs_path}".Snapshot/

    ## How many Snapshots are there ?
    list_btrfs_snap=$(sudo btrfs subvolume list -o -r -s "${btrfs_path}" | grep -c ".Snapshot/")

    ## Get oldest rootfs btrfs snapshot
    while [ "${keep_snapshots}" -le "${list_btrfs_snap}" ]
        prev_btrfs_snap=$(sudo btrfs subvolume list -o -r -s  "${btrfs_path}" | grep ".Snapshot/" | sort | head -1 | awk '{print $2}')

        ## Delete a btrfs snapshot by their subvolume id
        sudo btrfs subvolume delete --subvolid "${prev_btrfs_snap}" "${btrfs_path}" | sudo tee -a "${logfile}"

        list_btrfs_snap=$(sudo btrfs subvolume list -o -r -s "${btrfs_path}" | grep -c ".Snapshot/")

    ## Create a new read-only btrfs snapshot
    sudo btrfs subvolume snapshot -r "${btrfs_path}" "${btrfs_path}.Snapshot/${timestamp}" | sudo tee -a "${logfile}"

    echo "" | sudo tee -a "${logfile}"


Secret command Google doesn’t want you to know

Or how to change language of Google website.

If you’ve travelled abroad, you may have noticed that Google tries to be helpful and uses the language of the region you’re in on its websites. It doesn’t matter if your operating system is set to Spanish, for example; Google Search will still use Portuguese if you happen to be in Brazil.

Fortunately, there’s a simple way to force Google to use a specific language. All you need to do is append ?hl=lang to the website’s address, replacing lang with a two-letter code for the desired language. For instance, ?hl=es for Spanish, ?hl=ht for Haitian, or ?hl=uk for Ukrainian.

If the URL already contains a question mark, you need to append &hl=lang instead. Additionally, if it contains a hash symbol, you need to insert the string just before the hash symbol. For example:


By the way, as a legacy of Facebook having hired many ex-Google employees, the parameter also work on some of the Facebook properties.

This trick doesn’t work on all Google properties. However, it seems to be effective on pages that try to guess your language preference without giving you the option to override it. For example, while Gmail ignores the parameter, you can change its display language in the settings (accessible via the gear icon near the top right of the page). Similarly, YouTube strips the parameter, but it respects preferences configured in the web browser.

Anyone familiar with HTTP may wonder why Google doesn’t just look at the Accept-Language header. The issue is that many users have their browser configured with defaults that send English as the only option, even though they would prefer another language. In those cases, it’s more user-friendly to ignore that header. As it turns out, localisation is really hard.

Saturday, 19 November 2022

Baking Qemu KVM Snapshot to Base Image

When creating a new Cloud Virtual Machine the cloud provider is copying a virtual disk as the base image (we called it mí̱tra or matrix) and starts your virtual machine from another virtual disk (or volume cloud disk) that in fact is a snapshot of the base image.

baking file

Just for the sake of this example, let us say that the base cloud image is the


When creating a new Libvirt (qemu/kvm) virtual machine, you can use this base image to start your VM instead of using an iso to install ubuntu 22.04 LTS. When choosing this image, then all changes will occur to that image and if you want to spawn another virtual machine, you need to (re)download it.

So instead of doing that, the best practice is to copy this image as base and start from a snapshot aka a baking file from that image. It is best because you can always quickly revert all your changes and (re)spawn the VM from the fresh/clean base image. Or you can always create another snapshot and revert if needed.

inspect images

To see how that works here is a local example from my linux machine.

qemu-img info /var/lib/libvirt/images/lEvXLA_tf-base.qcow2

image: /var/lib/libvirt/images/lEvXLA_tf-base.qcow2
file format: qcow2
virtual size: 2.2 GiB (2361393152 bytes)
disk size: 636 MiB
cluster_size: 65536
Format specific information:
    compat: 0.10
    compression type: zlib
    refcount bits: 16

the most important attributes to inspect are

virtual size: 2.2 GiB
disk size: 636 MiB

and the volume disk of my virtual machine

qemu-img info /var/lib/libvirt/images/lEvXLA_tf-vol.qcow2

image: /var/lib/libvirt/images/lEvXLA_tf-vol.qcow2
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 1.6 GiB
cluster_size: 65536
backing file: /var/lib/libvirt/images/lEvXLA_tf-base.qcow2
backing file format: qcow2
Format specific information:
    compat: 0.10
    compression type: zlib
    refcount bits: 16

We see here

virtual size: 10 GiB
disk size: 1.6 GiB

cause I have extended the volume disk size to 10G from 2.2G , doing some updates and install some packages.

Now here is a problem.

I would like to use my own cloud image as base for some projects. It will help me speed things up and also do some common things I am usually doing in every setup.

If I copy the volume disk, then I will copy 1.6G of the snapshot disk. I can not use this as a base image. The volume disk contains only the delta from the base image!

baking file

Let’s first understand a bit better what is happening here

qemu-img info –backing-chain /var/lib/libvirt/images/lEvXLA_tf-vol.qcow2

image: /var/lib/libvirt/images/lEvXLA_tf-vol.qcow2
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 1.6 GiB
cluster_size: 65536
backing file: /var/lib/libvirt/images/lEvXLA_tf-base.qcow2
backing file format: qcow2
Format specific information:
    compat: 0.10
    compression type: zlib
    refcount bits: 16

image: /var/lib/libvirt/images/lEvXLA_tf-base.qcow2
file format: qcow2
virtual size: 2.2 GiB (2361393152 bytes)
disk size: 636 MiB
cluster_size: 65536
Format specific information:
    compat: 0.10
    compression type: zlib
    refcount bits: 16

By inspecting the volume disk, we see that this image is chained to our base image.

disk size: 1.6 GiB
disk size: 636 MiB

Commit Volume

If we want to commit our volume changes to our base images, we need to commit them.

sudo qemu-img commit /var/lib/libvirt/images/lEvXLA_tf-vol.qcow2

Image committed.

Be aware, we commit our changes the volume disk => so our base will get the updates !!

Base Image

We need to see our base image grow we our changes

  disk size: 1.6 GiB
+ disk size: 636 MiB
  disk size: 2.11 GiB

and we can verify that by getting the image info (details)

qemu-img info /var/lib/libvirt/images/lEvXLA_tf-base.qcow2

image: /var/lib/libvirt/images/lEvXLA_tf-base.qcow2
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 2.11 GiB
cluster_size: 65536
Format specific information:
    compat: 0.10
    compression type: zlib
    refcount bits: 16

That’s it !

Friday, 11 November 2022

GitLab as a Terraform state backend

Using Terraform for personal projects, is a good way to create your lab in a reproducible manner. Wherever your lab is, either in the “cloud” aka other’s people computers or in a self-hosted environment, you can run your Infrastructure as code (IaC) instead of performing manual tasks each time.

My preferable way is to use QEMU/KVM (Kernel Virtual Machine) on my libvirt (self-hosted) lab. You can quickly build a k8s cluster or test a few virtual machines with different software, without paying extra money to cloud providers.

Terraform uses a state file to store your entire infra in json format. This file will be the source of truth for your infrastructure. Any changes you make in the code, terraform will figure out what needs to add/destroy and run only what have changed.

Working in a single repository, terraform will create a local state file on your working directory. This is fast and reliable when working alone. When working with a team (either in an opensource project/service or it is something work related) you need to share the state with others. Eitherwise the result will be catastrophic as each person will have no idea of the infrastructure state of the service.

In this blog post, I will try to explain how to use GitLab to store the terraform state into a remote repository by using the tf backend: http which is REST.

Greate a new private GitLab Project

GitLab New Project

We need the Project ID which is under the project name in the top.

Create a new api token

GitLab API

Verify that your Project has the ability to store terraform state files

GitLab State

You are ready to clone the git repository to your system.


Reading the documentation in the below links

seems that the only thing we need to do, is to expand our terraform project with this:

terraform {
  backend "http" {

Doing that, we inform our IaC that our terraform backend should be a remote address.

Took me a while to figure this out, but after re-reading all the necessary documentation materials the idea is to declare your backend on gitlab and to do this, we need to initialize the http backend.

The only Required configuration setting is the remote address and should be something like this:

terraform {
  backend "http" {
    address = "<PROJECT_ID>/terraform/state/<STATE_NAME>"

Where PROJECT_ID and STATE_NAME are relative to your project.

In this article, we go with


Terraform does not allow to use variables in the backend http, so the preferable way is to export them to our session.

and we -of course- need the address:


For convience reasons, I will create a file named: terraform.config outside of this git repo

cat > ../terraform.config <<EOF
export -p GITLAB_PROJECT_ID="40961586"
export -p GITLAB_TF_STATE_NAME="tf_state"
export -p GITLAB_URL=""

# Address


source ../terraform.config

this should do the trick.


In order to authenticate via tf against GitLab to store the tf remote state, we need to also set two additional variables:

# Authentication

put them in the above terraform.config file.

Pretty much we are done!

Initialize Terraform

source ../terraform.config 

terraform init
Initializing the backend...

Successfully configured the backend "http"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Finding latest version of hashicorp/http...
- Finding latest version of hashicorp/random...
- Finding latest version of hashicorp/template...
- Finding dmacvicar/libvirt versions matching ">= 0.7.0"...
- Installing hashicorp/random v3.4.3...
- Installed hashicorp/random v3.4.3 (signed by HashiCorp)
- Installing hashicorp/template v2.2.0...
- Installed hashicorp/template v2.2.0 (signed by HashiCorp)
- Installing dmacvicar/libvirt v0.7.0...
- Installed dmacvicar/libvirt v0.7.0 (unauthenticated)
- Installing hashicorp/http v3.2.1...
- Installed hashicorp/http v3.2.1 (signed by HashiCorp)

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.


Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

Remote state

by running

terraform plan

we can now see the remote terraform state in the gitlab

GitLab TF State

Opening Actions –> Copy terraform init command we can see the below configuration:


terraform init

Update terraform backend configuration

I dislike running a “long” terraform init command, so we will put these settings to our tf code.

Separating the static changes from the dynamic, our Backend http config can become something like this:

terraform {
  backend "http" {
    lock_method    = "POST"
    unlock_method  = "DELETE"
    retry_wait_min = 5

but we need to update our terraform.config once more, to include all the variables of the http backend configuration for locking and unlocking the state.

# Lock

# Unlock

Terraform Config

So here is our entire terraform config file

# GitLab

export -p GITLAB_URL=""
export -p GITLAB_PROJECT_ID="<>"
export -p GITLAB_TF_STATE_NAME="tf_state"

# Terraform

# Address

# Lock

# Unlock

# Authentication
export -p TF_HTTP_USERNAME="api"
export -p TF_HTTP_PASSWORD="<>"

And pretty much that’s it!

Other Colleagues

So in order our team mates/colleagues want to make changes to this specific gitlab repo (or even extended to include a pipeline) they need

  1. Git clone the repo
  2. Edit the terraform.config
  3. Initialize terraform (terraform init)

And terraform will use the remote state file.

Tag(s): gitlab, terraform

Saturday, 05 November 2022

KDE Gear 22.12 branches created

Make sure you commit anything you want to end up in the KDE Gear 22.12 releases to them

We're already past the dependency freeze.

The Feature Freeze and Beta is next week Thursday 10 of November.

More interesting dates
 November 24, 2022: 22.12 RC (22.11.90) Tagging and Release
 December 1, 2022: 22.12 Tagging
 December 8, 2022: 22.12 Release

Friday, 28 October 2022

The KDE Qt5 Patch Collection has been rebased on top of Qt 5.15.7



Commercial release announcement: 

OpenSource release announcement:


As usual I want to personally extend my gratitude to the Commercial users of Qt for beta testing Qt 5.15.7 for the rest of us.


The Commercial Qt 5.15.7 release introduced one bug that has later been fixed. Thanks to that, our Patchset Collection has been able to incorporate the revert for bug [1]  and the Free Software users will never be affected by it!

Planet FSFE (en): RSS 2.0 | Atom | FOAF |

                  Albrechts Blog  Alessandro's blog  Andrea Scarpino's blog  André Ockers on Free Software  Bela's Internship Blog  Bernhard's Blog  Bits from the Basement  Blog of Martin Husovec  Bobulate  Brian Gough’s Notes  Chris Woolfrey — FSFE UK Team Member  Ciarán’s free software notes  Colors of Noise - Entries tagged planetfsfe  Communicating freely  Daniel Martí's blog  David Boddie - Updates (Full Articles)  ENOWITTYNAME  English Planet – Dreierlei  English – Alessandro at FSFE  English – Alina Mierlus – Building the Freedom  English – Being Fellow #952 of FSFE  English – Blog  English – FSFE supporters Vienna  English – Free Software for Privacy and Education  English – Free speech is better than free beer  English – Jelle Hermsen  English – Nicolas Jean's FSFE blog  English – Paul Boddie's Free Software-related blog  English – The Girl Who Wasn't There  English – Thinking out loud  English – Viktor's notes  English – With/in the FSFE  English – gollo's blog  English – mkesper's blog  English – nico.rikken’s blog  Escape to freedom  Evaggelos Balaskas - System Engineer  FSFE interviews its Fellows  FSFE – Frederik Gladhorn (fregl)  FSFE – Matej's blog  Fellowship News  Free Software & Digital Rights Noosphere  Free Software with a Female touch  Free Software –  Free Software – hesa's Weblog  Free as LIBRE  Free, Easy and Others  FreeSoftware – egnun's blog  From Out There  Giacomo Poderi  Green Eggs and Ham  Handhelds, Linux and Heroes  HennR’s FSFE blog  Henri Bergius  In English —  Inductive Bias  Karsten on Free Software  Losca  MHO  Mario Fux  Matthias Kirschner's Web log - fsfe  Max Mehl (English)  Michael Clemens  Myriam's blog  Mäh?  Nice blog  Nikos Roussos - opensource  Posts - Carmen Bianca Bakker  Posts on Hannes Hauswedell  Pressreview  Rekado  Riccardo (ruphy) Iaconelli – blog  Saint’s Log  TSDgeos' blog  Tarin Gamberini  Technology – Intuitionistically Uncertain  The trunk  Thomas Løcke Being Incoherent  Told to blog - Entries tagged fsfe  Tonnerre Lombard  Vincent Lequertier's blog  Vitaly Repin. Software engineer's blog  Weblog  Weblog  Weblog  Weblog  Weblog  Weblog  a fellowship ahead  agger's Free Software blog  anna.morris's blog  ayers's blog  bb's blog  blog  en – Florian Snows Blog  en – PB's blog  en – rieper|blog  english on Björn Schießle - I came for the code but stayed for the freedom  english – Torsten's FSFE blog  foss – vanitasvitae's blog  free software blog  freedom bits  freesoftware – drdanzs blog  fsfe – Thib's Fellowship Blog  julia.e.klein’s blog  marc0s on Free Software  pichel’s blog  planet-en – /var/log/fsfe/flx  polina's blog  softmetz' anglophone Free Software blog  stargrave's blog  tobias_platen's blog  tolld's blog  wkossen’s blog  yahuxo’s blog