Wednesday, 27 September 2023
Am I using qmllint wrong? Or is it still not there?
Today I was doing some experiments with qmllint hoping it would help us make QML code more robust.
I created a very simple test which is basically a single QML file that creates an instance of an object I've created from C++.
But when running qmllint via the all_qmllint target it tells me
Warning: Main.qml:14:9: No type found for property "model". This may be due to a missing import statement or incomplete qmltypes files. [missing-type] model: null ^^^^^ Warning: Main.qml:14:16: Cannot assign literal of type null to QAbstractItemModel [incompatible-type] model: null ^^^^
Which is a relatively confusing error, since it first says that it doesn't know what the model property is, but then says "the model property is an QAbstractItemModel and you can't assign null to it"
Here the full code https://bugreports.qt.io/secure/attachment/146411/untitled1.zip in case you want to fully reproduce but first some samples of what i think it's important
QML FILE
import QtQuick import QtQuick.Window import untitled1 // This is the name of my import Window { // things
ObjectWithModel { model: null } }
HEADER FILE (there's nothing interesting in the cpp file)
#pragma once #include <QtQmlIntegration> #include <QAbstractItemModel> #include <QObject> class ObjectWithModel : public QObject { Q_OBJECT QML_ELEMENT Q_PROPERTY(QAbstractItemModel* model READ model WRITE setModel NOTIFY modelChanged) public: explicit ObjectWithModel(QObject* parent = nullptr); AbstractItemModel* model() const; void setModel(QAbstractItemModel* model); signals: void modelChanged(); private: QAbstractItemModel* mModel = nullptr; };
CMAKE FILE
cmake_minimum_required(VERSION 3.16) project(untitled1 VERSION 0.1 LANGUAGES CXX) set(CMAKE_CXX_STANDARD_REQUIRED ON) find_package(Qt6 6.4 REQUIRED COMPONENTS Quick) qt_standard_project_setup() qt_add_executable(appuntitled1 main.cpp) qt_add_qml_module(appuntitled1 URI untitled1 VERSION 1.0 QML_FILES Main.qml SOURCES ObjectWithModel.h ObjectWithModel.cpp ) target_link_libraries(appuntitled1 PRIVATE Qt6::Quick)
As you can see it's quite simple and, as far as I know, using the recommended way of setting up a QML module when using a standalone app.
But maybe I am holding it wrong?
Friday, 22 September 2023
Seafile Mirror - Simple automatic backup of your Seafile libraries
I have been using Seafile for years to host and synchronise files on my own server. It’s fast and reliable, especially when dealing with a large number and size of files. But making reliable backups of all its files isn’t so trivial. This is because the files are stored in a layout similar to bare Git repositories, and Seafile’s headless tool, seafile-cli, is… suboptimal. So I created what started out as a wrapper for it and ended up as a full-blown tool for automatically synchronising your libraries to a backup location: Seafile Mirror.
My requirements
Of course, you could just take snapshots of the whole server, or copy the raw Seafile data files and import them into a newly created Seafile instance as a disaster recovery, but I want to be able to directly access the current state of the files whenever I need them in case of an emergency.
It was also important for me to have a snapshot, not just another real-time sync of a library. This is because I also want to have a backup in case I (or an attacker) mess up a Seafile library. A real-time sync would immediately fetch that failed state.
I also want to take a snapshot at a configurable interval. Some libraries should be synchronised more often than others. For example, my picture albums do not change as often as my miscellaneous documents, but they use at least 20 times the disk space and therefore network traffic when running a full sync.
Also, the backup service must have read-only access to the files.
A version controlled backup of the backup (i.e. the plain files) wasn’t in scope. I handle this separately by backing up my backup location, which also contains similar backups of other services and machines. For this reason, my current solution does not do incremental backups, even though this may be relevant for other use cases.
The problems
Actually, seafile-cli should have been everything you’d need to fulfill the requirements. But no. It turned out that this tool has a number of fundamental issues:
- You can make the host the tool is running on a sync peer. However, it easily leads to sync errors if the user just has read-only permissions to the library.
- You can also download a library but then again it may lead to strange sync errors.
- It requires a running daemon which crashes irregularly during larger sync tasks or has other issues.
- Download/sync intervals cannot be set manually.
The solution
seafile-mirror takes care of all these stumbling blocks:
- It downloads/syncs defined libraries in customisable intervals
- It de-syncs libaries immediately after they have been downloaded to avoid sync errors
- You can force-re-sync a library even if its re-sync interval hasn’t reached yet
- Extensive informative and error logging is provided
- Of course created with automation in mind so you can run it in cronjobs or systemd triggers
- And as explained, it deals with the numerous caveats of
seaf-cli
and Seafile in general
Full installation and usage documentation can be found in the project
repository. Installation is as simple as running pip3 install seafile-mirror
,
and a sample configuration is provided.
In my setup, I run this application on a headless server with systemd under a separate user account. Therefore the systemd service needs to be set up first. This is also covered in the tool’s documentation. And as an Ansible power user, I also provide an Ansible role that does all the setup and configuration.
Possible next steps
The tool has been running every day since a couple of months without any issues. However, I could imagine a few more features to be helpful for more people:
- Support of login tokens: Currently, only user/password auth is supported which is fine for my use-case as it’s just a read-only user. This wouldn’t be hard to fix either, seafile-cli supports it (at least in theory). (#2)
- Support of encrypted libraries: Shouldn’t be a big issue, it would require passing the password to the underlying seafile-cli command. (#3)
If you have encountered problems or would like to point out the need for specific features, please feel free to contact me or comment on the Mastodon post. I’d also love to hear if you’ve become a happy user of the tool 😊.
Wednesday, 13 September 2023
Importance of more inclusive illustrations
Recently I received an e-mail with pictures which touched me, and which showed me how important it is to think about diversity when creating illustrations.
The photos were taken in a school at a hospital run by an international medical organisation that operates in the Middle East and showed children reading the Arabic translation of Ada & Zangemann - A Tale of Software, Skateboards, and Raspberry Ice Cream.
The hospital does surgery for "war victims, mostly people who have lost a limb (often because of a landmine) or suffered burns (usually because of bombings)." The pictures showed children from surrounding countries (Yemen, Syria, Irak mostly) who because of their condition usually have to stay at hospital away from their country for several months, often years.
"So while I can't guarantee that thousands of kids will read those copies of the book, I can promise that they do make a huge difference for the kids who do. Most of them have a 3d printed arm or leg, or a compression mask to help with burn healing. I suspect that the concept of being able to tinker with software and tools around them will ring a bell (the prosthetics you see in the video above are all 3D printed on site by [the organisation])." (The quotes are from the e-mail I received.)
For the book, Wiebke (editor), Sandra (illustrator) and I spent significant time to discuss the inclusiveness of the characters. Sandra's experience with inclusiveness was one of the reasons why I approached her: to see if she would like to work with us on the book: considering inclusiveness without distracting the reader from the main story. Receiving this e-mail, and looking at the pictures showed me again that every minute we spent on thinking about inclusiveness was worth it.
A lot of people will not realise it when they read the book and look at the illustrations, but taking a closer look, you will realise that one of the characters in the book is using a 3d printed leg. For readers with physical impairments, this tiny detail can make a huge difference.
Sunday, 03 September 2023
FSFE information stall on Veganmania Donauinsel 2023



On the second Veganmania street festival this year taking place on the Danube island in Vienna from August 25. to 27. we finally managed to borrow a sturdy tent. We could get it for free from a local animal rights organisation. This was great for withstanding the high temperatures during the event because it provided urgently needed shade. The only downside was that the name of the well known organisation was printed onto the tent. This caused many people to mistake our information stall for one of this other organisation despite none of our banners, posters and information indicated any relation to this subject or organisation – at least on first glance. Of course this didn’t hinder us to clarify the confusion and to point out the most important subject on our desk: independence on personal electronic devices.
As usual many people did use this opportunity to learn more about free software and the advantages it brings. Other than that we had many encounters with people who already use free software and were as happy as surprised to find us at this event. Of course we could easily give the reasoning why we feel that free software is a perfect addition to a vegan lifestyle. After all, most people decide to go vegan because they don’t want to harm others. And if you apply the same thought to the world of software you end up with free software.
Again I need to order more information material for the next instalment of our information desk on 8. and 9. October this year at the third Veganmania summer festival in Vienna in front of the city hall. Usually there are only two Veganmanias each year but since it is the 25. anniversary of the event in 2023 a third one will take place at this prestigious and hard to get location.
We noticed an interesting re-occurring phenomenon concerning a difference in how male and female people approach our information desk. Of course this is just a tendency and there are exceptions but in general most men will only approach our desk because they already know about free software and they want to check out what material we do offer. And female visitors of our desk mostly aren’t familiar yet with free software but are willing to find out what it is about.
Many people were especially interested in ways to improve their privacy and independence on their mobile phones. Unfortunately many of those did use iOS devices and we couldn’t offer them any solutions on this totally locked down platform. Android is far from being ideal but it at least gives most users the opportunity to go for more privacy focused solutions. Even if they didn’t want to forego all proprietary software they can at least use F-Droid as an app store to add free software apps to their mix. And it is of course always good to know that you can actually upcycle your mobile after the original OS stopped providing security updates by installing a free alternative Android system like LineageOS.
Especially the brochure for decision makers in the public sector investigating what advantages free software brings to the table in this area is still in higher demand than I anticipated. I really need to order more of those.
A large amount of different stickers seems to attract many people. I need to replace some of my over the years rather worn out posters. And I am still not certain if I should actually invest in my own tent because one that can withstand wind, rain and many years of service isn’t cheap. But using a tent with the information of an other organisation printed on it hasn’t proven to be ideal for the confusion it creates.
I also consider joining the annual Volksstimmefest with our FSFE information stall, but I am not convinced how good this idea is because it seems to be more focused on concerts and has a clear tendency to be a left wing political event. Since I don’t consider free software to be a predominantly left wing subject I am somewhat reluctant to position it so clearly in this spectrum.
Manning the desk for three days was somewhat exhausting since my usual helper couldn’t be there due to a clash of appointments. Nevertheless, I consider the information desk on Veganmania 2023 on the Danube island as an other successful event where I was able to inform many people about ways to improve their independency in the digital realm by employing free software.
Tuesday, 25 July 2023
PGPainless meets the Web-of-Trust
We are very proud to announce the release of PGPainless-WOT, an implementation of the OpenPGP Web of Trust specification using PGPainless.
The release is available on the Maven Central repository.
The work on this project begun a bit over a year ago as an NLnet project which received funding through the European Commission’s NGI Assure program. Unfortunately, somewhere along the way I lost motivation to work on the project, as I failed to see any concrete users. Other projects seemed more exciting at the time.
Fast forward to end of May when Wiktor reached out and connected me with Heiko, who was interested in the project. We two decided to work together on the project and I quickly rebased my – at this point ancient and outdated – feature branch onto the latest PGPainless release. At the end of June, we started the joint work and roughly a month later today, we can release a first version
Big thanks to Heiko for his valuable contributions and the great boost in motivation working together gave me
Also big thanks to NLnet for sponsoring this project in such a flexible way.
Lastly, thanks to Wiktor for his talent to connect people
The Implementation
We decided to write the implementation in Kotlin. I had attempted to learn Kotlin multiple times before, but had quickly given up each time without an actual project to work on. This time I stayed persistent and now I’m a convinced Kotlin fan Rewriting the existing codebase was a breeze and the line count drastically reduced while the amount of syntactic sugar which was suddenly available blow me away! Now I’m considering to steadily port PGPainless to Kotlin. But back to the Web-of-Trust.
Our implementation is split into 4 modules:
pgpainless-wot
parses OpenPGP certificates into a generalized form and builds a flow network by verifying third-party signatures. It also provides a plugin forpgpainless-core
.wot-dijkstra
implements a query algorithm that finds paths on a network. This module has no OpenPGP dependencies whatsoever, so it could also be used for other protocols with similar requirements.pgpainless-wot-cli
provides a CLI frontend forpgpainless-wot
wot-test-suite
contains test vectors from Sequoia PGP’s WoT implementation
The code in pgpainless-wot
can either be used standalone via a neat little API, or it can be used as a plugin for pgpainless-core
to enhance the encryption / verification API:
/* Standalone */ Network network = PGPNetworkParser(store).buildNetwork(); WebOfTrustAPI api = new WebOfTrustAPI(network, trustRoots, false, false, 120, refTime); // Authenticate a binding assertTrue( api.authenticate(fingerprint, userId, isEmail).isAcceptable()); // Identify users of a certificate via the fingerprint assertEquals( "Alice <alice@example.org>", api.identify(fingerprint).get(0).getUserId()); // Lookup certificates of users via userId LookupAPI.Result result = api.lookup( "Alice <alice@example.org>", isEmail); // Identify all authentic bindings (all trustworthy certificates) ListAPI.Result result = api.list(); /* Or enhancing the PGPainless API */ CertificateAuthorityImpl wot = CertificateAuthorityImpl .webOfTrustFromCertificateStore(store, trustRoots, refTime) // Encryption EncryptionStream encStream = PGPainless.encryptAndOrSign() [...] // Add only recipients we can authenticate .addAuthenticatableRecipients(userId, isEmail, wot) [...] // Verification DecryptionStream decStream = [...] [...] // finish decryption MessageMetadata metadata = decStream.getMetadata(); assertTrue(metadata.isAuthenticatablySignedBy(userId, isEmail, wot));
The CLI application pgpainless-wot-cli
mimics Sequoia PGP’s neat sq-wot tool, both in argument signature and output format. This has been done in an attempt to enable testing of both applications using the same test suite.
pgpainless-wot-cli
can read GnuPGs keyring, can fetch certificates from the Shared OpenPGP Certificate Directory (using pgpainless-cert-d of course :P) and ingest arbitrary .pgp keyring files.
$ ./pgpainless-wot-cli help Usage: pgpainless-wot [--certification-network] [--gossip] [--gpg-ownertrust] [--time=TIMESTAMP] [--known-notation=NOTATION NAME]... [-r=FINGERPRINT]... [-a=AMOUNT | --partial | --full | --double] (-k=FILE [-k=FILE]... | --cert-d[=PATH] | --gpg) [COMMAND] -a, --trust-amount=AMOUNT The required amount of trust. --cert-d[=PATH] Specify a pgp-cert-d base directory. Leave empty to fallback to the default pgp-cert-d location. --certification-network Treat the web of trust as a certification network instead of an authentication network. --double Equivalent to -a 240. --full Equivalent to -a 120. --gossip Find arbitrary paths by treating all certificates as trust-roots with zero trust. --gpg Read trust roots and keyring from GnuPG. --gpg-ownertrust Read trust-roots from GnuPGs ownertrust. -k, --keyring=FILE Specify a keyring file. --known-notation=NOTATION NAME Add a notation to the list of known notations. --partial Equivalent to -a 40. -r, --trust-root=FINGERPRINT One or more certificates to use as trust-roots. --time=TIMESTAMP Reference time. Commands: authenticate Authenticate the binding between a certificate and user ID. identify Identify a certificate via its fingerprint by determining the authenticity of its user IDs. list Find all bindings that can be authenticated for all certificates. lookup Lookup authentic certificates by finding bindings for a given user ID. path Verify and lint a path. help Displays help information about the specified command
The README file of the pgpainless-wot-cli
module contains instructions on how to build the executable.
Future Improvements
The current implementation still has potential for improvements and optimizations. For one, the Network object containing the result of many costly signature verifications is currently ephemeral and cannot be cached. In the future it would be desirable to change the network parsing code to be agnostic of reference time, including any verifiable signatures as edges of the network, even if those signatures are not yet – or no longer valid. This would allow us to implement some caching logic that could write out the network to disk, ready for future web of trust operations.
That way, the network would only need to be re-created whenever the underlying certificate store is updated with new or changed certificates (which could also be optimized to only update relevant parts of the network). The query algorithm would need to filter out any inactive edges with each query, depending on the queries reference time. This would be far more efficient than re-creating the network with each application start.
But why the Web of Trust?
End-to-end encryption suffers from one major challenge: When sending a message to another user, how do you know that you are using the correct key? How can you prevent an active attacker from handing you fake recipient keys, impersonating your peer? Such a scenario is called Machine-in-the-Middle (MitM) attack.
On the web, the most common countermeasure against MitM attacks are certificate authorities, which certify the TLS certificates of website owners, requiring them to first prove their identity to some extent. Let’s Encrypt for example first verifies, that you control the machine that serves a domain before issuing a certificate for it. Browsers trust Let’s Encrypt, so users can now authenticate your website by validating the certificate chain from the Let’s Encrypt CA key down to your website’s certificate.
The Web-of-Trust follows a similar model, with the difference, that you are your own trust-root and decide, which CA’s you want to trust (which in some sense makes you your own “meta-CA”). The Web-of-Trust is therefore far more decentralized than the fixed set of TLS trust-roots baked into web browsers. You can use your own key to issue trust signatures on keys of contacts that you know are authentic. For example, you might have met Bob in person and he handed you a business card containing his key’s fingerprint. Or you helped a friend set up their encrypted communications and in the process you two exchanged fingerprints manually.
In all these cases, in order to initiate a secure communication channel, you needed to exchange the fingerprint via an out-of-band channel. The real magic only happens, once you take into consideration that your close contacts could also do the same for their close contacts, which makes them CAs too. This way, you could authenticate Charlie via your friend Bob, of whom you know that he is trustworthy, because – come on, it’s Bob! Everybody loves Bob!

The Web-of-Trust becomes really useful if you work with people that share the same goal. Your workplace might be one of them, your favorite Linux distribution’s maintainer team, or that non-Profit organization/activist collective that is fighting for a better tomorrow. At work for example, your employer’s IT department might use a local CA (such as an instance of the OpenPGP CA) to help employees to communicate safely. You trust your workplace’s CA, which then introduces you safely to your colleagues’ authentic key material. It even works across business’ boundaries, e.g. if your workplace has a cooperation with ACME and you need to establish a safe communication channel to an ACME employee. In this scenario, your company’s CA might delegate to the ACME CA, allowing you to authenticate ACME employees.
As you can see, the Web-of-Trust becomes more useful the more people are using it. Providing accessible tooling is therefore essential to improve the overall ecosystem. In the future, I hope that OpenPGP clients such as MUAs (e.g. Thunderbird) will embrace the Web-of-Trust.
Monday, 17 July 2023
KDE Gear 23.08 branches created
Make sure you commit anything you want to end up in the KDE Gear 23.08 releases to them
Dependency freeze is next July 20The Feature Freeze and Beta is Thursday 27 of July.
More interesting dates
August 10: 23.08 RC (23.07.90) Tagging and Release
August 17: 23.08 Tagging
August 24: 23.08 Release
https://community.kde.org/Schedules/KDE_Gear_23.08_Schedule
Wednesday, 12 July 2023
A Truly Free AI
Understanding what makes a software Free (as in freedom) has been going on since the beginning of the Free Software movement in the 80’s (at least). This led to the Free Software licenses, which help users to control the technology they use. However, considering the peculiarities of Artificial Intelligence (AI) software, one may wonder whether those licenses account for those.
Free Software licenses were designed so that users control technology, and facilitate their collaboration. Software released under a Free Software license guarantees that users can use, study, share and improve it however they want, with anybody they want. Once one accesses the source code and the accompanying license(s), he or she can run the software. Indeed, most software runs on commodity hardware. However, this is not true for AI and deep learning, the branch of AI powering most of the recent successful AI technologies.
In Artificial Intelligence, deep learning is a part of machine learning and is usually composed of 5 elements: data, a model and its parameters, the definition of a problem (in the form of a loss function) which ties the data and the model together, a training phase and an inference phase. The goal of the learning phase (training) is to modify the model’s parameters so that the model gets incrementally better at solving the problem i.e. at minimizing the loss function. Once the loss stops decreasing, the model cannot learn further and the parameters stop changing. Using those parameters, one can make predictions with data not used during the learning phase: this is the inference phase. In deep learning, those parameters are the weights of interconnected neurons which form an artificial neural network.
But here is the problem: the number of parameters used for deep learning is enormous and keeps increasing. Likewise, the amount of data is getting enormous, to a point where using deep learning on commodity hardware is no longer possible. This raises the question of what would make an AI truly Free: what is the point of an AI published as Free Software if most users cannot exercise the 4 freedoms endowed by existing Free Software definition and licenses? Even though one might access the data and the code used for training, they would not be able to train the AI, improve it and share the results. Those who can afford to train the AI (modify the weights of deep learning models) are in a very powerful position compared to those who cannot. The AI being Free Software therefore does not necessarily guarantee that users stay in control of technology. What would be required to make an AI Free Software in the sense that it allows users to control it?
A truly Free AI would need to be easy to train by their users. This requires the trained model’s parameters to be easily accessible so that they can be used as a starting point for training, rather than adjusting the parameters from scratch (usually from randomly initialized parameters). Deep learning weights should thus be Free Software. The number of parameters and the amount of data required to improve the AI software would also need to be manageable. If the data and its precise description cannot be shared, the use of Open Standard would facilitate the creation of alternative datasets.
AI is not going away. Since the rise of deep learning in the last decade, triggered by the availability of more data, improved methods for stabilizing and speeding up the training of deep neural networks and improved hardware, the use of AI is becoming more and more mainstream. And now that we start to understand how powerful the AI genie is, it cannot be put back in the bottle. This raises the question of how to stay in control of technology in a world where AI is bound to become more powerful and ubiquitous. Free Software is a key part of the answer.
Thursday, 06 July 2023
Creating an OpenPGP Web-of-Trust Implementation – Knitting a Net
This post is part of a series. Read the last part here.
There are two obvious operations your OpenPGP implementation needs to be capable of performing if you want to build a Web-of-Trust. First you need to be able to sign other users public keys (certificates), and second, you need to be able to verify those certifications.
The first is certainly the easier of the two tasks. In order to sign another users certificate, you simply take your own secret key, decide which binding (a tuple of a certificate and a user-id) you want to create a certification for and then simply generate a signature over the user-id and the primary key of the certificate according to the standard.
Now your signature can be distributed, e.g. by mailing it to the user, or by publishing it to a key server. This task is simple, because all the ingredients are well known. You know which key to use to create the signature, you know which user-id you want to certify and you know which key to bind the user-id to. So signing a certificate is more or less straight forward application of the cryptography defined in the specification.
But the task of verifying, whether there is a valid signature by one certificate over another is far more complex of a task. Here, the specification is deliberately vague. Some time ago I wrote an article, describing why signature verification in OpenPGP is hard, and I still stand to my points.
Authentication of a certificate is the task of figuring out how confident you can be that a key that claims to belong to “Alice <alice@example.org>” really was issued by Alice and not by an imposter, in other words you need to proof the authenticity of the binding. To accomplish this task, the Web-of-Trust is scanned for paths that lead from a root of trust (think e.g. a CA or your own certificate) to the binding in question.
Building a Web-of-Trust implementation can be divided in a number of steps which luckily for us stand independent from another:
- Ingest the set of certificates
- Verify certifications made on those certificates
- Build a flow network from the certificates and certifications
- Perform queries on the network to find paths from or to a certain binding (e.g. using Dijkstra’s algorithm)
- Interpret the resulting path(s) to infer authenticity of said binding
In the first step we simply want to create an index of all available certificates, such that in later steps we are able to have random access on any certificate via its fingerprint(s) or key-ID(s).
The second step is to go through each certificate one-by-one and attempt to verify third-party certifications made over its primary key or user-ids. Here, the index built in the previous step comes in handy acquire the issuer certificate to perform the signature verification. In this step, we index the certifications we made and keep them available for the next step. Once we successfully performed all signature verifications, the OpenPGP-portion of the task is done.
In step three, we form a flow network from the results of the previous steps. Certificates themselves form the nodes of the network, while each signature represents an edge between the issuer- and target certificate. There can be more than one edge between two certificates.
Step 4 and 5 are the technically most complicated steps, so I will not go into too much detail in this post. For now, I will instead first try to explain the abstract picture of the Web-of-Trust I have in my head:
I imagine the Web-of-Trust as an old, half-rotten fishing net (bear with me); There are knobbly knots, which may or may not be connected to neighboring knots through yarn of different thickness. Some knots are well-connected with others, as ye olde fisherman did some repair work on the net, while other knots or even whole sections of the net have no intact connections left to the rest. Many connections rotted away as the yarn past its expiration date.
When we now attempt to pick up the net by lifting one of the knots into the air, all those knots that are connected either directly or indirectly will also lift up, while disconnected knots or sections will fall to the ground. If we put some weight on the net, some of the brittle connections may even break, depending on how much weight we apply. Others might hold because a knot has multiple connections that share the load.
In this analogy, each knot is a node (OpenPGP certificate), with yarn connections being the certifications. Different thickness of the yarn means different trust-amounts. The knot(s) we chose to pick the net up from are the trust roots. Each knot that lifts up from the ground we can authenticate. The weight we apply to the net can be seen as the amount of trust we require. If we aren’t able to accumulate enough trust-amount for a path, the knot rips off the fishing net, meaning it cannot be authenticated to a sufficient degree.
This analogy is of course not perfect at all. First off, edges in the Web-of-Trust are directed, meaning you can follow the edge in one direction, but not necessarily in the other. Furthermore, the Web-of-Trust has some more advanced attributes that can be put into a certification to give it even more meaning. For example, a trust signature not only has a numeric trust-amount, but also a depth, which limits the number of “hops” you can make after passing over the edge. Certifications can also include regular expressions, limiting to which certificates you can hop next.
Still, to get an initial, rough understanding of the WoT, I find the fishing net analogy quite suitable. In a later post I might go into more detail on steps 4 and 5.
Tuesday, 04 July 2023
Nextcloud and OpenID-Connect

%!s()
This is a updated version of a old blog post from 2020. The guide here was tested with Nextcloud Hub 5 and Keycloak 21.1.2.
Please keep in mind, the main goal of this article is to get Keycloak up and running quickly to test the Nextcloud OIDC connector. It is not a detailed guide how to setup Keycloak for production! It is quite likely that I missed some important security setting which you would like to enable for a live system.
Get a OpenID Connect provider
First step was to get an OpenID-Connect provider, sure I could have chosen one of the public services. But why not have a small nice provider running directly on my machine? Keycloak makes this really simple. By following their Getting Started Guide I could setup a OpenID-Connect provider in just a few minutes and run it directly on my local demo machine. I will show you how I configured Keycloak as an OpenID-Connect provider for Nextcloud.
After installing Keycloak we go to http://localhost:8080/admin
which is the default URL in “standalone” mode and login as admin. The first thing we do is to configure a new Realm in the “Realm Settings”. We only set a Realm name, no “resource file” needed:

Next we move on to the “Clients” tab, and created a new client. In the dialog we first set a random “Client ID”, I have chosen “nextcloud” in this example.

On the second page we enable “Client authentication”

and on the last page we set the “Root URL” to the Nextcloud, in this case “http://nextcloud.local”.

All other settings are already set correctly, this is now the final client setting looks like:

Finally we create a user who should be able to login to Nextcloud later.

While technically the “Username” is enough I directly set email address, first- and second name. Nextcloud will reuse this information later to pre-fill the users profile nicely. Don’t forget to go to the “Credentials” tab and set a password for your new user.
That’s it, now we just need to get the client secret fot the final Nextcloud configuration. We can find this in the credential tab of the “nextcloud” client settings:

Nextcloud setup
Before we continue, make sure to have the following two lines in your config.php:
'allow_local_remote_servers' => true,
'debug' => true,
Otherwise Nextcloud will refuse to connect to Keycloak on localhost through a unsecure connection (http).
Now we can move on and configure Nextcloud. If you go to the Nextcloud apps management and search for “openid” you will not only find the official app but also the community apps. Make sure to chose the app called “OpenID Connect user backend”. Just to avoid misunderstandings at this point, the Nextcloud community does an awesome job! I’m sure the community apps work great too, they may have even more features compared to the official app. But the goal of this article was to try out the officially supported OpenID-Connect app.
After installing the app we go to the admin settings where we will find a new menu entry called “OpenID Connect” on the left sidebar and register a new provider:

The app supports multiple OpenID Connect providers in parallel, so the first thing we do is to chose a “Identifier” which will be shown on the login page to let the user chose the right provider. For the other fields we enter the “Client ID”, “Client secret” and “Discovery endpoint” from Keycloak. The discovery endpoint looks like the following: http://<your-keycloak-base-url>/realms/<REALM-NAME>/.well-known/openid-configuration
. As you can see there are many more configuration parameters, for example a custom attribute mapping. But for the base installation we are complete.
Now let’s try to login with OpenID Connect:

As you can see, we have now an additional button called “Login with Keycloak”. Once clicked we get redirected to Keycloak:

After we successfully logged-in to Keycloak we get directly redirected back to Nextcloud and are logged-in. A look into our personal settings shows us that all our account detail like the full name and the email address where added correctly to our Nextcloud account:

Sunday, 25 June 2023
KDE Gear 23.08 release schedule
This is the release schedule the release team agreed on
https://community.kde.org/Schedules/KDE_Gear_23.08_Schedule
Dependency freeze is in less than 4 weeks (July 20) and feature freeze one
after that. Get your stuff ready!
Wednesday, 07 June 2023
FSFE info booth at the Veganmania MQ-Vienna 2023
From the 2nd to the 4th of June, the first of this year’s three Vienna Veganmania Summer Festivals took place at Vienna’s Museumsquartier. Naturally, once again our local FSFE group was present with the traditional information booth.
Originally we were instructed by the organizers surprisingly to set up at another, less favorable stand location, because apparently it was not permitted to have our usual place. But after about two hours the head of the district came by and demanded that we finally set up the stand at our usual corner again. This was fine with us, because we had a wall behind us and could enjoy the shade for the whole afternoon, although this year all three days were cooler than in previous years. And this was optimal, because it was much more pleasant to chat with the many people who visited our information booth. Rain was announced for Sunday morning, but we were lucky and accordingly our materials stayed dry the whole time. Nevertheless, we would be better off using a tent in the future so that we are no longer completely dependent on perfect weather if we want to avoid rain damage to our materials.
We had ordered supplies for some of our materials ahead of time, so we also had some new brochures and stickers available. However, we exhausted them almost entirely in the three days and we clearly need to reorder more materials for the two remianing events.
It was particularly amazing that the brochure Public Money, Public Code, which is quite large, thick and specific, was almost completely snapped up on the first half day. And this was not because we were offering it to an unsuitable target audience, but rather because a number of people came by who told us that they worked in administration and would find the brochures very interesting for this reason.
Almost all info booth visitors take our locally produced overview of 10 of the most popular GNU/Linux distributions with them. Especially popular are also the Free Your Android leaflets. Our locally produced Freedom and The Truth About Your Computer leaflets are no less popular. And of course our small assortment of FSFE stickers is no wallflower either. Even our somewhat outdated games leaflet is taken again and again. It certainly helps that we have had a <0AD poster stuck to our table for years. The game is of course an eye-catcher because of its still comparatively nice graphics, even if our poster is quite worn out by now and could definitely be replaced.
Certain info posters in particular need to be re-produced because the paper no longer looks so attractive due to repeated use and the ravages of wind and rain. It might be worthwhile to have them printed on plastic so that such environmental influences cannot cause any damage. We have been wanting to add a practical install-guide to our equipment for some time. Unfortunately, we have not yet had the opportunity to put this plan into practice.
This year we met comparatively many interested people who already work professionally with Free Software themselves and we discussed some possible synergies. We have to temper our hopes, however, because experience shows that many people have spontaneous ideas but do not take them up later. Time will tell what emerges from our many constructive conversations.
But of course, the diversity of people at our information booth was much greater. As usual, we were able to reach people who had never heard of Free Software. However, many already knew at least the term Open Source and quite a few had tried a free operating system themselves at some point. Nobody reported unpleasant experiences. However, some people described the problem of the network effect, which eventually led them to switch back to proprietary solutions. Others had gotten free systems from acquaintances and simply used what came pre-installed when buying new devices. However, they had obviously not lost their interest, because they came over to us voluntarily and struck up conversation.
Only one person told us that they were happy with their proprietary equipment and did not want to change anything. Several visitors to the information stand remarked that they had visited us at our information stand several times before. Little strokes fell big oaks!
The wind blew the FSFE balloons around a lot and sometimes we had to catch them because they came loose. The bottom line in this regard is that we learned that it is more practical to display only one or two inflated and give them to people uninflated, as most kids seem to enjoy inflating them.
In conclusion, we can happily announce that this information stand was once again a complete success. It once more proved that it is worthwhile to intentionally go outside the usual Free Software environment.
Monday, 29 May 2023
Friday, 21 April 2023
Migrate docker images to another disk
There is some confusion on which is the correct way to migrate your current/local docker images to another disk. To reduce this confusion, I will share my personal notes on the subject.
Prologue
I replaced a btrfs raid-1 1TB storage with another btrfs raid-1 4TB setup. So 2 disks out, 2 new disks in. I also use luks, so all my disks are encrypted with random 4k keys before btrfs on them. There is -for sure- a write-penalty with this setup, but I am for data resilience - not speed.
Before
These are my local docker images
docker images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
golang 1.19 b47c7dfaaa93 5 days ago 993MB
archlinux base-devel a37dc5345d16 6 days ago 764MB
archlinux base d4e07600b346 4 weeks ago 418MB
ubuntu 22.04 58db3edaf2be 2 months ago 77.8MB
centos7 ruby 28f8bde8a757 3 months ago 532MB
ubuntu 20.04 d5447fc01ae6 4 months ago 72.8MB
ruby latest 046e6d725a3c 4 months ago 893MB
alpine latest 49176f190c7e 4 months ago 7.04MB
bash latest 018f8f38ad92 5 months ago 12.3MB
ubuntu 18.04 71eaf13299f4 5 months ago 63.1MB
centos 6 5bf9684f4720 19 months ago 194MB
centos 7 eeb6ee3f44bd 19 months ago 204MB
centos 8 5d0da3dc9764 19 months ago 231MB
ubuntu 16.04 b6f507652425 19 months ago 135MB
3bal/centos6-eol devtoolset-7 ff3fa1a19332 2 years ago 693MB
3bal/centos6-eol latest aa2256d57c69 2 years ago 194MB
centos6 ebal d073310c1ec4 2 years ago 3.62GB
3bal/arch devel 76a20143aac1 2 years ago 1.02GB
cern/slc6-base latest 63453d0a9b55 3 years ago 222MB
Yes, I am still using centos6! It’s stable!!
docker save - docker load
Reading docker’s documentation, the suggested way is docker save and docker load. Seems easy enough:
docker save --output busybox.tar busybox
docker load < busybox.tar.gz
which is a lie!
docker prune
before we do anything with the docker images, let us clean up the garbages
sudo docker system prune
docker save - the wrong way
so I used the ImageID as a reference:
docker images -a | grep -v ^REPOSITORY | awk '{print "docker save -o "$3".tar "$3}'
piped out through a bash shell | bash -x
and got my images:
$ ls -1
33a093dd9250.tar
b47c7dfaaa93.tar
16eed3dc21a6.tar
d4e07600b346.tar
58db3edaf2be.tar
28f8bde8a757.tar
382715ecff56.tar
d5447fc01ae6.tar
046e6d725a3c.tar
49176f190c7e.tar
018f8f38ad92.tar
71eaf13299f4.tar
5bf9684f4720.tar
eeb6ee3f44bd.tar
5d0da3dc9764.tar
b6f507652425.tar
ff3fa1a19332.tar
aa2256d57c69.tar
d073310c1ec4.tar
76a20143aac1.tar
63453d0a9b55.tar
docker daemon
I had my docker images on tape-archive (tar) format. Now it was time to switch to my new btrfs storage. In order to do that, the safest way is my tweaking the
/etc/docker/daemon.json
and I added the data-root section
{
"dns": ["8.8.8.8"],
"data-root": "/mnt/WD40PURZ/var_lib_docker"
}
I will explain var_lib_docker
in a bit, stay with me.
and restarted docker
sudo systemctl restart docker
docker load - the wrong way
It was time to restore aka load the docker images back to docker
ls -1 | awk '{print "docker load --input "$1".tar"}'
docker load --input 33a093dd9250.tar
docker load --input b47c7dfaaa93.tar
docker load --input 16eed3dc21a6.tar
docker load --input d4e07600b346.tar
docker load --input 58db3edaf2be.tar
docker load --input 28f8bde8a757.tar
docker load --input 382715ecff56.tar
docker load --input d5447fc01ae6.tar
docker load --input 046e6d725a3c.tar
docker load --input 49176f190c7e.tar
docker load --input 018f8f38ad92.tar
docker load --input 71eaf13299f4.tar
docker load --input 5bf9684f4720.tar
docker load --input eeb6ee3f44bd.tar
docker load --input 5d0da3dc9764.tar
docker load --input b6f507652425.tar
docker load --input ff3fa1a19332.tar
docker load --input aa2256d57c69.tar
docker load --input d073310c1ec4.tar
docker load --input 76a20143aac1.tar
docker load --input 63453d0a9b55.tar
I was really happy, till I saw the result:
# docker images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> b47c7dfaaa93 5 days ago 993MB
<none> <none> a37dc5345d16 6 days ago 764MB
<none> <none> 16eed3dc21a6 2 weeks ago 65.5MB
<none> <none> d4e07600b346 4 weeks ago 418MB
<none> <none> 58db3edaf2be 2 months ago 77.8MB
<none> <none> 28f8bde8a757 3 months ago 532MB
<none> <none> 382715ecff56 3 months ago 705MB
<none> <none> d5447fc01ae6 4 months ago 72.8MB
<none> <none> 046e6d725a3c 4 months ago 893MB
<none> <none> 49176f190c7e 4 months ago 7.04MB
<none> <none> 018f8f38ad92 5 months ago 12.3MB
<none> <none> 71eaf13299f4 5 months ago 63.1MB
<none> <none> 5bf9684f4720 19 months ago 194MB
<none> <none> eeb6ee3f44bd 19 months ago 204MB
<none> <none> 5d0da3dc9764 19 months ago 231MB
<none> <none> b6f507652425 19 months ago 135MB
<none> <none> ff3fa1a19332 2 years ago 693MB
<none> <none> aa2256d57c69 2 years ago 194MB
<none> <none> d073310c1ec4 2 years ago 3.62GB
<none> <none> 76a20143aac1 2 years ago 1.02GB
<none> <none> 63453d0a9b55 3 years ago 222MB
No REPOSITORY or TAG !
then after a few minutes of internet search, I’ve realized that if you use the ImageID as a reference point in docker save, you will not get these values !!!!
and there is no reference here: https://docs.docker.com/engine/reference/commandline/save/
Removed everything , removed the data-root
from /etc/docker/daemon.json
and started again from the beginning
docker save - the correct way
docker images -a | grep -v ^REPOSITORY | awk '{print "docker save -o "$3".tar "$1":"$2""}' | sh -x
output:
+ docker save -o b47c7dfaaa93.tar golang:1.19
+ docker save -o a37dc5345d16.tar archlinux:base-devel
+ docker save -o d4e07600b346.tar archlinux:base
+ docker save -o 58db3edaf2be.tar ubuntu:22.04
+ docker save -o 28f8bde8a757.tar centos7:ruby
+ docker save -o 382715ecff56.tar gitlab/gitlab-runner:ubuntu
+ docker save -o d5447fc01ae6.tar ubuntu:20.04
+ docker save -o 046e6d725a3c.tar ruby:latest
+ docker save -o 49176f190c7e.tar alpine:latest
+ docker save -o 018f8f38ad92.tar bash:latest
+ docker save -o 71eaf13299f4.tar ubuntu:18.04
+ docker save -o 5bf9684f4720.tar centos:6
+ docker save -o eeb6ee3f44bd.tar centos:7
+ docker save -o 5d0da3dc9764.tar centos:8
+ docker save -o b6f507652425.tar ubuntu:16.04
+ docker save -o ff3fa1a19332.tar 3bal/centos6-eol:devtoolset-7
+ docker save -o aa2256d57c69.tar 3bal/centos6-eol:latest
+ docker save -o d073310c1ec4.tar centos6:ebal
+ docker save -o 76a20143aac1.tar 3bal/arch:devel
+ docker save -o 63453d0a9b55.tar cern/slc6-base:latest
docker daemon with new data point
{
"dns": ["8.8.8.8"],
"data-root": "/mnt/WD40PURZ/var_lib_docker"
}
restart docker
sudo systemctl restart docker
docker load - the correct way
ls -1 | awk '{print "docker load --input "$1}'
and verify -moment of truth-
$ docker images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
archlinux base-devel 33a093dd9250 3 days ago 764MB
golang 1.19 b47c7dfaaa93 8 days ago 993MB
archlinux base d4e07600b346 4 weeks ago 418MB
ubuntu 22.04 58db3edaf2be 2 months ago 77.8MB
centos7 ruby 28f8bde8a757 3 months ago 532MB
gitlab/gitlab-runner ubuntu 382715ecff56 4 months ago 705MB
ubuntu 20.04 d5447fc01ae6 4 months ago 72.8MB
ruby latest 046e6d725a3c 4 months ago 893MB
alpine latest 49176f190c7e 4 months ago 7.04MB
bash latest 018f8f38ad92 5 months ago 12.3MB
ubuntu 18.04 71eaf13299f4 5 months ago 63.1MB
centos 6 5bf9684f4720 19 months ago 194MB
centos 7 eeb6ee3f44bd 19 months ago 204MB
centos 8 5d0da3dc9764 19 months ago 231MB
ubuntu 16.04 b6f507652425 19 months ago 135MB
3bal/centos6-eol devtoolset-7 ff3fa1a19332 2 years ago 693MB
3bal/centos6-eol latest aa2256d57c69 2 years ago 194MB
centos6 ebal d073310c1ec4 2 years ago 3.62GB
3bal/arch devel 76a20143aac1 2 years ago 1.02GB
cern/slc6-base latest 63453d0a9b55 3 years ago 222MB
success !
btrfs mount point
Now it is time to explain the var_lib_docker
but first , let’s verify ST1000DX002 mount point with WD40PURZ
$ sudo ls -l /mnt/ST1000DX002/var_lib_docker/
total 4
drwx--x--- 1 root root 20 Nov 24 2020 btrfs
drwx------ 1 root root 20 Nov 24 2020 builder
drwx--x--x 1 root root 154 Dec 18 2020 buildkit
drwx--x--x 1 root root 12 Dec 18 2020 containerd
drwx--x--- 1 root root 0 Apr 14 19:52 containers
-rw------- 1 root root 59 Feb 13 10:45 engine-id
drwx------ 1 root root 10 Nov 24 2020 image
drwxr-x--- 1 root root 10 Nov 24 2020 network
drwx------ 1 root root 20 Nov 24 2020 plugins
drwx------ 1 root root 0 Apr 18 18:19 runtimes
drwx------ 1 root root 0 Nov 24 2020 swarm
drwx------ 1 root root 0 Apr 18 18:32 tmp
drwx------ 1 root root 0 Nov 24 2020 trust
drwx-----x 1 root root 568 Apr 18 18:19 volumes
$ sudo ls -l /mnt/WD40PURZ/var_lib_docker/
total 4
drwx--x--- 1 root root 20 Apr 18 16:51 btrfs
drwxr-xr-x 1 root root 14 Apr 18 17:46 builder
drwxr-xr-x 1 root root 148 Apr 18 17:48 buildkit
drwxr-xr-x 1 root root 20 Apr 18 17:47 containerd
drwx--x--- 1 root root 0 Apr 14 19:52 containers
-rw------- 1 root root 59 Feb 13 10:45 engine-id
drwxr-xr-x 1 root root 20 Apr 18 17:48 image
drwxr-xr-x 1 root root 24 Apr 18 17:48 network
drwxr-xr-x 1 root root 34 Apr 18 17:48 plugins
drwx------ 1 root root 0 Apr 18 18:36 runtimes
drwx------ 1 root root 0 Nov 24 2020 swarm
drwx------ 1 root root 48 Apr 18 18:42 tmp
drwx------ 1 root root 0 Nov 24 2020 trust
drwx-----x 1 root root 70 Apr 18 18:36 volumes
var_lib_docker is actually a btrfs subvolume that we can mount it on our system
$ sudo btrfs subvolume show /mnt/WD40PURZ/var_lib_docker/
var_lib_docker
Name: var_lib_docker
UUID: 5552de11-f37c-4143-855f-50d02f0a9836
Parent UUID: -
Received UUID: -
Creation time: 2023-04-18 16:25:54 +0300
Subvolume ID: 4774
Generation: 219588
Gen at creation: 215579
Parent ID: 5
Top level ID: 5
Flags: -
Send transid: 0
Send time: 2023-04-18 16:25:54 +0300
Receive transid: 0
Receive time: -
Snapshot(s):
We can use the subvolume id for that:
mount -o subvolid=4774 LABEL="WD40PURZ" /var/lib/docker/
So /var/lib/docker/
path on our rootfs, is now a mount point for our BTRFS raid-1 4TB storage and we can remove the data-root declaration from /etc/docker/daemon.json
and restart our docker service.
That’s it !
Saturday, 08 April 2023
Continuing Explorations into Filesystems and Paging with L4Re
Towards the end of last year, I spent a fair amount of time trying to tidy up and document the work I had been doing on integrating a conventional filesystem into the L4 Runtime Environment (or L4Re Operating System Framework, as it now seems to be called). Some of that effort was purely administrative, such as giving the work a more meaningful name and changing references to the naming in various places, whereas other aspects were concerned with documenting mundane things like how the software might be obtained, built and used. My focus had shifted somewhat towards sharing the work and making it slightly more accessible to anyone who might be interested (even if this is probably a very small audience).
Previously, in seeking to demonstrate various mechanisms such as the way programs might be loaded and run, with their payloads paged into memory on demand, I had deferred other work that I felt was needed to make the software framework more usable. For example, I was not entirely happy with the way that my “client” library for filesystem access hid the underlying errors, making troubleshooting less convenient than it could be. Instead of perpetuating the classic Unix “errno” practice, I decided to give file data structures their own error member to retain any underlying error, meaning that a global variable would not be involved in any error reporting.
Other matters needed attending to, as well. Since acquiring a new computer in 2020 based on the x86-64 architecture, the primary testing environment for this effort has been a KVM/QEMU instance invoked by the L4Re build process. When employing the same x86-64 architecture for the instance as the host system, the instance should in theory be very efficient, but for some reason the startup time of such x86-64 instances is currently rather long. This was not the case at some point in the past, but having adopted the Git-based L4Re distribution, this performance regression made an appearance. Maybe at some stage in the future I will discover why it sits there for half a minute spinning up at the “Booting ROM” stage, but for now a reasonable workaround is to favour QEMU instances for other architectures when testing my development efforts.
Preserving Portability
Having long been aware of the necessity of software portability, I have therefore been testing the software in QEMU instances emulating the classic 32-bit x86 architecture as well as MIPS32, in which I have had a personal interest for several years. Surprisingly, testing on x86 revealed a few failures that were not easily explained, but I eventually tracked them down to interoperability problems with the L4Re IPC library, where that library was leaving parts of IPC message values uninitialised and causing my own IPC library to misinterpret the values being sent. This investigation also led me to discover that the x86 Application Binary Interface is rather different in character to the ABI for other architectures. On those other architectures, the alignment of members in structures (and of parameters in parameter lists) needs to be done more carefully due to the way values in memory are accessed. On x86, meanwhile, it seems that values of different sizes can be more readily packed together.
In any case, I came to believe that the L4Re IPC library is not following the x86 ABI specification in the way IPC messages are prepared. I did wonder whether this was deliberate, but I think that it is actually inadvertent. One of my helpful correspondents confirmed that there was indeed a discrepancy between the L4Re code and the ABI, but nothing came of any enquiries into the matter, so I imagine that in any L4Re systems deployed on x86 (although I doubt that there can be many), the use of the L4Re code on both sides of any given IPC transaction manages to conceal this apparent deficiency. The consequence for me was that I had to introduce a workaround in the cases where my code needs to interact with various existing L4Re components.
Several other portability changes were made to resolve a degree of ambiguity around the sizes of various types. This is where the C language family and various related standards and technologies can be infuriating, with care required when choosing data types and then using these in conjunction with libraries that might have their own ideas about which types should be used. Although there probably are good reasons for some types to be equivalent to a “machine word” in size, such types sit uncomfortably with types of other, machine-independent sizes. I am sure I will have to revisit these choices over and over again in future.
Enhancing Component Interface Descriptions
One thing I had been meaning to return to was the matter of my interface description language (IDL) tool and its lack of support for composing interfaces. For example, a component providing file content might expose several different interfaces for file operations, dataspace operations, and so on. These compound interfaces had been defined by specifying arguments for each invocation of the IDL tool that indicate all the interfaces involved, and thus the knowledge of each compound interface ended up being encoded as definitions within Makefiles like this:
mapped_file_object_INTERFACES = dataspace file flush mapped_file notification
A more natural approach involved defining these interfaces in the interface description language itself, but this was going to require putting in the effort to extend the tool, which would not be particularly pleasant, being written in C using Flex and Bison.
Eventually, I decided to just get on with remedying the situation, adding the necessary tool support, and thus tidying up and simplifying the Makefiles in my L4Re build system package. This did raise the complexity level in the special Makefiles provided to support the IDL tool – nothing in the realm of Makefiles is ever truly easy – but it hopefully confines such complexity out of sight and keeps the main project Makefiles as concise as can reasonably be expected. For reference, here is how a file component interface looks with this new tool support added:
interface MappedFileObject composes Dataspace, File, Flush, MappedFile, Notification;
And for reference, here is what one of the constituent interfaces looks like:
interface Flush { /* Flush data and update the size, if appropriate. */ [opcode(5)] void flush(in offset_t populated_size, out offset_t size); };
I decided to diverge from previous languages of this kind and to use “composes” instead of language like “inherits”. These compound interface descriptions deliberately do not seek to combine interfaces in a way that entirely resembles inheritance as supported by various commonly used programming languages, and an interface composing other interfaces cannot also add operations of its own: it can merely combine other interfaces. The main reason for such limitations is the deliberate simplicity or lack of capability of the tool: it only really transcribes the input descriptions to equivalent forms in C or C++ and neglects to impose many restrictions of its own. One day, maybe I will revisit this and at least formalise these limitations instead of allowing them to emerge from the current state of the implementation.
A New Year
I had hoped to deliver something for broader perusal late last year, but the end of the year arrived and with it some intriguing but increasingly time-consuming distractions. Having written up the effective conclusion of those efforts, I was able to turn my attention to this work again. To start with, that involved reminding myself where I had got to with it, which underscores the need for some level of documentation, because documentation not only communicates the nature of a work to others but it also communicates it to one’s future self. So, I had to spend some time rediscovering the finer detail and reminding myself what the next steps were meant to be.
My previous efforts had demonstrated the ability to launch new programs from my own programs, reproducing some of what L4Re already provides but in a form more amenable to integrating with my own framework. If the existing L4Re code had been more obviously adaptable in a number of different ways throughout my long process of investigation and development for it, I might have been able to take some significant shortcuts and save myself a lot of effort. I suppose, however, that I am somewhat wiser about the technologies and techniques involved, which might be beneficial in its own way. The next step, then, was to figure out how to detect and handle the termination of programs that I had managed to launch.
In the existing L4Re framework, a component called Ned is capable of launching programs, although not being able to see quite how I might use it for my own purposes – that being to provide a capable enough shell environment for testing – had led me along my current path of development. It so happens that Ned supports an interface for “parent” tasks that is used by created or “child” tasks, and when a program terminates, the general support code for the program that is brought along by the C library includes the invocation of an operation on this parent interface before the program goes into a “wait forever” state. Handling this operation and providing this interface seemed to be the most suitable approach for replicating this functionality in my own code.
Consolidation and Modularisation
Before going any further, I wanted to consolidate my existing work which had demonstrated program launching in a program written specifically for that purpose, bringing along some accompanying abstractions that were more general in nature. First of all, I decided to try and make a library from the logic of the demonstration program I had written, so that the work involved in setting up the environment and resources for a new program could be packaged up and re-used. I also wanted the functionality to be available through a separate server component, so that programs wanting to start other programs would not need to incorporate this functionality but could instead make a request to this separate “process server” component to do the work, obtaining a reference to the new program in response.
One might wonder why one might bother introducing a separate component to start programs on another program’s behalf. As always when considering the division of functionality between components in a microkernel-based system, it is important to remember that components can have different configurations that afford them different levels of privilege within a system. We might want to start programs with one level of privilege from other programs with a different level of privilege. Another benefit of localising program launching in one particular component is that it might provide an overview of such activities across a number of programs, thus facilitating support for things like job and process control.
Naturally, an operating system does not need to consolidate all knowledge about running programs or processes in one place, and in a modular microkernel-based system, there need not even be a single process server. In fact, it seems likely that if we preserve the notion of a user of the system, each user might have their own process server, and maybe even more than one of them. Such a server would be configured to launch new programs in a particular way, having access only to resources available to a particular user. One interesting possibility is that of being able to run programs provided by one filesystem that then operate on data provided by another filesystem. A program would not be able to see the filesystem from which it came, but it would be able to see the contents of a separate, designated filesystem.
Region Mapper Deficiencies
A few things conspired to make the path of progress rather less direct than it might have been. Having demonstrated the launching of trivial programs, I had decided to take a welcome break from the effort. Returning to the effort, I decided to test access to files served up by my filesystem infrastructure, and this caused programs to fail. In order to support notification events when accessing files, I employ a notification thread to receive such events from other components, but the initialisation of threading in the C library was failing. This turned out to be due to the use of a region mapper operation that I had not yet supported, so I had to undertake a detour to implement an appropriate data structure in the region mapper, which in C++ is not a particularly pleasant experience.
Later on, the region mapper caused me some other problems. I had neglected to implement the detach operation, which I rely on quite heavily for my file access library. Attempting to remedy these problems involved reacquainting myself with the region mapper interface description which is buried in one of the L4Re packages, not to be confused with plenty of other region mapper abstractions which do not describe the actual interface employed by the IPC mechanism. The way that L4Re has abandoned properly documented interface descriptions is very annoying, requiring developers to sift through pages of barely commented code and to be fully aware of the role of that code. I implemented something that seemed to work, quite sure that I still did not have all the details correct in my implementation, and this suspicion would prove correct later on.
Local and Non-Local Capabilities
Another thing that I had not fully understood, when trying to put together a library handling IPC that I could tolerate working with, was the way that capabilities may be transferred in IPC messages within tasks. Capabilities are references to components in the system, and when transferred between tasks, the receiving task is meant to allocate a “slot” for each received capability. By choosing a slot denoted by an index, the task (or the program running in it) can tell the kernel where to record the capability in its own registry for the task, and by employing this index in its own registry, the program will be able to maintain a record of available capabilities consistent with that of the kernel.
The practice of allocating capability slots for received capabilities is necessary for transfers between tasks, but when the transfer occurs within a task, there is no need to allocate a new slot: the received capability is already recorded within the task, and so the item describing the capability in the message will actually encode the capability index known to the task. Previously, I was not generally sending capabilities in messages within tasks, and so I had not knowingly encountered any issues with my simplistic “general case” support for capability transfers, but having implemented a region mapper that resides in the same task as a program being run, it became necessary to handle the capabilities presented to the region mapper from within the same task.
One counterintuitive consequence of the capability management scheme arises from the general, inter-task transfer case. When a task receives a capability from another task, it will assign a new index to the capability ahead of time, since the kernel needs to process this transfer as it propagates the message. This leaves the task with a new capability without any apparent notion of whether it has seen that capability before. Maybe there is a way of asking the kernel if two capabilities refer to the same object, but it might be worthwhile just not relying on such facilities and designing frameworks around such restrictions instead.
Starting and Stopping
So, back to the exercise of stopping programs that I had been able to start! It turned out that receiving the notification that a program had finished was only the start; what then needed to happen was something of a mystery. Intuitively, I knew that the task hosting the program’s threads would need to be discarded, but I envisaged that the threads themselves probably needed to be discarded first, since they are assigned to the task and probably cannot have that task removed from under them, even if they are suspended in some sense.
But what about everything else referenced by the task? After all, the task will have capabilities for things like dataspaces that provide access to regions of files and to the program stack, for things like the filesystem for opening other files, for semaphore and IRQ objects, and so on. I cannot honestly say that I have the definitive solution, and I could not easily find much in the way of existing guidance, so I decided in the end to just try and tidy all the resources up as best I could, hopefully doing enough to make it possible to release the task and have the kernel dispose of it. This entailed a fairly long endeavour that also encouraged me to evolve the way that the monitoring of the process termination is performed.
When the started program eventually reaches the end and sends a message to its “parent” component, that component needs to record any termination state communicated in the message so that it may be passed on to the program’s creator or initiator, and then it also needs to commence the work of wrapping up the program. Here, I decided on a distinct component separate from one responsible for any paging activities to act as the contact point for the creating or initiating program. When receiving a termination message or signal, this component disconnects the terminating program from its internal pager by freeing the capability, and this then causes the internal pager to terminate, itself sending a signal to its own parent.
One important aspect of starting and terminating processes is that of notifying the party that sought to start a process in the first place. For filesystem operations, I had already implemented support for certain notification events related to opening, modifying and closing files and pipes, with these being particularly important for pipes. I wanted to extend this support to processes so that it might be possible to monitor files, pipes and processes together using a kind of select or poll operation. This led to a substantial detour where I became dissatisfied with the existing support, modified it, had to debug it, and remain somewhat concerned that it might need more work in the future.
Testing on the different architectures under QEMU also revealed that I would need to handle the possibility that a program might be started and run to completion before its initiator had even received a reference to the program for notification purposes. Fortunately, a similar kind of vanishing resource problem arose when I was developing the file paging system, and so I had a technique available to communicate the reference to the process monitor component to the initiator of the program, ensuring that the process monitor becomes established in the kernel’s own records, before the program itself gets started, runs and completes, avoiding the process monitor being tidied up before its existence becomes known to the wider system.
Wrapping Up Again
A few concerns remain with the state of the work so far. I experienced problems with filesystem access that I traced to the activity of repeatedly attaching and detaching dataspaces, which is something my filesystem access library does deliberately, but the error suggested that the L4Re region mapper had somehow failed to attach the appropriate region. This may well be caused by issues within my own code, and my initial investigation did indeed uncover a problem in my own code where the size of the attached region of a file would gradually increase over time. With this mistake fixed, the situation was improved, but the underlying problem was not completely eliminated, judging from occasional errors. A workaround has been adopted for now.
Various other problems arose and were hopefully resolved. I would say that some of them were due to oversights when getting things done takes precedence over a more complete consideration of all the issues, particularly when working in a language like C++ where lower-level chores like manual memory management enter the picture. The differing performance when emulating various architectures under QEMU also revealed a deficiency with my region mapper implementation. It turned out that detach operations were not returning successfully, leading the L4Re library function to return without invalidating memory pages, and so my file access operations were returning pages of incorrect content instead of the expected file content for the first few accesses until the correct pages had been paged in and were almost continuously resident.
Here, yet more digging around in the L4Re code revealed an apparent misunderstanding about the return value associated with one of the parameters to the detach operation, that of the detached dataspace. I had concluded that a genuine capability was meant to be returned, but it seems that a simple index value is returned in a message word instead of a message item, and so there is no actual capability transferred to the caller, not even a local one. The L4Re IPC framework does not really make the typing semantics very clear, or at least not to me, and the code involved is quite unfathomable. Again, a formal interface specification written in a clearly expressed language would have helped substantially.
Next Steps
I suppose progress of sorts has been made in the last month or so, for which I can be thankful. Although tidying up the detritus of my efforts will remain an ongoing task, I can now initiate programs and wait for them to finish, meaning that I can start building up test suites within the environment, combining programs with differing functionality in a Unix-like fashion to hopefully validate the behaviour of the underlying frameworks and mechanisms.
Now, I might have tried much of this with L4Re’s Lua-based scripting, but it is not as straightforward as a more familiar shell environment, appearing rather more low-level in some ways, and it is employed in a way that seems to favour parallel execution instead of the sequential execution that I might desire when composing tests: I want tests to involve programs whose results feed into subsequent programs, as opposed to just running a load of programs at once. Also, without more extensive documentation, the Lua-based scripting support remains a less attractive choice than just building something where I get to control the semantics. Besides, I also need to introduce things like interprocess pipes, standard input and output, and such things familiar from traditional software platforms. Doing that for a simple shell-like environment would be generally beneficial, anyway.
Should I continue to make progress, I would like to explore some of the possibilities hinted at above. The modular architecture of a microkernel-based system should allow a more flexible approach in partitioning the activities of different users, along with the configuration of their programs. These days, so much effort is spent in “orchestration” and the management of containers, with a veritable telephone directory of different technologies and solutions competing for the time and attention of developers who are now also obliged to do the work of deployment specialists and systems administrators. Arguably, much of that involves working around the constraints of traditional systems instead of adapting to those systems, with those systems themselves slowly adapting in not entirely convincing or satisfactory ways.
I also think back to my bachelor’s degree dissertation about mobile software agents where the idea was that untrusted code might be transmitted between systems to carry out work in a safe and harmless fashion. Reducing the execution environment of such agent programs to a minimum and providing decent support for monitoring and interacting with them would be something that might be more approachable using the techniques explored in this endeavour. Pervasive, high-speed, inexpensively-accessed networks undermined the envisaged use-cases for mobile agents in general, although the practice of issuing SQL queries to database servers or having your browser run JavaScript programs deployed in Web pages demonstrates that the general paradigm is far from obsolete.
In any case, my “to do” list for this project will undoubtedly remain worryingly long for the foreseeable future, but I will hopefully be able to remedy shortcomings, expand the scope and ambition of the effort, and continue to communicate my progress. Thank you to those who have made it to the end of this rather dry article!
The KDE Qt5 Patch Collection has been rebased on top of Qt 5.15.9
Commit: https://invent.kde.org/qt/qt/qt5/-/commit/4c0d35b0991216766ca301de205599d1daa72057
Commercial release announcement: https://www.qt.io/blog/commercial-lts-qt-5.15.9-released
OpenSource release announcement: https://lists.qt-project.org/pipermail/announce/2023-April/000406.html
As usual I want to personally extend my gratitude to the Commercial users of Qt for beta testing Qt 5.15.9 for the rest of us.
The Commercial Qt 5.15.9 release introduced one bug that have later been fixed. Thanks to that, our Patchset Collection has been able to incorporate the fix for the issue [1] and the Free Software users will never be affected by it!
P.S: Special shout-out to Andreas Sturmlechner for identifying the fix of the issue, since I usually only pay attention to "Revert XYZ" commits and this one is not a revert but subsequent improvement
Saturday, 01 April 2023
RAII: Tragedy in three acts
In a recent Computerphile video, Ian Knight talked about RAII idiom and it’s application in C++ and Rust. While the video described the general concepts, I felt different examples could be more clearly convey essence of the topic.
I’ve decided to give my own explanation to hopefully better illustrate what RAII is and how it relates to Rust’s ownership. Then again, for whatever reason I’ve decided to write it as a play with dialogue in faux Old English so it may well be even more confusing instead.
Cast of characters | |
(In the order of appearance) | |
Gregory | A software engineer and Putuel’s employee number #1 |
Sampson | A software engineer and a self-proclaimed 10× developer |
Paris | An apprentice returning to Putuel two summers in a row |
CTO | Puteal’s Chief Technical Officer spending most of his time in meetings |
Admin | Administrative assistant working in Puteal Corporation’s headquarters in Novear |
Act I | |
Scene I | Novear. A public place. |
Enter Sampson and Gregory, two senior engineers of the Puteal Corporation, carrying laptops and phones | |
Gregory | Pray tell, what doth the function’s purpose? |
Sampson | It doth readeth a number from a file. A task as trivial as can be and yet QA reports memory leak after my change. Hence, I come to thee for help. |
Both look at a laptop showing code Sampson has written [error handling omitted for brevity from all source code listings]: | |
double read_double(FILE *fd) { char *buffer = malloc(1024); /* allocate temporary buffer */ fgets(buffer, 1024, fd); /* read first line of the file */ return atof(buffer); /* parse and return the number */ } | |
Gregory | Thine mistake is apparent. Thou didst allocate memory but ne’er freed it. Verily, in C thou needs’t to explicitly free any memory thou dost allocate. Submit this fix and thy code shall surely pass. |
double read_double(FILE *fd) { char *buffer = malloc(1024); /* allocate temporary buffer */ fgets(buffer, 1024, fd); /* read first line of the file */ double result = atoi(buffer); /* parse the line */ free(buffer); /* free the temporary buffer */ return result; /* return parsed number */ } | |
Scene II | A hall. |
Enter Puteal CTO, an apprentice called Paris and an Admin | |
Paris | I’ve done as Sampson beseeched of me. I’ve taken the read_double function and changed it so that it doth taketh file path as an argument. He hath warned me about managing memory and so I’ve made sure all temporary buffers are freed. Nonetheless, tests fail. |
double read_double(const char *path) { FILE *fd = fopen(path, "r"); /* open file */ char *buffer = malloc(1024); fgets(buffer, 1024, fd); double result = atof(buffer); free(buffer); return result; } | |
CTO | Thou didst well managing memory, but memory isn’t the only resource that needs to be freed. Just like allocations, if thou dost open a file, thou must close it anon once thou art done with it. |
Exit CTO and Admin towards sounds of a starting meeting | |
Paris | Managing resources is no easy task but I think I’m starting to get the hang of it. |
double read_double(const char *path) { FILE *fd = fopen(path, "r"); /* open file */ char *buffer = malloc(1024); fgets(buffer, 1024, fd); fclose(fd); /* close the file */ double result = atof(buffer); free(buffer); return result; } | |
Scene III | Novear. A room in Puteal’s office. |
Enter Paris and Sampson they set them down on two low stools, and debug | |
Paris | The end of my apprenticeship is upon me and yet my code barely doth work. It canst update the sum once but as soon as I try doing it for the second time, nothing happens. |
double update_sum_from_file(mtx_t *lock, double *sum, const char *path) { double value = read_double(path); /* read next number from file */ mtx_lock(lock); /* reserve access to `sum` */ value += sum->value; /* calculate sum */ sum->value = value; /* update the sum */ return value; /* return new sum */ } | |
Sampson | Thou hast learned well that resources need to be acquired and released. But what thou art missing is that not only system memory or a file descriptor are resources. |
Paris | So just like memory needs to be freed, files need to be closed and locks needs to be unlocked! |
double update_sum_from_file(mtx_t *lock, double *sum, const char *path) { double value = read_double(path); /* read next number from file */ mtx_lock(lock); /* reserve access to `sum` */ value += *sum; /* calculate sum */ *sum = value; /* update the sum */ mtx_unlock(lock); /* release `sum` */ return value; /* return new sum */ } | |
Paris | I’m gladdened I partook the apprenticeship. Verily, I’ve learned that resources need to be freed once they art no longer used. But also that many things can be modelled like a resource. |
I don’t comprehend why it all needs to be done manually? | |
Exit Sampson while Paris monologues leaving him puzzled | |
Act II | |
Scene I | Court of Puteal headquarters. |
Enter Sampson and Paris bearing a laptop before him | |
Paris | Mine last year’s apprenticeship project looks naught like mine own handiwork. |
Sampson | Thou seest, in the year we migrated our code base to C++. |
Paris | Aye, I understandeth. But I spent so much time learning about managing resources and yet the new code doth not close its file. |
Enter Gregory and an Admin with a laptop. They all look at code on Paris’ computer: | |
double read_double(const char *path) { std::fstream file{path}; /* open file */ double result; /* declare variable to hold result */ file >> result; /* read the number */ return result; /* return the result */ } | |
Sampson | Oh, that’s RAII. Resource Acquisition Is Initialisation idiom. C++ usetht it commonly. |
Gregory | Resource is acquired when object is initialised and released when it’s destroyed. The compiler tracks lifetimes of local variables and thusly handles resources for us. |
By this method, all manner of resources can be managed. And forsooth, for more abstract concepts without a concrete object representing them, such as the concept of exclusive access to a variable, a guard class can be fashioned. Gaze upon this other function: | |
double update_sum_from_file(std::mutex &lock, double *sum, const char *path) { double value = read_double(path); /* read next number from file */ std::lock_guard<std::mutex> lock{mutex}; /* reserve access to `sum` */ value += *sum; /* calculate sum */ *sum = value; /* update the sum */ return value; /* return new sum */ } | |
Paris | I perceive it well. When the lock goes out of scope, the compiler shall run its destructor, which shall release the mutex. Such was my inquiry yesteryear. Thus, compilers can render managing resources more automatic. |
Scene II | Novear. Sampson’s office. |
Enter Gregory and Sampson | |
Sampson | Verily, this bug doth drive me mad! To make use of the RAII idiom, I’ve writ an nptr template to automatically manage memory. |
template<class T> struct nptr { nptr(T *ptr) : ptr(ptr) {} /* take ownership of the memory */ ~nptr() { delete ptr; } /* free memory when destructed */ T *operator->() { return ptr; } T &operator*() { return *ptr; } private: T *ptr; }; | |
Gregory | I perceive… And what of the code that bears the bug? |
Sampson | 'Tis naught but a simple code which calculates sum of numbers in a file: |
std::optional<double> try_read_double(nptr<std::istream> file) { double result; return *file >> result ? std::optional{result} : std::nullopt; } double sum_doubles(const char *path) { nptr<std::istream> file{new std::fstream{path}}; std::optional<double> number; double result = 0.0; while ((number = try_read_double(file))) { result += *number; } return result; } | |
Enter Paris with an inquiry for Sampson; seeing them talk he pauses and listens in | |
Gregory | The bug lies in improper ownership tracking. When ye call the try_read_double function, a copy of thy nptr is made pointing to the file stream. When that function doth finish, it frees that very stream, for it believes that it doth own it. Alas, then you try to use it again in next loop iteration. |
Why hast thou not made use of std::unique_ptr ? | |
Sampson | Ah! I prefer my own class, good sir. |
Gregory | Thine predicament would have been easier to discern if thou hadst used standard classes. In truth, if thou wert to switch to the usage of std::unique_ptr , the compiler would verily find the issue and correctly refuse to compile the code. |
std::optional<double> try_read_double(std::unique_ptr<std::istream> file) { double result; return *file >> result ? std::optional{result} : std::nullopt; } double sum_doubles(const char *path) { auto file = std::make_unique<std::fstream>(path); std::optional<double> number; double result = 0.0; while ((number = try_read_double(file))) { /* compile error */ result += *number; } return result; } | |
Exit Gregory, exit Paris moment later | |
Scene III | Before Sampson’s office. |
Enter Gregory and Paris, meeting | |
Paris | I’m yet again vexed. I had imagined that with RAII, the compiler would handle all resource management for us? |
Gregory | Verily, for RAII to function, each resource must be owned by a solitary object. If the ownership may be duplicated then problems shall arise. Ownership may only be moved. |
Paris | Couldn’t compiler enforce that just like it can automatically manage resources? |
Gregory | Mayhap the compiler can enforce it, but it’s not a trivial matter. Alas, if thou art willing to spend time to model ownership in a way that the compiler understands, it can prevent some of the issues. However, thou wilt still require an escape hatch, for in the general case, the compiler cannot prove the correctness of the code. |
Exit Gregory and Paris, still talking | |
Act III | |
Scene I | A field near Novear. |
Enter Gregory and Paris | |
Gregory | Greetings, good fellow! How hast thou been since thy apprenticeship? |
Paris | I’ve done as thou hast instructed and looked into Rust. It is as thou hast said. I’ve recreated Sampson’s code and the compiler wouldn’t let me run it: |
fn try_read_double(rd: Box<dyn std::io::Read>) -> Option<f64> { todo!() } fn sum_doubles(path: &std::path::Path) -> f64 { let file = std::fs::File::open(path).unwrap(); let file: Box<dyn std::io::Read> = Box::new(file); let mut result = 0.0; while let Some(number) = try_read_double(file) { result += number; } result } | |
Gregory | Verily, the compiler hath the vision to behold the migration of file’s ownership into the realm of try_read_double function during the first iteration and lo, it is not obtainable any longer by sum_doubles . |
error[E0382]: use of moved value: `file` let file: Box<dyn std::io::Read> = Box::new(file); ---- move occurs because `file` has type `Box<dyn std::io::Read>`, which does not implement the `Copy` trait let mut result = 0.0; while let Some(number) = try_read_double(file) { ^^^^ value moved here, in previous iteration of loop | |
Paris | Alas, I see not what thou hast forewarned me of. The syntax present doth not exceed that which wouldst be used had this been writ in C++: |
fn try_read_double(rd: &dyn std::io::Read) -> Option<f64> { todo!() } fn sum_doubles(path: &std::path::Path) -> f64 { let file = std::fs::File::open(path).unwrap(); let file: Box<dyn std::io::Read> = Box::new(file); let mut result = 0.0; while let Some(number) = try_read_double(&*file) { result += number; } result } | |
Gregory | Verily, the Rust compiler is of great wit and often elides lifetimes. Nonetheless, other cases may prove more intricate. |
struct Folder<T, F>(T, F); impl<T, F: for <'a, 'b> Fn(&'a mut T, &'b T)> Folder<T, F> { fn push(&mut self, element: &T) { (self.1)(&mut self.0, element) } } | |
Paris | Surely though, albeit this code is more wordy, it is advantageous if I cannot commit an error in ownership. |
Gregory | Verily, there be manifold factors in the selection of a programming tongue. And there may be aspects which may render other choices not imprudent. |
Aforeword
A thing to keep in mind is that the examples are somewhat contrived. For example, the buffer and file object present in read_double
function can easily live on the stack. A real-life code wouldn’t bother allocating them on the heap. Then again, I could see a beginner make a mistake of trying to bypass std::unique_ptr
not having a copy constructor by creating objects on heap and passing pointers around.
In the end, is this better explanation than the one in aforementioned Computerphile video? I’d argue code examples represent discussed concepts better though to be honest form of presentation hinders clarity of explanation. Yet, I had too much fun messing around with this post so here it is in this form.
Lastly, I don’t know Old English so the dialogue is probably incorrect. I’m happy to accept corrections but otherwise I don’t care that much. One shouldn’t take this post too seriously.
Thursday, 30 March 2023
Configuring algorithms in Modern C++
When designing library code, one often wonders: “Are these all the parameters this function will ever need?” and “How can a user conveniently change one parameter without specifying the rest?” This post introduces some Modern C++ techniques you can use to make passing configuration options easy for your users while allowing you to add more options later on.
Prerequisites
Most people who have programmed in C++ before should have no problems understanding this article, although you will likely appreciate it more, if you are a library developer or have worried about the forward-compatibility of your code.
Some of the features introduced in this post do not yet work with Clang. Most should work with MSVC, but I only double-checked the code with GCC12 (any version >= 10 should work).
Motivation
Let’s say you are writing an algorithm with the following signature:
auto algo(auto data, size_t threads = 4ull);
It takes some kind of data input, does lots of magic computation on it and returns some other data. An actual
algorithm should of course clearly state what kind of input data it expects (specific type or constrained template
parameter), but we want to focus on the other parameters in this post.
The only configuration option that you want to expose is the number of threads it shall use. It defaults to 4
,
because you know that the algorithm scales well at four threads and also you assume that most of your users
have at least four cores on their system.
Now, a bit later, you have added an optimisation to the algorithm called heuristic42
. It improves the results in
almost all cases, but there are a few corner cases where users might want to switch it off. The interface now looks
like this:
auto algo(auto data, size_t threads = 4ull, bool heuristic42 = true);
This is not too bad, you might think, but there are already two ugly things about this:
- To overwrite the second “config option”, the user needs to also specify the first, i.e.
algo("data", 4, false);
. This means that they need to look up (and enter correctly) the first config option’s default value. Also, if you change that default in a future release of the code, this change will not be reflected in the user’s invocation who unknowingly enforces the old default. - Since passing arguments to the function does not involve the parameter’s name, it is very easy to confuse the order
of the parameters. Implicit conversions make this problem even worse, so invoking the above interface with
algo("data", false, 4);
instead ofalgo("data", 4, false);
generates no warning, even with-Wall -Wextra -pedantic
!
Wow, what a usability nightmare, and we only have two config options! Imagine adding a few more…
Dedicated config object
As previously mentioned, the parameter name cannot be used when passing arguments. However, C++20 did add designated initialisers for certain class types. So you can use the name of a member variable when initialising an object. We can use that!
struct algo_config
{
bool heuristic42 = true;
size_t threads = 4ull;
};
auto algo(auto data, algo_config const & cfg)
{
/* implementation */
}
int main()
{
/* create the config object beforehand (e.g. after argument parsing) */
algo_config cfg{.heuristic42 = false, .threads = 8};
algo("data", cfg);
/* create the config object ad-hoc */
algo("data", algo_config{.heuristic42 = false, .threads = 8}); // set both paramaters
algo("data", algo_config{.threads = 8}); // set only one parameter
/* providing the config type's name is optional */
algo("data", {.threads = 8});
}
As you can see, this solves both of the problems mentioned previously! We refer to the config elements by name to avoid mixups, and we can choose to overwrite only those parameters that we actually want to change; other configuration elements will be whatever they are set to by default. Conveniently, this allows changing the default later on, and all invocations that don’t overwrite it will pick up the new default.
Another great feature is that the API maintainer of the algorithm can easily add more members to the configuration object without invalidating any existing invocations. This allows users of the API to gradually adopt new opt-in features.
As the name of the config type can even be omitted (see last invocation), the syntactic overhead for the “ad-hoc” initialisation is very low, almost like providing the arguments directly to the function-call.
There is an important catch: The order in which the designated initialisers are given has to correspond to the order in which the respective members are defined in the type of the config. It is okay to omit initialisers at the beginning, middle or end (as long as defaults are given), but the relative order of all the initialisers that you do provide has to be correct. This might sound like a nuisance, but in contrast to the problem discussed initially (mixed up order of function arguments), you will actually get a compiler-error that tells you that you got the order wrong; so the problem is easily detected and fixed. And there is a nice rule that you can follow for such config objects: always sort the members alphabetically! That way users intuitively know the order and don’t have to look it up 💡
Types as config elements
Now, sometimes you want to pass a type as kind of parameter to an algorithm. Imagine that the algorithm internally juggles a lot of integers. Maybe it even does SIMD with them. In those cases, the size of the integers could affect performance noticeably.
Some algorithms might be able to infer from the data input’s type which integers to use for computation, but in other cases you want the user to be able to override this. Thus we need the ability to pass the desired type to the algorithm. The canonical way of doing this is via template arguments:
template <typename int_t>
auto algo(auto data, size_t threads = 4ull);
But this is has the same problems that we discussed initially: as soon as multiple types are passed, it is possible confuse the order (and not be notified); to set a later parameters, you need to also set previous ones; et cetera. There might also be weird interactions with the type of the data parameter, in case that is a template parameter.
Let’s add the “type parameter” to our config object instead:
/* We define a "type tag" so we can pass types as values */
template <typename T>
inline constinit std::type_identity<T> ttag{};
/* The config now becomes a template */
template <typename Tint_type = decltype(ttag<uint64_t>)>
struct algo_config
{
bool heuristic42 = true;
Tint_type int_type = ttag<uint64_t>;
size_t threads = 4ull;
};
/* And also the algorithm */
template <typename ...Ts>
auto algo(auto data, algo_config<Ts...> const & cfg)
{
/* implementation */
}
int main()
{
/* Setting just "value parameters" still works with and without "algo_config" */
algo("data", algo_config{.heuristic42 = false, .threads = 8});
algo("data", {.heuristic42 = false, .threads = 8});
/* When setting a "type parameter", we need to add "algo_config" */
algo("data", algo_config{.int_type = ttag<uint32_t>, .threads = 8});
}
There are a few things happening here. In the beginning, we use variable templates to define an object that “stores” a type. This can later be used to initialise members of our config object.
Next, we need to make algo_config
a template. Unfortunately, we need to default the template parameter
as well as giving the member a default value. Finally, algo()
needs template parameters for the config, as well.
It is handy to just use a parameter pack here, because it means we don’t need to change it if we add more
template parameters the config type.
This is all a bit more verbose than before, after all we are still writing C++ 😅 But most of this will be hidden
from the user anyway.
The invocation of the algorithm is almost unchanged from before, we just use ttag<uint32_t>
to initialise
the “type parameter” of the config.
There is one caveat: when passing such “type parameters”, it is now necessary to add algo_config
, although,
fortunately, you do not need to spell out the template arguments.
In general, this may be a bit surprising, so I recommend always including the config-name in examples to teach
your users a single syntax.
Constants as config elements
Using a similar technique to the one above, we can also pass compile-time constants to the config object.
This allows the algorithm to conveniently use if constexpr
to choose between different codepaths, e.g.
between a SIMD-based codepath and a regular one.
/* We define a "value tag" type so we can pass values as types...*/
template <auto v>
struct vtag_t
{
static constexpr auto value = v;
};
/* ...and then we define a variable template to pass the type as value again! */
template <auto v>
inline constinit vtag_t<v> vtag{};
/* The config is a template */
template <typename Tuse_simd = vtag_t<false>>
struct algo_config
{
bool heuristic42 = true;
size_t threads = 4ull;
Tuse_simd use_simd = vtag<false>;
};
/* The algorithm */
template <typename ...Ts>
auto algo(auto data, algo_config<Ts...> const & cfg)
{
/* implementation */
}
int main()
{
/* Setting just "value parameters" still works with and without "algo_config" */
algo("data", algo_config{.heuristic42 = false, .threads = 8});
algo("data", {.heuristic42 = false, .threads = 8});
/* When setting a "constant parameter", we need to add "algo_config" */
algo("data", algo_config{.threads = 8, .use_simd = vtag<true>});
}
As you can see, this is very similar to the previous example. The only difference is, that we need another initial step to encode the value as a type. It is even possible to have parameters that are (run-time) values by default, but can be configured as (compile-time) constants in the way shown above. And, of course, all kinds of config options can be combined.
Note that the definitions of the “tagging” features would happen in your utility code. Users only need to know
that they can pass constants via vtag<42>
and types via ttag<int32_t>
.
Post scriptum
I hope this post was helpful to some of you. I think this is a big step forward for usability, and I hope Clang catches up with the required features as soon as possible!
There are two things here that could be improved:
- If a template parameter can be deduced from member initialisers, it should be. This would allow us to omit
the default template arguments for
algo_config
, i.e.= decltype(ttag<uint64_t>)
and= vtag_t<false>
. - When a brace-enclosed initialiser list is passed to a function template to initialise a parameter of deduced type,
consider the contents of that initialiser list. This would allow us to omit
align_config
also when passing “type parameters” or constants.
I have the feeling that 1. might not be too difficult and also not too controversial. But I suspect that 2. would be more complicated as it interacts with function overloading and I can imagine situations were this change would break existing code.
But I’d love to here other people’s opinion on the matter!
References
The ISO WG21 papers that added these features to C++:
- P0329 Designated Initialisers, C++20
- P1816 and P2082 Class template argument deduction for aggregates, C++20, not yet in Clang
Monday, 20 March 2023
Flow-Based Programming, a way for AI and humans to develop together
I think by now everybody reading this will have seen how the new generation of Large Language Models like ChatGPT are able to produce somewhat useful code. Like any advance in software development—from IDEs to high-level languages—this has generated some discussion on the future employment prospects in our field.
This made me think about how these new tools could fit the world of Flow-Based Programming, a software development technique I’ve been involved with for quite a while. In Flow-Based Programming these is a very strict boundary between reusable “library code” (called Components) and the “application logic” (called the Graph).
Here’s what the late J. Paul Morrison wrote on the subject in his seminal work, Flow-Based Programming: A New Approach to Application Development (2010):
Just as in the preparation and consumption of food there are the two roles of cook and diner, in FBP application development there are two distinct roles: the component builder and the component user or application designer.
…The application designer builds applications using already existing components, or where satisfactory ones do not exist s/he will specify a new component, and then see about getting it built.
Remembering that passage made me wonder, could I get one of the LLMs to produce useful NoFlo components? Armed with New Bing, I set out to explore.
The first attempt was specifying a pretty simple component:
That actually looks quite reasonable! I also tried asking New Bing to make the component less verbose, as well as generating TypeScript and CoffeeScript variants of the same. All seemed to produce workable things! Sure, there might be some tidying to do, but this could remove a lot of the tedium of component creation.
In addition to this trivial math component I was able to generate some that to call external REST APIs etc. Bing was even able to switch between HTTP libraries as requested.
What was even cooler was that it actually suggested to ask it how to test the component. Doing as I was told, the result was quite astonishing:
That’s fbp-spec! The declarative testing tool we came up with! Definitely the nicest way to test NoFlo (or any other FBP framework) components.
Based on my results, you’ll definitely want to check the generated components and tests before running them. But what you get out is not bad at all.
I of course also tried to get Bing to produce NoFlo graphs for me. This is where it stumbled quite a bit. Interestingly the results were better in the fbp language than in the JSON graph format. But maybe that even more enforces that the sweet spot would be AI writing components and a human creating the graphs that run those.
As I’m not working at the moment, I don’t have a current use case for this way of collaborating. But I believe this could be a huge productivity booster for any (and especially Flow-Based) application development, and expect to try it in whatever my next gig ends up being.
Illustrations: MidJourney, from prompt Robot software developer working with a software architect. Floating flowcharts in the background
Sunday, 19 March 2023
Monospace considered harmful
No, I haven’t gone completely mad yet and still, I write this as an appeal to stop using monospaced fonts for code (conditions may apply). While fixed-width fonts have undeniable benefits when authoring software, their use is excessive and even detrimental in certain contexts. Specifically, when displaying inline code within a paragraph of text, proportional fonts are a better choice.
The downsides
Fixed-width fonts for inline code have a handful of downsides. Firstly, text set in such font takes up more space and, depending on the font pairing, individual letters may appear larger. This creates unbalanced look and opportunities for awkward line wrapping.
Moreover, a fixed-width typeface has been shown to be slower to read. Even disregarding the speed differences, switching between two drastically different types of font isn’t comfortable.
To make matters worse, many websites apply too many styles to inline code fragments. For example GitHub and GitLab (i) change the font, (ii) decrease its size, (iii) add background and (iv) add padding. This overemphasis detracts from the content rather than enhancing it.
A better way
A better approach is using serif (sans-serif) font for the main text and a sans-serif (serif) font for inline code†. Or if serif’s aren’t one’s cup of tea, even within the same font group a pairing allowing for clear differentiation between the main text and the code is possible. For example a humanist font paired with a complementary geometric font.
Another option is to format code with a different colour. To avoid using it as the only mean of conveying information, a subtle colour change may be used in conjunction with font change. This is the approach I’ve taken on this blog‡.
It’s also worth considering whether inline code even needs any kind of style change. For example, the sentence ‘Execution of a C program starts from the main function’ is perfectly understandable whether or not ‘main’ is styled differently.
Epilogue
What about code blocks? Using proportional typefaces for them can be done with some care. Indentation isn’t a major problem but some alignment may need adjustments. Depending on the type of code listings, it may be an option. Having said that, I don’t claim this as the only correct option for web publishing.
As an aside, what’s the deal with parenthesise after a function name? To demonstrate, lets reuse an earlier example: ‘Execution of a C program starts from the main()
function’. The brackets aren’t part of the function name and unless they are used to disambiguate between multiple overloaded functions, there’s no need for them.
To conclude, while fixed-width fonts have their place when writing code, their use in displaying inline code is often unnecessary. Using a complementary pairing of proportional typefaces is a better options that can enhance readability. Changing background of inline code is virtually never a good idea.
† Using serif faces on websites used to carry risk of aliasing reducing legibility. Thankfully, the rise of high DPI displays largely alleviated those concerns.
‡ Combining colour change and typeface change breaks principle of using small style changes. Nonetheless, I believe some leniency for websites is in order. It’s not always guaranteed that readers will see fonts author has chosen making colour change kind of a backup. Furthermore, compared to books, change in colour isn’t as focus-grabbing on the Internet.
Friday, 10 March 2023
KDE Gear 23.04 branches created
Make sure you commit anything you want to end up in the KDE Gear 23.04 releases to them
We're already past the dependency freeze.
The Feature Freeze and Beta is next week Thursday 16 of March.
More interesting dates
March 30: 23.04 RC (23.03.90) Tagging and Release
April 13: 23.04 Tagging
April 20: 23.04 Release
https://community.kde.org/Schedules/KDE_Gear_23.04_Schedule
Wednesday, 08 March 2023
Send you talks for Akademy 2023 *now*!
Call for proposal ends Thursday the 30th of March
There's still a few weeks, but time is really running out.
I'm sure there's lots of interesting things you have to talk about Qt, KDE, C++, Community Management or other million things so head over to https://akademy.kde.org/2023/cfp/ or over to https://conf.kde.org/event/5/abstracts/ if you want to skip the nicely worded page that encourages you to submit a talk :)
Monday, 06 March 2023
Keeping a semi-automatic electronic ship's logbook
Maintaining a proper ship’s logbook is something that most boats should do, for practical, as well as legal and traditional reasons. The logbook can serve as a record of proper maintenance and operation of the vessel, which is potentially useful when selling the boat or handling an insurance claim. It can be a fun record of journeys made to look back to. And it can be a crucial aid for getting home if the ship’s electronics or GNSS get disrupted.
Like probably most operators of a small boat, on Lille Ø our logbook practices have been quite varying. We’ve been good at recording engine maintenance, as well as keeping the traditional navigation log while offshore. But in the more hectic pace of coastal cruising or daysailing this has often fallen on the wayside. And as such, a lot of the events and history of the boat is unavailable.
To redeem this I’ve developed signalk-logbook, a semi-automatic electronic logbook for vessels running the Signal K marine data server.
This allows logbook entries to be produced both manually and automatically. The can be viewed and edited using any web-capable device on board, meaning that you can write a log entry on your phone, and maybe later analyse and print them on your laptop.
Why Signal K
Signal K is a marine data server that has integrations with almost any relevant marine electronics system. If you have an older NMEA0183 or Seatalk system, Signal K can communicate with it. Same with NMEA2000. If you already have your navigational data on the boat WiFi, Signal K can use and enrich it.
This means that by making the logbook a Signal K plugin, I didn’t have to do any work to make it work with existing boat systems. Signal K even provides a user interface framework.
This means that to make the electronic logbook happen, I only had to produce some plugin JavaScript, and then build a user interface. As I don’t do front-end development that frequently, this gave me a chance to dive into modern React with hooks for the first time. What better to do after being laid off?
Signal K also has very good integration with Influx and Grafana. These can record vessel telemetry in a high resolution. So why bother with a logbook on the side? In my view, a separate logbook is still valuable for storing the comments and observations not available in a marine sensor network. It can also be a lot more durable and archivable than a time series database. On Lille Ø we run both.
User interface
The signalk-logbook comes with a reasonably simple web-based user interface that is integrated in the Signal K administration UI. You can find it in Web apps
→ Logbook
.
The primary view is a timeline. Sort of “Twitter for your boat” kind of view that allows quick browsing of entries on both desktop and mobile.
There is also the more traditional tabular view, best utilized on bigger screens:
While the system can produce a lot of the entries automatically, it is also easy to create manual entries:
These entries can also include weather observations. Those using celestial navigation can also record manual fixes with these entries! Entries can be categorized to separate things like navigational entries from radio or maintenance logs.
If you have the sailsconfiguration plugin installed, you can also log sail changes in a machine-readable format:
Since the log format is machine readable, the map view allows browsing entries spatially:
Electronic vs. paper
The big benefits of an electronic logbook are automation and availability. The logbook can create entries by itself based on what’s happening with the vessel telemetry. You can read and create log entries anywhere on the boat, using the electronic devices you carry with you. Off-vessel backups are also both possible, and quite easy, assuming that the vessel has a reasonably constant Internet connection.
With paper logbooks, the main benefit is that they’re fully independent of the vessel’s electronic system. In case of power failure, you can still see the last recoded position, heading, etc. They are also a lot more durable in the sense that paper logbooks from centuries ago are still fully readable. Though obviously that carries a strong survivorship bias. I would guess the vast majority of logbooks, especially on smaller non-commercial vessels, don’t survive more than a couple of years.
So, how to benefit from the positive aspects of electronic logbooks, while reducing the negatives when compared to paper? Here are some ideas:
- Mark your position on a paper chart. Even though most boats navigate with only electronic charts, it is a good idea to have at least a planning chart available on paper. When offshore, plot your hourly or daily position on it. This will produce the navigation aid of last resort if all electronics fail. And marked charts are pretty!
- Have an off-vessel backup of your electronic logs. The signalk-logbook uses a very simple plain text format for its entries exactly for this reason. The logs are easy to back up, and can also be utilized without the software itself. This means that with a bit of care your log entries shouls stay readable for many many years to come. On Lille Ø we store them on GitHub
- Print your logs. While this is not something I’m planning to do personally, it would be possible to print your log entries periodically, maybe daily or after each trip. Then you can have an archival copy that doesn’t rely on electronics
API
In addition to providing a web-based user interface, signalk-logbook provides a REST API. This allows software developers to create new integrations with the logbook. For example, these could include:
- Automations to generate log entries for some events via node-red or NoFlo
- Copying the log entries to a cloud service
- Exporting the logs to another format, like GPX or a spreadsheet
- Other, maybe non-web-based user interfaces for browsing and creating log entries
Getting started
To utilize this electronic logbook, you need a working installation of Signal K on your boat. The common way to do this is by having a Raspberry Pi powered by the boat’s electrical system and connected to the various on-board instruments.
There are some nice solutions for this:
- Sailor Hat for Raspberry Pi allows powering a Raspberry Pi from the boat’s 12V system. It also handles shutdowns in a clean way, protecting the memory card from data corruption
- Pican-M both connects a Raspberry Pi to a NMEA2000 bus, and powers it through that
You can of course also do a more custom setup, like we did on our old boat, Curiosity.
For the actual software setup, marinepi-provisioning gives a nice Ansible playbook for getting everything going. Bareboat Necessities is a “Marine OS for Raspberry Pi” that comes with everything included.
If you have a Victron GX device (for example Cerbo GX), you can also install Signal K on that.
Once Signal K is running, just look up signalk-logbook
in the Signal K app store. You’ll also want to install the signalk-autostate
and sailsconfiguration
plugins to enable some of the automations.
Then just restart Signal K, log in, and start logging!
Sunday, 26 February 2023
Inferno on Microcontrollers
Last year I looked at microcontrollers a fair amount, though you probably wouldn't see much activity related to that if you follow these updates. If there was any obvious public activity at all, it was happening in the diary I occasionally update about Inferno-related things. Things went off-track when some initial explorations of Inferno on a Thumb-2 microcontroller ended in frustration, leading me back to writing bare metal code instead, and this indirectly resulted in an organiser application written in a simple language for the Ben NanoNote. As the year drew to a close, I picked up Inferno again and started to make some progress, resulting in something more concrete to show.
Meanwhile, long-time Inferno explorer, Caerwyn, has been investigating a port to the Raspberry Pi Pico. Last year also saw the publication of a thesis about porting Inferno OS to a Cortex-M7 device. Hopefully there are other ports in progress, or there will be new ones that develop once people find out about these efforts.
While it's a challenge to get a useful system running on these resource-constrained devices, it's a rewarding experience to be able to show something that works on at least a basic level. It's also interesting to see a Unix-like system running on something that might be expected to only run low-level, bare metal code.
Categories: Inferno, Limbo, Free Software
Wednesday, 15 February 2023
Send you talks for Linux App Summit 2023 *now*!
Call for proposal ends this Saturday 18th of February.
I'm sure there's lots of interesting things you have to talk about so head over to https://linuxappsummit.org/cfp/ and press the "Submit your talk" button :)
Tuesday, 14 February 2023
I love Free Software – and Free Hardware Designs
For many years I have been using free software. I remember that one of my first GNU programs that I used was a chess game, ported to 16bit Windows. Many years later I switched to GNU/Linux and started programming myself, and also releasing my software under strong copyleft licences. I also discovered that many popular distros of GNU/Linux include non-free firmware. So I began contributing to GNU Guix, a fully free distro of the GNU System that excludes nonfree firmware blobs, nonfree games, and any other nonfree software.
Unfortunately many hardware vendors, including AMD, NVIDIA and Intel starting making their hardware Defective By Design, by implementing HDCP, a kind of hardware-level Digital Restrictions Management. Even if you never watch Netflix, you will be restricted by the non-free firmware, required to use their CPUs and GPUs. If we want to eliminate that form of hardware-level DRM, we will have to design our own Freedom-Respecting hardware. A few years after I baught my Talos II, I began contributing to the Libre-SOC project.
After switching to the POWER9, it was clear that I would not be able to play the nonfree DRM’d games that Valve distributes on their platform Steam. And I didn’d want to either. So I started porting existing free software games to the ppc64el architecture, including VR games such as V-Sekai and BeepSaber. I discovered that there was a libre-licensed SteamVR clone called libsurvive that implements libre licensed lighthouse-based tracking. So I baught my Valve Index, installed libsurvive and started playing with Godot4.
Today is æ„› Free Software Day 2023, which aims at raising awareness to Free Software and the passionate, hard-working people behind it. So I want to thank Luke Kenneth Casson Leighton who started the Libre-SOC project and Charles Lohr for their work on libsurvive. Last year the FSFE had an event dedicated to Free Software games, where we played Veloren, a libre licenced voxel game. The game was really fun, so I want to show my appreciation for their work. The same is true for SlimeVR/monado and Yosys/nextpnr.
Friday, 10 February 2023
Considering Unexplored Products of the Past: Formulating a Product
Previously, I described exploring the matter of developing emulation of a serial port, along with the necessary circuitry, for Elkulator, an emulator for the Acorn Electron microcomputer, motivated by a need to provide a way of transferring files into and out of the emulated computer. During this exploration, I had discovered some existing software that had been developed to provide some level of serial “filing system” support on the BBC Microcomputer – the higher-specification sibling of the Electron – with the development of this software having been motivated by an unforeseen need to transfer software to a computer without any attached storage devices.
This existing serial filing system software was a good indication that serial communications could provide the basis of a storage medium. But instead of starting from a predicament involving computers without usable storage facilities, where an unforeseen need motivates the development of a clever workaround, I wanted to consider what such a system might have been like if there had been a deliberate plan from the very beginning to deploy computers that would rely on a serial connection for all their storage needs. Instead of having an implementation of the filing system in RAM, one could have the luxury of putting it into a ROM chip that would be fitted in the computer or in an expansion, and a richer set of features might then be contemplated.
A Smarter Terminal
Once again, my interest in the historical aspects of the technology provided some guidance and some inspiration. When microcomputers started to become popular and businesses and institutions had to decide whether these new products had any relevance to their operations, there was some uncertainty about whether such products were capable enough to be useful or whether they were a distraction from the facilities already available in such organisations. It seems like a lifetime ago now, but having a computer on every desk was not necessarily seen as a guarantee of enhanced productivity, particularly if they did not link up to existing facilities or did not coordinate the work of a number of individuals.
At the start of the 1980s, equipping an office with a computer on every desk and equipping every computer with a storage solution was an expensive exercise. Even disk drives offering only a hundred kilobytes of storage on each removable floppy disk were expensive, and hard disk drives were an especially expensive and precious luxury that were best shared between many users. Some microcomputers were marketed as multi-user systems, encouraging purchasers to connect terminals to them and to share those precious resources: precisely the kind of thing that had been done with minicomputers and mainframes. Such trends continued into the mid-1980s, manifested by products promoted by companies with mainframe origins, such companies perpetuating entrenched tendencies to frame computing solutions in certain ways.
Terminals themselves were really just microcomputers designed for the sole purpose of interacting with a “host” computer, and institutions already operating mainframes and minicomputers would have experienced the need to purchase several of them. Until competition intensified in the terminal industry, such products were not particularly cheap, with the DEC VT220 introduced in 1983 costing $1295 at its introduction. Meanwhile, interest in microcomputers and the possibility of distributing some kinds of computing activity to these new products, led to experimentation in some organisations. Some terminal manufacturers responded by offering terminals that also ran microcomputer software.
Much of the popular history of microcomputing, familiar to anyone who follows such topics online, particularly through YouTube videos, focuses on adoption of such technology in the home, with an inevitable near-obsession with gaming. The popular history of institutional adoption often focuses on the upgrade parade from one generation of computer to the next. But there is a lesser told history involving the experimentation that took place at the intersection of microcomputing and minicomputing or mainframe computing. In universities, computers like the BBC Micro were apparently informally introduced as terminals for other systems, terminal ROMs were developed and shared between institutions. However, there seems to have been relatively little mainstream interest in such software as fully promoted commercial products, although Acornsoft – Acorn’s software outlet – did adopt such a ROM to sell as their Termulator product.
The Acorn Electron, introduced at £199, had a “proper” keyboard and the ability to display 80 columns of text, unlike various other popular microcomputers. Indeed, it may have been the lowest-priced computer to be able to display 80 columns of relatively high definition text as standard, such capabilities requiring extra cards for machines like the Apple II and the Commodore 64. Considering the much lower price of such a computer, the ongoing experimentation underway at the time with its sibling machine on alternative terminal solutions, and the generally favourable capabilities of both these machines, it seems slightly baffling that more was not done to pursue opportunities to introduce a form of “intelligent terminal” or “hybrid terminal” product to certain markets.

VIEW in 80 columns on the Acorn Electron.
None of this is to say that institutional users would have been especially enthusiastic. In some institutions, budgets were evidently generous enough that considerable sums of money would be spent acquiring workstations that were sometimes of questionable value. But in others, the opportunity to make savings, to explore other ways of working, and perhaps also to explicitly introduce microcomputing topics such as software development for lower-specification hardware would have been worthy of some consideration. An Electron with a decent monochrome monitor, like the one provided with the M2105, plus some serial hardware, could have comprised a product sold for perhaps as little as £300.
The Hybrid Terminal
How would a “hybrid terminal” solution work, how might it have been adopted, and what might it have been used for? Through emulation and by taking advantage of the technological continuity in multi-user systems from the 1980s to the present day, we can attempt to answer such questions. Starting with communications technologies familiar in the world of the terminal, we might speculate that a serial connection would be the most appropriate and least disruptive way of interfacing a microcomputer to a multi-user system.
Although multi-user systems, like those produced by Digital Equipment Corporation (DEC), might have offered network connectivity, it is likely that such connectivity was proprietary, expensive in terms of the hardware required, and possibly beyond the interfacing capabilities of most microcomputers. Meanwhile, Acorn’s own low-cost networking solution, Econet, would not have been directly compatible with these much higher-end machines. Acorn’s involvement in network technologies is also more complicated than often portrayed, but as far as Econet is concerned, only much later machines would more conveniently bridge the different realms of Econet and standards-based higher-performance networks.
Moreover, it remains unlikely that operators and suppliers of various multi-user systems would have been enthusiastic about fitting dedicated hardware and installing dedicated software for the purpose of having such systems communicate with third-party computers using a third-party network technology. I did find it interesting that someone had also adapted Acorn’s network filing system that usually runs over Econet to work instead over a serial connection, which presumably serves files out of a particular user account. Another discovery I made was a serial filing system approach by someone who had worked at Acorn who wanted to transfer files between a BBC Micro system and a Unix machine, confirming that such functionality was worth pursuing. (And there is also a rather more complicated approach involving more exotic Acorn technology.)
Indeed, to be successful, a hybrid terminal approach would have to accommodate existing practices and conventions as far as might be feasible in order to not burden or disturb the operators of these existing systems. One motivation from an individual user’s perspective might be to justify introducing a computer on their desk, to be able to have it take advantage of the existing facilities, and to augment those facilities where it might be felt that they are not flexible or agile enough. Such users might request help from the operators, but the aim would be to avoid introducing more support hassles, which would easily arise if introducing a new kind of network to the mix. Those operators would want to be able to deploy something and have it perform a role without too much extra thought.
I considered how a serial link solution might achieve this. An existing terminal would be connected to, say, a Unix machine and be expected to behave like a normal client, allowing the user to log into their account. The microcomputer would send some characters down the serial line to the Unix “host”, causing it to present the usual login prompt, and the user would then log in as normal. They would then have the option of conducting an interactive session, making their computer like a conventional terminal, but there would also be the option of having the Unix system sit in the background, providing other facilities on request.

Logging into a remote service via a serial connection.
The principal candidates for these other facilities would be file storage and printing. Both of these things were centrally managed in institutions, often available via the main computing service, and the extensible operating system of the Electron and related microcomputers invites the development of software to integrate the core support for these facilities with such existing infrastructure. Files would be loaded from the user’s account on the multi-user system and saved back there again. Printing would spool the printed data to files somewhere in the user’s home directory for queuing to centralised printing services.
Attempting an Implementation
I wanted to see how such a “serial computing environment” would work in practice, how it would behave, what kinds of applications might benefit, and what kind of annoyances it might have. After all, it might be an interesting idea or a fun idea, but it need not be a particularly good one. The first obstacle was that of understanding how the software elements would work, primarily on the Electron itself, from the tasks that I would want the software to perform down to the way the functionality would be implemented. On the host or remote system, I was rather more convinced that something could be implemented since it would mostly be yet another server program communicating over a stream, with plenty of modern Unix conveniences to assist me along the way.
As it turned out, my investigations began with a trip away from home and the use of a different, and much more constrained, development environment involving an ARM-based netbook. Fortunately, Elkulator and the different compilers and tools worked well enough on that development hardware to make the exercise approachable. Another unusual element was that I was going to mostly rely on the original documentation in the form of the actual paper version of the Acorn Electron Advanced User Guide for information on how to write the software for the Electron. It was enlightening coming back to this book after a few decades for assistance on a specific exercise, even though I have perused the book many times in its revised forms online, because returning to it with a focus on a particular task led me to find that the documentation in the book was often vague or incomplete.
Although the authors were working in a different era and presumably under a degree of time pressure, I feel that the book in some ways exhibits various traits familiar to those of us working in the software industry, these indicating a lack of rigour and of sufficient investment in systems documentation. For this, I mostly blame the company who commissioned the work and then presumably handed over some notes and told the authors to fill in the gaps. As if to strengthen such perceptions of hurriedness and lack of review, it also does not help that “system” is mis-spelled “sysem” in a number of places in the book!
Nevertheless, certain aspects of the book were helpful. The examples, although focusing on one particular use-case, did provide helpful detail in deducing the correct way of using certain mechanisms, even if they elected to avoid the correct way of performing other tasks. Acorn’s documentation had a habit of being “preachy” about proper practices, only to see its closest developers ignore those practices, anyway. Eventually, on returning from my time away, I was able to fill in some of the gaps, although by this time I had a working prototype that was able to do basic things like initiate a session on the host system and to perform some file-related operations.
There were, and still are, a lot of things that needed, and still need, improvement with my implementation. The way that the operating system needs to be extended to provide extra filing system functionality involves plenty of programming interfaces, plenty of things to support, and also plenty of opportunities for things to go wrong. The VIEW word processor makes use of interfaces for both whole-file loading and saving as well as random-access file operations. Missing out support for one or the other will probably not yield the desired level of functionality.
There are also intricacies with regard to switching printing on and off – this typically being done using control characters sent through the output stream – and of “spool” files which capture character output. And filing system ROMs need to be initialised through a series of “service calls”, these being largely documented, but the overall mechanism is left largely undescribed in the documentation. It is difficult enough deciphering the behaviour of the Electron’s operating system today, with all the online guidance available in many forms, so I cannot imagine how difficult it would have been as a third party to effectively develop applications back in the day.
Levels of Simulation
To support the activities of the ROM software in the emulated Electron, I had to develop a server program running on my host computer. As noted above, this was not onerous, especially since I had already written a program to exercise the serial communications and to interact with the emulated serial port. I developed this program further to respond to commands issued by my ROM, performing host operations and returning results. For example, the CAT command produces a “catalogue” of files in a host directory, and so my server program performs a directory listing operation, collects the names of the files, and then sends them over the virtual serial link to the ROM for it to display to the user.
To make the experience somewhat authentic and to approximate to an actual deployment environment, I included a simulation of the login prompt so that the user of the emulated Electron would have to log in first, with the software also having to deal with a logged out (or not yet logged in) condition in a fairly graceful way. To ensure that they are logged in, a user selects the Serial Computing Environment using the *SCE command, this explicitly selecting the serial filing system, and the login dialogue is then presented if the user has not yet logged into the remote host. Once logged in, the ROM software should be able to test for the presence of the command processor that responds to issued commands, only issuing commands if the command processor has signalled its presence.
Although this models a likely deployment environment, I wanted to go a bit further in terms of authenticity, and so I decided to make the command processor a separate program that would be installed in a user account on a Unix machine. The user’s profile script would be set up to run the command processor, so that when they logged in, this program would automatically run and be ready for commands. I was first introduced to such practices in my first workplace where a menu-driven, curses-based program I had written was deployed so that people doing first-line technical support could query the database of an administrative system without needing to be comfortable with the Unix shell environment.
For complete authenticity I would actually want to have the emulated Electron contact a Unix-based system over a physical serial connection, but for now I have settled for an arrangement whereby a pseudoterminal is created to run the login program, with the terminal output presented to the emulator. Instead of seeing a simulated login dialogue, the user now interacts with the host system’s login program, allowing them to log into a real account. At that point, the command processor is invoked by the shell and the user gets back control.

Obtaining a genuine login dialogue from a Unix system.
To prevent problems with certain characters, the command processor configures the terminal to operate in raw mode. Apart from that, it operates mostly as it did when run together with the login simulation which did not have to concern itself with such things as terminals and login programs.
Some Applications
This effort was motivated by the need or desire to be able to access files from within Elkulator, particularly from applications such as VIEW. Naturally, VIEW is really just one example from the many applications available for the Electron, but since it interacts with a range of functionality that this serial computing environment provides, it serves to showcase such functionality fairly well. Indeed, some of the screenshots featured in this and the previous article show VIEW operating on text that was saved and loaded over the serial connection.
Accessing files involves some existing operating system commands, such as *CAT (often abbreviated to *.) to list the catalogue of a storage medium. Since a Unix host supports hierarchical storage, whereas the Electron’s built-in command set only really addresses the needs of a flat storage medium (as provided by various floppy disk filing systems for Electron and BBC Micro), the *DIR command has been introduced from Acorn’s hierarchical filing systems (such as ADFS) to navigate between directories, which is perhaps confusing to anyone familiar with other operating systems, such as the different variants of DOS and their successors.

Using catalogue and directory traversal commands.
VIEW allows documents to be loaded and saved in a number of ways, but as a word processor it also needs to be able to print these documents. This might be done using a printer connected to a parallel port, but it makes a bit more sense to instead allow the serial printer to be selected and for printing to occur over the serial connection. However, it is not sufficient to merely allow the operating system to take over the serial link and to send the printed document, if only because the other side of this link is not a printer! Indeed, the command processor is likely to be waiting for commands and to see the incoming data as ill-formed input.
The chosen solution was to intercept attempts to send characters to a serial printer, buffering them and then sending the buffered data in special commands to the command processor. This in turn would write the printed characters to a “spool” file for each printing session. From there, these files could be sent to an appropriate printer. This would give the user rather more control over printing, allowing them to process the printout with Unix tools, or to select one particular physical printer out of the many potentially available in an organisation. In the VIEW environment, and in the MOS environment generally, there is no built-in list of printers or printer selection dialogue.
Since the kinds of printers anticipated for use with VIEW might well have been rather different from the kinds connected to multi-user systems, it is likely that some processing would be desirable where different text styles and fonts have been employed. Today, projects like PrinterToPDF exist to work with old-style printouts, but it is conceivable that either the “printer driver generator” in the View suite or some postprocessing tool might have been used to produce directly printable output. With unstyled text, however, the printouts are generally readable and usable, as the following excerpt illustrates.
A brief report on the experience of using VIEW as a word processor four decades on. Using VIEW on the Acorn Electron is an interesting experience and a glimpse into the way word processing was once done. Although I am a dedicated user of Vim, I am under no illusions of that program's word processing capabilities: it is deliberately a screen editor based on line editor heritage, and much of its operations are line-oriented. In contrast, VIEW is intended to provide printed output: it presents the user with a ruler showing the page margins and tab stops, and it even saves additional rulers into the stored document in their on-screen representations. Together with its default typewriter-style behaviour of allowing the cursor to be moved into empty space and of overwriting or replacing text, there is a quaint feel to it.
Since VIEW is purely text-based, I can easily imagine converting its formatting codes to work with troff. That would then broaden the output options. Interestingly, the Advanced User Guide was written in VIEW and then sent to a company for typesetting, so perhaps a workflow like this would have been useful for the authors back then.
A major selling point of the Electron was its provision of BBC BASIC as the built-in language. As the BBC Micro had started to become relatively widely adopted in schools across the United Kingdom, a less expensive computer offering this particular dialect of BASIC was attractive to purchasers looking for compatibility with school computers at home. Obviously, there is a need to be able to load and save BASIC programs, and this can be done using the serial connection.

Loading a BASIC program from the Unix host.
Beyond straightforward operations like these, BASIC also provides random-access file operations through various keywords and constructs, utilising the underlying operating system interfaces that invoke filing system operations to perform such work. VIEW also appears to use these operations, so it seems sensible not to ignore them, even if many programmers might have preferred to use bulk transfer operations – the standard load and save – to get data in and out of memory quickly.

A BASIC program reading and showing a file.
Interactions between printing, the operating system’s own spooling support, outputting characters and reading and writing data are tricky. A degree of experimentation was required to make these things work together. In principle, it should be possible to print and spool at the same time, even with output generated by the remote host that has been sent over the serial line for display on the Electron!
Of course, as a hybrid terminal, the exercise would not be complete without terminal functionality. Here, I wanted to avoid going down another rabbit hole and implementing a full terminal emulator, but I still wanted to demonstrate the invocation of a shell on the Unix host and the ability to run commands. To show just another shell session transcript would be rather dull, so here I present the perusal of a Python program to generate control codes that change the text colour on the Electron, along with the program’s effects:

Interaction with the shell featuring multiple text colours.
As a bitmapped terminal, the Electron is capable of much more than this. Although limited to moderate resolutions by the standards of the fanciest graphics terminals even of that era, there are interesting possibilities for Unix programs and scripts to generate graphical output.

A chart generated by a Python program showing workstation performance results.
Sending arbitrary character codes requires a bit of terminal configuration magic so that line feeds do not get translated into other things (the termios manual page is helpful, here, suggesting the ONLCR flag as the culprit), but the challenge, as always, is to discover the piece of the stack of technologies that is working against you. Similar things can be said on the Electron as well, with its own awkward confluence of character codes for output and output control, requiring the character output state to be tracked so that certain values do not get misinterpreted in the wrong context.
Others have investigated terminal connectivity on Acorn’s 8-bit microcomputers and demonstrated other interesting ways of producing graphical output from Unix programs. Acornsoft’s Termulator could even emulate a Tektronix 4010 graphical terminal. Curiously, Termulator also supported file transfer between a BBC Micro and the host machine, although only as a dedicated mode and limited to ASCII-only text files, leaving the hybrid terminal concept unexplored.
Reflections and Remarks
I embarked on this exercise with some cautiousness, knowing that plenty of uncertainties lay ahead in implementing a functional piece of software, and there were plenty of frustrating moments as some of the different elements of the rather underdocumented software stack conspired to produce undesirable behaviour. In addition, the behaviour of my serial emulation code had a confounding influence, requiring some low-level debugging (tracing execution within the emulator instruction by instruction, noting the state of the emulated CPU), some slowly dawning realisations, and some adjustments to hopefully make it work in a more cooperative fashion.
There are several areas of potential improvement. I first programmed in 6502 assembly language maybe thirty-five years ago, and although I managed to get some sprite and scrolling routines working, I never wrote any large programs, nor had to interact with the operating system frameworks. I personally find the 6502 primitive, rigid, and not particularly conducive to higher-level programming techniques, and I found myself writing some macros to take away the tedium of shuffling values between registers and the stack, constantly aware of various pitfalls with regard to corrupting registers.
My routines extending the operating system framework possibly do not do things the right way or misunderstand some details. That, I will blame on the vague documentation as well as any mistakes made micromanaging the registers. Particularly frustrating was the way that my ROM code would be called with interrupts disabled in certain cases. This made implementation challenging when my routines needed to communicate over the serial connection, when such communication itself requires interrupts to be enabled. Quite what the intention of the MOS designers was in such circumstances remains something of a mystery. While writing this article, I realised that I could have implemented the printing functionality in a different way, and this might have simplified things, right up to the point where I saw, thanks to the debugger provided by Elkulator, that the routines involved are called – surprise! – with interrupts disabled.
Performance could be a lot better, with this partly due to my own code undoubtedly requiring optimisation. The existing software stack is probably optimised to a reasonable extent, but there are various persistent background activities that probably steal CPU cycles unnecessarily. One unfortunate contributor to performance limitations is the hardware architecture of the Electron. Indeed, I discovered while testing in one of the 80-column display modes that serial transfers were not reliable at the default transfer rate of 9600 baud, instead needing to be slowed down to only 2400 baud. Some diagnosis confirmed that the software was not reading the data from the serial chip quickly enough, causing an overflow condition and data being lost.
Motivated by cost reduction and product positioning considerations – the desire to avoid introducing a product that might negatively affect BBC Micro sales – the Electron was deliberately designed to use a narrow data bus to fewer RAM chips than otherwise would have been used, with a seemingly clever technique being employed to allow the video circuitry to get the data at the desired rate to produce a high-resolution or high-bandwidth display. Unfortunately, the adoption of the narrow data bus, facilitated by the adoption of this particular technique, meant that the CPU could only ever access RAM at half its rated speed. And with the narrow data bus, the video circuitry effectively halts the CPU altogether for a substantial portion of its time in high-bandwidth display modes. Since serial communications handling relies on the delivery and handling of interrupts, if the CPU is effectively blocked from responding quickly enough, it can quickly fall behind if the data is arriving and the interrupts are occurring too often.
That does raise the issue of reliability and of error correction techniques. Admittedly, this work relies on a reliable connection between the emulated Electron and the host. Some measures are taken to improve the robustness of the communication when messages are interrupted so that the host in particular is not left trying to send or receive large volumes of data that are no longer welcome or available, and other measures are taken to prevent misinterpretation of stray data received in a different and thus inappropriate context. I imagine that I may have reinvented the wheel badly here, but these frustrations did provide a level of appreciation of the challenges involved.
Some Broader Thoughts
It is possible that Acorn, having engineered the Electron too aggressively for cost, made the machine less than ideal for the broader range of applications for which it was envisaged. That said, it should have been possible to revise the design and produce a more performant machine. Experiments suggest that a wider data path to RAM would have helped with the general performance of the Electron, but to avoid most of the interrupt handling problems experienced with the kind of application being demonstrated here, the video system would have needed to employ its existing “clever” memory access technique in conjunction with that wider data path so as to be able to share the bandwidth more readily with the CPU.
Contingency plans should have been made to change or upgrade the machine, if that had eventually been deemed necessary, starting at the point in time when the original design compromises were introduced. Such flexibility and forethought would also have made a product with a longer appeal to potential purchasers, as opposed to a product that risked being commercially viable for only a limited period of time. However, it seems that the lessons accompanying such reflections on strategy and product design were rarely learned by Acorn. If lessons were learned, they appear to have reinforced a particular mindset and design culture.
Virtue is often made of the Acorn design philosophy and the sometimes rudely expressed and dismissive views of competing technologies that led the company to develop the ARM processor. This approach enabled comparatively fast and low-cost systems to be delivered by introducing a powerful CPU to do everything in a system from running applications to servicing interrupts for data transfers, striving for maximal utilisation of the available memory bandwidth by keeping the CPU busy. That formula worked well enough at the low end of the market, but when the company tried to move upmarket once again, its products were unable to compete with those of other companies. Ultimately, this sealed the company’s fate, even if more fortuitous developments occurred to keep ARM in the running.
(In the chart shown earlier demonstating graphical terminal output and illustrating workstation performance, circa 1990, Acorn’s R260 workstation is depicted as almost looking competitive until one learns that the other workstations depicted arrived a year earlier and that the red bar showing floating-point performance only applies to Acorn’s machine three years after its launch. It would not be flattering to show the competitors at that point in history, nor would it necessarily be flattering to compare whole-system performance, either, if any publication sufficiently interested in such figures had bothered to do so. There is probably an interesting story to be told about these topics, particularly how Acorn’s floating-point hardware arrived so late, but I doubt that there is the same willingness to tell it as there is to re-tell the usual celebratory story of ARM for the nth time.)
Acorn went on to make the Communicator as a computer that would operate in a kind of network computing environment, relying on network file servers to provide persistent storage. It reused some of the technology in the Electron and the BT Merlin M2105, particularly the same display generator and its narrow data bus to RAM, but ostensibly confining that aspect of the Electron’s architecture to a specialised role, and providing other facilities for applications and, as in the M2105, for interaction with peripherals. Sadly, the group responsible in Acorn had already been marginalised and eventually departed, apparently looking to pursue the concept elsewhere.
As for this particular application of an old computer and a product that was largely left uncontemplated, I think there probably was some mileage in deploying microcomputers in this way, even outside companies like Acorn where such computers were being developed and used, together with software development companies with their own sophisticated needs, where minicomputers like the DEC VAX would have been available for certain corporate or technical functions. Public (or semi-public) access terminals were fairly common in universities, and later microcomputers were also adopted in academia due to their low cost and apparently sufficient capabilities.
Although such adoption appears to have focused on terminal applications, it cannot have been beyond the wit of those involved to consider closer integration between the microcomputing and multi-user environments. In further and higher education, students will have had microcomputing experience and would have been able to leverage their existing skills whilst learning new ones. They might have brought their microcomputers along with them, giving them the opportunity to transfer or migrate their existing content – their notes, essays, programs – to the bright and emerging new world of Unix, as well as updating their expertise.
As for updating my own expertise, it has been an enlightening experience in some ways, and I may well continue to augment the implemented functionality, fix and improve things, and investigate the possibilities this work brings. I hope that this rather lengthy presentation of the effort has provided insights into experiences of the past that was and the past that might have been.
Tuesday, 07 February 2023
Considering Unexplored Products of the Past: Emulating an Expansion
In the last couple of years, possibly in common with quite a few other people, certainly people of my vintage, and undoubtedly those also interested in retrocomputing, I have found myself revisiting certain aspects of my technological past. Fortunately, sites like the Internet Archive make this very easy indeed, allowing us to dive into publications from earlier eras and to dredge up familiar and not so familiar magazine titles and other documentation. And having pursued my retrocomputing interest for a while, participating in forums, watching online videos, even contributing to new software and hardware developments, I have found myself wanting to review some of the beliefs and perceptions that I and other people have had of the companies and products we grew up with.
One of the products of personal interest to me is the computer that got me and my brother started with writing programs (as well as playing games): the Acorn Electron, a product of Acorn Computers of Cambridge in the United Kingdom. Much can be said about the perceived chronology of this product’s development and introduction, the actual chronology, and its impact on its originator and on wider society, but that surely deserves a separate treatment. What I can say is that reviewing the archives and other knowledge available to us now can give a deeper understanding of the processes involved in the development of the Electron, the technological compromises made, and the corporate strategy that led to its creation and eventually its discontinuation.
It has been popular to tell simplistic narratives about Acorn Computers, to reduce its history to a few choice moments as the originator of the BBC Microcomputer and the ARM processor, but to do so is to neglect a richer and far more interesting story, even if the fallibility of some of the heroic and generally successful characters involved may be exposed by telling some of that story. And for those who wonder how differently some aspects of computing history might have turned out, exploring that story and the products involved can be an adventure in itself, filling in the gaps of our prior experiences with new insights, realisations and maybe even glimpses into opportunities missed and what might have been if things had played out differently.
At the Rabbit Hole
Reading about computing history is one thing, but this tale is about actually doing things with old software, emulation, and writing new software. It started off with a discussion about the keyboard shortcuts for a word processor and the differences between the keyboards on the Acorn Electron and its higher-specification predecessor, the BBC Microcomputer. Having acquainted myself with the circuitry of the Electron, how its keyboard is wired up, and how the software accesses it, I was obviously intrigued by these apparent differences, but I was also intrigued by the operation of the word processor in question, Acornsoft’s VIEW.
Back in the day, as people like to refer to the time when these products were first made available, such office or productivity applications were just beyond my experience. Although it was slightly fascinating to read about them, most of my productive time was spent writing programs, mostly trying to write games. I had actually seen an office suite written by Psion on the ACT Sirius 1 in the early 1980s, but word processors were the kind of thing that people used in offices or, at the very least, by people who had a printer so that they could print the inevitable letters that everyone would be needing to write.
Firing up an Acorn Electron emulator, specifically Elkulator, I discovered that one of the participants in the discussion was describing keyboard shortcuts that didn’t match up to those that were described in a magazine article from the era, these appearing correct as I tried them out for myself. It turned out that the discussion participant in question was using the BBC Micro version of VIEW on the Electron and was working around the mismatch in keyboard layouts. Although all of this was much ado about virtually nothing, it did two things. Firstly, it made me finally go in and fix Elkulator’s keyboard configuration dialogue, and secondly, it made me wonder how convenient it would be to explore old software in a productive way in an emulator.
Reconciling Keyboards
Having moved to Norway many years ago now, I use a Norwegian keyboard layout, and this has previously been slightly problematic when using emulators for older machines. Many years ago, I used and even contributed some minor things to another emulator, ElectrEm, which had a nice keyboard configuration dialogue. The Electron’s keyboard corresponds to certain modern keyboards pretty well, at least as far as the alphanumeric keys are concerned. More challenging are the symbols and control-related keys, in particular the Electron’s special Caps Lock/Function key which sits where many people now have their Tab key.
Obviously, there is a need to be able to tell an emulator which keys on a modern keyboard are going to correspond to the keys on the emulated machine. Being derived from an emulator for the BBC Micro, however, Elkulator’s keyboard configuration dialogue merely presented a BBC Micro keyboard on the screen and required the user to guess which “Beeb” key might correspond to an Electron one. Having put up with this situation for some time, I finally decided to fix this once and for all. The process of doing so is not particularly interesting, so I will spare you the details of doing things with the Allegro toolkit and the Elkulator source code, but I was mildly pleased with the result:

The revised keyboard configuration dialogue in Elkulator.
By also adding support for redefining the Break key in a sensible way, I was also finally able to choose a key that desktop environments don’t want to interfere with: F12 might work for Break, but Ctrl-F12 makes KDE/Plasma do something I don’t want, and yet Ctrl-Break is quite an important key combination when using an Electron or BBC Micro. Why Break isn’t a normal key on these machines is another story in itself, but here is an example of redefining it and even allowing multiple keys on a modern keyboard to act as Break on the emulated computer:

Redefining the Break key in Elkulator.
Being able to confidently choose and use keys made it possible to try out VIEW in a more natural way. But this then led to another issue: how might I experiment with such software productively? It would be good to write documents and to be able to extract them from the emulator, rather than see them disappear when the emulator is closed.
Real and Virtual Machines
One way to get text out of a system, whether it is a virtual system like the emulated Electron or a real machine, is to print it. I vaguely remembered some support for printing from Elkulator and was reminded by my brother that he had implemented such support himself a while ago as a quick way of getting data out of the emulated system. But I also wanted to be able to get data into the emulated system as well, and the parallel interface typically used by the printer is not bidirectional on the Electron. So, I would need to look further for a solution.
It is actually the case that Elkulator supports reading from and writing to disk (or disc) images. The unexpanded Electron supports read/write access to cassettes (or tapes), but Elkulator does not support writing to tapes, probably because the usability considerations are rather complicated: one would need to allow the user to control the current position on a tape, and all this would do is to remind everyone how inconvenient tapes are. Meanwhile, writing to disk images would be fairly convenient within the emulator, but then one would need to use tools to access the files within the images outside the emulator.
Some emulators for various systems also support the notion of a host filesystem (or filing system) where some special support has been added to make the emulated machine see another peripheral and to communicate with it, this peripheral really being a program on the host machine (the machine that is running the emulator). I could have just written such support, although it would also have needed some software support written for the emulated machine as well, but this approach would have led me down a path of doing something specific to emulation. And I have a principle of sorts which is that if I am going to change the way an emulated machine behaves, it has to be rooted in some kind of reality and not just enhance the emulated machine in a way that the original, “real” machine could not have been.
Building on Old Foundations
As noted earlier, I have an interest in the way that old products were conceived and the roles for which those products were intended by their originators. The Electron was largely sold as an unexpanded product, offering only power, display and cassette ports, with a general-purpose expansion connector being the gateway to anything else that might have been added to the system later. This was perceived somewhat negatively when the machine was launched because it was anticipated that buyers would probably, at the very least, want to plug joysticks into the Electron to play games. Instead, Acorn offered an expansion unit, the Plus 1, that cost another £60 which provided joystick, printer and cartridge connectors.
But this flexibility in expanding the machine meant that it could have been used as the basis for a fairly diverse range of specialised products. In fact, one of the Acorn founders, Chris Curry, enthused about the Electron as a platform for such products, and one such product did actually make it to market, in a way: the BT Merlin M2105 messaging terminal. This terminal combined the Electron with an expansion unit containing circuitry for communicating over a telephone line, a generic serial communications port, a printer port, as well as speech synthesis circuitry and a substantial amount of read-only memory (ROM) for communications software.
Back in the mid-1980s, telecommunications (or “telecoms”) was the next big thing, and enthusiasm for getting a modem and dialling up some “online” service or other (like Prestel) was prevalent in the computing press. For businesses and institutions, there were some good arguments for adopting such technologies, but for individuals the supposed benefits were rather dulled by the considerable costs of acquiring the hardware, buying subscriptions, and the notoriously high telephone call rates of the era. Only the relatively wealthy or the dedicated few pursued this side of data communications.
The M2105 reportedly did some service in the healthcare sector before being repositioned for commercial applications. Along with its successor product, the Acorn Communicator, it enjoyed a somewhat longer lifespan in certain enterprises. For the standard Electron and its accompanying expansions, support for basic communications capabilities was evidently considered important enough to be incorporated into the software of the Plus 1 expansion unit, even though the Plus 1 did not provide any of the specific hardware capabilities for communication over a serial link or a telephone line.
It was this apparently superfluous software capability that I revisited when I started to think about getting files in and out of the emulator. When emulating an Electron with Plus 1, this serial-capable software is run by the emulator, just as it is by a real Electron. On a real system of this kind, a cartridge could be added that provides a serial port and the necessary accompanying circuitry, and the system would be able to drive that hardware. Indeed, such cartridges were produced decades ago. So, if I could replicate the functionality of a cartridge within the emulator, making some code that pretends to be a serial communications chip (or UART) that has been interfaced to the Electron, then I would in principle be able to set up a virtual serial connection between the emulated Electron and my modern host computer.
Emulated Expansions
Modifying Elkulator to add support for serial communications hardware was fairly straightforward, with only a few complications. Expansion hardware on the Electron is generally accessible via a range of memory addresses that actually signal peripherals as opposed to reading and writing memory. The software provided by the Plus 1 expansion unit is written to expect the serial chip to be accessible via a range of memory locations, with the serial chip accepting values sent to those locations and producing values from those locations on request. The “memory map” through which the chip is exposed in the Electron corresponds directly to the locations or registers in the serial chip – the SCN2681 dual asynchronous receiver/transmitter (DUART) – as described by its datasheet.
In principle, all that is needed is to replicate the functionality described by the datasheet. With this done, the software will drive the chip, the emulated chip will do what is needed, and the illusion will be complete. In practice, a certain level of experimentation is needed to fill in the gaps left by the datasheet and any lack of understanding on the part of the implementer. It did help that the Plus 1 software has been disassembled – some kind of source code regenerated from the binary – so that the details of its operation and its expectations of the serial chip’s operation can be established.
Moreover, it is possible to save a bit of effort by seeing which features of the chip have been left unused. However, some unused features can be provided with barely any extra effort: the software only drives one serial port, but the chip supports two in largely the same way, so we can keep support for two just in case there is a need in future for such capabilities. Maybe someone might make a real serial cartridge with two ports and want to adapt the existing software, and they could at least test that software under emulation before moving to real hardware.
It has to be mentioned that the Electron’s operating system, known as the Machine Operating System or MOS, is effectively extended by the software provided in the Plus 1 unit. Even the unexpanded machine provides the foundations for adding serial communications and printing capabilities in different ways, and the Plus 1 software merely plugs into that framework. A different kind of serial chip would be driven by different software but it would plug into the same framework. At no point does anyone have to replace the MOS with a patched version, which seems to be the kind of thing that happens with some microcomputers from the same era.
Ultimately, what all of this means is that having implemented the emulated serial hardware, useful things can already be done with it within the bare computing environment provided by the MOS. One can set the output stream to use the serial port and have all the text produced by the system and programs sent over the serial connection. One can select the serial port for the input stream and send text to the computer instead of using the keyboard. And printing over the serial connection is also possible by selecting the appropriate printer type using a built-in system command.
In Elkulator, I chose to expose the serial port via a socket connection, with the emulator binding to a Unix domain socket on start-up. I then wrote a simple Python program to monitor the socket, to show any data being sent from the emulator and to send any input from the terminal to the emulator. This permitted the emulated machine to be operated from a kind of remote console and for the emulated machine to be able to print to this console. At last, remote logins are possible on the Electron! Of course, such connectivity was contemplated and incorporated from the earliest days of these products.
Filing Options
If the goal of all of this had been to facilitate transfers to and from the emulated machine, this might have been enough, but a simple serial connection is not especially convenient to use. Although a method of squirting a file into the serial link at the Electron could be made convenient for the host computer, at the other end one has to have a program to do something with that file. And once the data has arrived, would it not be most convenient to be able to save that data as a file? We just end up right back where we started: having some data inside the Electron and nowhere to put it! Of course, we could enable disk emulation and store a file on a virtual disk, but then it might just have been easier to make disk image handling outside the emulator more convenient instead.
It seemed to me that the most elegant solution would be to make the serial link act as the means through which the Electron accesses files. That instead of doing ad-hoc transfers of data, such data would be transferred as part of operations that are deliberately accessing files. Such ambitions are not unrealistic, and here I could draw on my experience with the platform, having acquired the Acorn Electron Advanced User Guide many, many years ago, in which there are details of implementing filing system ROMs. Again, the operating system had been designed to be extended in order to cover future needs, and this was one of them.
In fact, I had not been the only one to consider a serial filing system, and I had been somewhat aware of another project to make software available via a serial link to the BBC Micro. That project had been motivated by the desire to be able to get software onto that computer where no storage devices were otherwise available, even performing some ingenious tricks to transfer the filing system software to the machine and to have that software operate from RAM. It might have been tempting merely to use this existing software with my emulated serial port, to get it working, and then to get back to trying out applications, loading and saving, and to consider my work done. But I had other ideas in mind…
Sunday, 29 January 2023
Artificial intelligence is not willing to be correct
As deep learning models get better at representing human language, telling whether a text was written by a human being or a deep learning model becomes harder and harder. And because language models reproduce text found online (often without attribution); the risk of considering their output as if they were written by a human changes the reading experience for the reader.
The last year has been incredible for natural (and programming) language processing. GitHub’s Copilot has been out of technical preview since June, and ChatGPT was released in November. Copilot is based on OpenAI Codex and acts as a source code generator (which raises several issues of its own). ChatGPT is a language model built for dialogue, where a user can chat with the AI, ask questions and have them answered. Both are trained with data from web scrapping, with source code for Copilot and webpages for ChatGPT. Those models work particularly well for their respective purposes, and can thus be used to generate seemingly convincing source code or prose.
Because AI-generated texts are convincing, the fact that they were generated by an AI is not obvious to the careless reader. This is problematic, as there is no guarantee that the text is factually correct and that the human leveraging the AI checked it for mistakes. When reading, this may create a discomfort, as the reader has to determine whether a text was generated by an AI, and if so, if the publisher made sure that it is correct. Companies already started to use AI generated text for articles without clearly visible disclaimers and riddled with errors. The fact that text generated by ChatGPT may contain inaccuracies was acknowledged by OpenAI’s CEO. One might argue that humans make mistakes, too, and that prose or source code written by a human being can therefore also be wrong. This is true. However, the intent behind the text differs. In most cases, the author of a text tries their best to make it correct. But the language model does not understand the concept of correctness and will happily generate text containing wrong facts, which changes the tacitly assumed rules of writing and reading content.
Gaining trust in the text generated by an AI is thus a worthwhile objective. Here are partial solutions to this:
-
Watermarking texts generated by GPT models is a work in progress. Among the possible ones, the words chosen by the AI would embed a proof (using asymmetric cryptography) in their probability distribution. While this does not alleviate the concern stated above, this allows the reader to avoid AI-generated text if he or she wants to.
-
Connecting the text generated by the AI back to what lead it to generate the text might be another may offer a partial solution. If the readers can verify the trustworthiness of the sources, they might feel more confident about the AI-generated text they are reading.
-
If citing the source is too involved computationally, weighting the learning process of the AI in such a way that would give more importance to the authoritative sources on a subject would be a good workaround. Counting the number of backreferences of a page would be a good indicator of whether the text it contains is authoritative (just like page rank).
Considering this perspective, using large language models raises trust issues. A few technical solutions are listed above. However, it would be too reductive to consider this only as a technical problem. AI generated text then looks akin to search engines, without the comfort of knowing that they merely redirect to a source website, whose content is presumably written by a human being who tried to make it correct.
PS: This article was not written by an AI.
Friday, 13 January 2023
Use Any SOP Binary With SOP-Java and External-SOP
The Stateless OpenPGP Protocol specification describes a shared, standardized command line interface for OpenPGP applications. There is a bunch of such binaries available already, among them PGPainless’ pgpainless-cli, Sequoia-PGP’s sqop, as well as ProtonMails gosop. These tools make it easy to use OpenPGP from the command line, as well as from within bash scripts (all of those are available in Debian testing or in the main repo) and the standardized interface allows users to switch from one backend to the other without the need to rewrite their scripts.
The Java library sop-java provides a set of interface definitions that define a java API that closely mimics the command line interface. These interfaces can be implemented by anyone, such that developers could create a drop-in for sop-java using the OpenPGP library of their choice. One such backend is pgpainless-sop, which implements sop-java using the PGPainless library.
I just released another library named external-sop, which implements sop-java and allows the user to use any SOP CLI application of their choice from within their Java / Kotlin application!
Let’s assume we have a SOP command line application called example-sop
and we want to make use of it within our Java application. external-sop
makes the integration a one-liner:
SOP sop = new ExternalSOP("/usr/bin/example-sop");
Now we can use the resulting sop
object the same way we would use for example a SOPImpl
instance:
// generate key byte[] keyBytes = sop.generateKey() .userId("John Doe <john.doe@pgpainless.org>") .withKeyPassword("f00b4r") .generate() .getBytes(); // extract certificate byte[] certificateBytes = sop.extractCert() .key(keyBytes) .getBytes(); byte[] plaintext = "Hello, World!\n".getBytes(); // plaintext // encrypt and sign a message byte[] ciphertext = sop.encrypt() // encrypt for each recipient .withCert(certificateBytes) // Optionally: Sign the message .signWith(keyBytes) .withKeyPassword("f00b4r") // if signing key is protected // provide the plaintext .plaintext(plaintext) .getBytes(); // decrypt and verify a message ByteArrayAndResult<DecryptionResult> bytesAndResult = sop.decrypt() .withKey(keyBytes) .verifyWithCert(certificateBytes) .withKeyPassword("f00b4r") // if decryption key is protected .ciphertext(ciphertext) .toByteArrayAndResult(); DecryptionResult result = bytesAndResult.getResult(); byte[] plaintext = bytesAndResult.getBytes();
The external-sop
module will be available on Maven Central in a few hours for you to test.
Happy Hacking!
Wednesday, 04 January 2023
The KDE Qt5 Patch Collection has been rebased on top of Qt 5.15.8
Commit: https://invent.kde.org/qt/qt/qt5/-/commit/281044e2541c842f8d0b0bc1a199999bf9d9951c
Commercial release announcement: https://www.qt.io/blog/commercial-lts-qt-5.15.8-released
As usual I want to personally extend my gratitude to the Commercial users of Qt for beta testing Qt 5.15.8 for the rest of us.
The Commercial Qt 5.15.8 release introduced two bugs that have later been fixed. Thanks to that, our Patchset Collection has been able to incorporate the the fix for one of the issues [1] and revert for the other [2] and the Free Software users will never be affected by it!
P.S: Special shout-out to Andreas Sturmlechner for identifying the fix of the issue, since I usually only pay attention to "Revert XYZ" commits and this one is not a revert but subsequent improvement
Planet FSFE (en): RSS 2.0 |
Atom |
FOAF |
Albrechts Blog
Alessandro's blog
Andrea Scarpino's blog
André Ockers on Free Software
Bela's Internship Blog
Bernhard's Blog
Bits from the Basement
Blog of Martin Husovec
Bobulate
Brian Gough’s Notes
Chris Woolfrey — FSFE UK Team Member
Ciarán’s free software notes
Colors of Noise - Entries tagged planetfsfe
Communicating freely
Daniel Martí's blog
David Boddie - Updates (Full Articles)
ENOWITTYNAME
English Planet – Dreierlei
English – Alessandro at FSFE
English – Alina Mierlus – Building the Freedom
English – Being Fellow #952 of FSFE
English – Blog
English – FSFE supporters Vienna
English – Free Software for Privacy and Education
English – Free speech is better than free beer
English – Jelle Hermsen
English – Nicolas Jean's FSFE blog
English – Paul Boddie's Free Software-related blog
English – The Girl Who Wasn't There
English – Thinking out loud
English – Viktor's notes
English – With/in the FSFE
English – gollo's blog
English – mkesper's blog
English – nico.rikken’s blog
Escape to freedom
Evaggelos Balaskas - System Engineer
FSFE interviews its Fellows
FSFE – Frederik Gladhorn (fregl)
FSFE – Matej's blog
Fellowship News
Free Software & Digital Rights Noosphere
Free Software with a Female touch
Free Software –
Free Software – hesa's Weblog
Free as LIBRE
Free software - Carmen Bianca Bakker
Free, Easy and Others
FreeSoftware – egnun's blog
From Out There
Giacomo Poderi
Green Eggs and Ham
Handhelds, Linux and Heroes
HennR’s FSFE blog
Henri Bergius
In English — mina86.com
Karsten on Free Software
Losca
MHO
Mario Fux
Matthias Kirschner's Web log - fsfe
Max Mehl (English)
Michael Clemens
Myriam's blog
Mäh?
Nice blog
Nikos Roussos - opensource
Planet FSFE on irl.xyz
Posts on Hannes Hauswedell
Pressreview
Rekado
Riccardo (ruphy) Iaconelli – blog
Saint’s Log
TSDgeos' blog
Tarin Gamberini
Technology – Intuitionistically Uncertain
The trunk
Thomas Løcke Being Incoherent
Told to blog - Entries tagged fsfe
Tonnerre Lombard
Vincent Lequertier's blog
Vitaly Repin. Software engineer's blog
Weblog
Weblog
Weblog
Weblog
Weblog
Weblog
a fellowship ahead
agger's Free Software blog
anna.morris's blog
ayers's blog
bb's blog
blog
en – Florian Snows Blog
en – PB's blog
en – rieper|blog
english on Björn Schießle - I came for the code but stayed for the freedom
english – Davide Giunchi
english – Torsten's FSFE blog
foss – vanitasvitae's blog
free software blog
freedom bits
freesoftware – drdanzs blog
fsfe – Thib's Fellowship Blog
julia.e.klein’s blog
marc0s on Free Software
pichel’s blog
planet-en – /var/log/fsfe/flx
polina's blog
softmetz' anglophone Free Software blog
stargrave's blog
tobias_platen's blog
tolld's blog
wkossen’s blog
yahuxo’s blog