Free Software, Free Society!
Thoughts of the FSFE Community (English)

## Integrating libext2fs with a Filesystem Framework

Given what my previous articles covered, there might not seem to be much left to say about the topic of this one. Previously, I described the work involved in building libext2fs for L4Re and testing the library, and I described a framework for separating filesystem providers from programs that want to use files. But, as always, there are plenty of little details, detours and learning experiences that help to make the tale longer than it otherwise might have been.

Although this file access framework sounds intimidating, it is always worth remembering that the only exotic thing about the software being written is that it needs to request system resources and to communicate with other programs. That can be tricky in itself in many programming environments, and I have certainly spent enough time trying to figure out how to use the types and functions provided by the many L4Re libraries so that these operations may actually work.

But in the end, these are programs that are run just like any other. We aren’t building things into the kernel and having to conform to a particularly restricted environment. And although it can still be tiresome to have to debug things, particularly interprocess communication (IPC) problems, many familiar techniques for debugging and inspecting program behaviour remain available to us.

## A Quick Translation

The test program I had written for libext2fs simply opened a file located in the “rom” filesystem, exposed it to libext2fs, and performed operations to extract content. In my framework, I had directed my attention towards opening and reading files, so it made sense to concentrate on providing this functionality in a filesystem server or “provider”.

Accessing a filesystem server employing a "rom" file for the data

The user of the framework (shielded from the details by a client library) would request the opening of a file (thus obtaining a file descriptor able to communicate with a dedicated resource object) and then read from the file (causing communication with the resource object and some transfers of data). These operations, previously done in a single program employing libext2fs directly, would now require collaboration by two separate programs.

So, I would need to insert the appropriate code in the right places in my filesystem server and its objects to open a filesystem, to search for a file of the given name, and to provide the file data. For the first of these, the test program was doing something like this in the main function:

retval = ext2fs_open(devname, EXT2_FLAG_RW, 0, 0, unix_io_manager, &fs);

In the main function of the filesystem server program, something similar needs to be done. A reference to the filesystem (fs) is then passed to the server object for it to use:

Fs_server server_obj(fs, devname);

When a request is made to open a file, the filesystem server needs to locate the file just as the test program needed to. The code to achieve this is tedious, employing the ext2fs_lookup function and traversing the directory hierarchy. Ultimately, something like this needs to be done to obtain a structure for accessing the file contents:

retval = ext2fs_file_open(_fs, ino_file, ext2flags, &file);

Here, the _fs variable is our reference in the server object to the filesystem structure, the ino_file variable refers to the place in the filesystem where the file is found (the inode), some flags indicate things like whether we are reading and/or writing, and a supplied file variable is set upon the successful opening of the file. In the filesystem server, we want to create a specific object to conduct access to the file:

Fs_object *obj = new Fs_object(file, EXT2_I_SIZE(&inode_file), fsobj, irq);

Here, this resource object is initialised with the file access structure, an indication of the file size, something encapsulating the state of the communication between client and server, and the IRQ object needed for cleaning up (as described in the last article). Meanwhile, in the resource object, the read operation is supported by a pair of libext2fs functions:

ext2fs_file_lseek(_file, _obj.position, EXT2_SEEK_SET, 0);
ext2fs_file_read(_file, _obj.buffer, to_transfer, &read);

These don’t appear next to each other in the actual code, but the first call is used to seek to the indicated position in the file, this having been specified by the client. The second call appears in a loop to read into a buffer an indicated amount of data, returning the amount that was actually read.
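The seek-then-loop pattern described above can be illustrated with a minimal sketch. It uses the standard C stdio functions as stand-ins for ext2fs_file_lseek and ext2fs_file_read, and the function name read_at is hypothetical, so this is not the code from the filesystem server itself:

```c
#include <stdio.h>
#include <stddef.h>

/* Read up to `wanted` bytes starting at `position`, looping because a
   single read call may deliver fewer bytes than requested.
   Returns the number of bytes actually placed in `buffer`. */
size_t read_at(FILE *fp, long position, char *buffer, size_t wanted)
{
    size_t total = 0;

    /* Stand-in for ext2fs_file_lseek: move to the position given by the client. */
    if (fseek(fp, position, SEEK_SET) != 0)
        return 0;

    /* Stand-in for the loop around ext2fs_file_read. */
    while (total < wanted) {
        size_t got = fread(buffer + total, 1, wanted - total, fp);
        if (got == 0)
            break;              /* end of file or error */
        total += got;
    }
    return total;
}
```

The return value plays the same role as the amount reported back by the second libext2fs call: the caller learns how much data was really transferred, which may be less than it asked for near the end of the file.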

In summary, the work done by a collection of function calls appearing together in a single function is now spread out over three places in the filesystem server program:

• The initialisation is done in the main function as the server starts up
• The locating and opening of a file in the filesystem is done in the general filesystem server object
• Reading and writing is done in the file-specific resource object

After initialisation, the performance of each part of the work only occurs upon receiving a distinct kind of message from a client program, of which more details are given below.

## The Client Library

Although we cannot yet use the familiar C library functions for accessing files (fopen, fread, fwrite, fclose, and so on), we can employ functions that try to be as friendly. Thus, the following form of program may be used:

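char buffer[80];
file_descriptor_t *desc = client_open("test.txt", O_RDONLY);
/* the read step is reconstructed here; this client_read signature is an assumption */
size_t available = client_read(desc, buffer, sizeof(buffer));

if (available)
    fwrite((void *) buffer, sizeof(char), available, stdout); /* using existing fwrite function */
client_close(desc);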

As noted above, the existing fwrite function in L4Re may be used to write file data out to the console. Ultimately, we would want our modified version of the function to be doing this job.

These client library functions resemble lower-level C library functions such as open, read, write, close, and so on. By targeting this particular level of functionality, it is hoped that much of the logic in functions like fopen can be preserved, this logic having to deal with things like mode strings (“r”, “r+”, “w”, and so on) which have little to do with the actual job of transmitting file content around the system.

In order to do their work, the client library functions need to send and receive IPC messages, or at least need to get other functions to deal with this particular work. My approach has been to write a layer of functions that only deals with messaging and that hides the L4-specific details from the rest of the code.

This lower-level layer of functions allows us to treat interprocess interactions like normal function calls, and in this framework those calls would have the following signatures, with the inputs arriving at the server and the outputs arriving back at the client:

• fs_open: flags, buffer → file size, resource object
• fs_flush: (no parameters) → (no return values)
• fs_write: position, available → written, file size

Here, the aim is to keep the interprocess interactions as simple and as infrequent as possible, with data buffered in the indicated buffer dataspace, and with reading and writing only occurring when the buffer is read or has been filled by writing. The more friendly semantics therefore need to be supported in the client library functions resting on top of these even-lower-level IPC messaging functions.

The responsibilities of the client library functions can be summarised as follows:

• client_open: allocate memory for the buffer, obtain a server reference (“capability”) from the program’s environment
• client_close: deallocate the allocated resources
• client_flush: invoke fs_flush with any available data, resetting the buffer status
• client_read: provide data to the caller from its buffer, invoking fs_read whenever the buffer is empty
• client_write: commit data from the caller into the buffer, invoking fs_write whenever the buffer is full, also flushing the buffer when appropriate

The lack of a fs_close function might seem surprising, but as described in the previous article, the server process is designed to receive a notification when the client process discards a reference to the resource object dedicated to a particular file. So in client_close, we should be able to merely throw away the things acquired by client_open, and the system together with the server will hopefully handle the consequences.
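To make the buffering responsibility of client_read more concrete, here is a small sketch under stated assumptions. Every name in it (demo_file_t, demo_fs_read, demo_client_read) is hypothetical, the "server" is just an in-memory string, and the IPC call is simulated by a plain function, so it only illustrates the idea of refilling the buffer when it is empty:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor; all names here are illustrative only. */
typedef struct {
    const char *backing;   /* stand-in for data held by the server */
    size_t backing_size;
    size_t file_pos;       /* position of the next refill */
    char buffer[8];        /* stand-in for the shared buffer dataspace */
    size_t available;      /* bytes currently held in the buffer */
    size_t consumed;       /* bytes of the buffer already handed out */
    unsigned refills;      /* number of simulated IPC calls made */
} demo_file_t;

/* Stand-in for the fs_read IPC call: refill the buffer from the "server". */
static void demo_fs_read(demo_file_t *f)
{
    size_t remaining = f->backing_size - f->file_pos;
    size_t n = remaining < sizeof f->buffer ? remaining : sizeof f->buffer;

    memcpy(f->buffer, f->backing + f->file_pos, n);
    f->file_pos += n;
    f->available = n;
    f->consumed = 0;
    f->refills++;
}

/* Copy up to `count` bytes to the caller, refilling only when the buffer is empty. */
size_t demo_client_read(demo_file_t *f, char *out, size_t count)
{
    size_t total = 0;

    while (total < count) {
        if (f->consumed == f->available) {
            demo_fs_read(f);
            if (f->available == 0)
                break;          /* end of file */
        }
        size_t n = f->available - f->consumed;
        if (n > count - total)
            n = count - total;
        memcpy(out + total, f->buffer + f->consumed, n);
        f->consumed += n;
        total += n;
    }
    return total;
}
```

With an eight-byte buffer, two successive three-byte reads are served by a single refill, which is the point of the design: small reads by the caller do not each cost an interprocess call.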

## Switching the Backend

Using a conventional file as the repository for file content is convenient, but since the aim is to replace the existing filesystem mechanisms, it would seem necessary to try and get libext2fs to use other ways of accessing the underlying storage. Previously, my considerations had led me to provide a “block” storage layer underneath the filesystem layer. So it made sense to investigate how libext2fs might communicate with a “block server” or “block device” in order to read and write raw filesystem data.

Employing a separate server to provide filesystem data

Changing the way libext2fs accesses its storage sounds like an ominous task, but fortunately some thought has evidently gone into accommodating different storage types and platforms. Indeed, the library code includes support for things like DOS and Windows, with this functionality evidently being used by various applications on those platforms (or, these days, the latter one, at least) to provide some kind of file browser support for ext2-family filesystems.

The kind of component involved in providing this variety of support is known as an “I/O manager”, and the one that we have been using is known as the “Unix” I/O manager, this employing POSIX or standard C library calls to access files and devices. Now, this may have been adequate until now, but with the requirement that we use the replacement IPC mechanisms to access a block server, we need to consider how a different kind of I/O manager might be implemented to use the client library functions instead of the C library functions.

This exercise turned out to be relatively straightforward and perhaps a little less work than envisaged once the requirements of initialising an io_channel object had been understood, this involving the allocation of memory and the population of a structure to indicate things like the block size, error status, and so on. Beyond this, the principal operations needing support are as follows:

• open: initialises the io_channel and calls client_open
• close: calls client_close
• set block size: sets the block size for transfers, something that gets done at various points in the opening of a filesystem
• read block: calls client_seek and client_read to obtain data from the block server
• write block: calls client_seek and client_write to commit data to the block server

It should be noted that the block server largely acts like a single-file filesystem, so the same interface supported by the filesystem server is also supported by the block server. This is how we get away with using the client libraries.
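The read block and write block operations both need to turn a block number into a position for the seek operation and a size for the transfer. A minimal sketch of that arithmetic follows; the type and function names are hypothetical, and the treatment of a negative count as a byte count reflects my understanding of the convention used by libext2fs I/O managers rather than code taken from this project:

```c
#include <stddef.h>

/* Byte range covered by a block request. */
typedef struct {
    unsigned long long offset;  /* where to seek in the block server's "file" */
    size_t size;                /* how many bytes to transfer */
} byte_range_t;

/* Translate (block, count) into a byte range. A positive count is a
   number of blocks; a negative count is taken to mean -count bytes. */
byte_range_t block_to_range(unsigned long block, int count, unsigned block_size)
{
    byte_range_t r;

    r.offset = (unsigned long long) block * block_size;
    r.size = count < 0 ? (size_t) -count : (size_t) count * block_size;
    return r;
}
```

Since the block size can change while a filesystem is being opened, the offset has to be computed from the block size in force at the time of each request, not cached from the initial open.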

Meanwhile, in the filesystem server code, the only changes required are to declare the new I/O manager, implemented in a separate library package, and to use it instead of the previous one:

retval = ext2fs_open(devname, ext2flags, 0, 0, blockserver_io_manager, &fs);

## The Final Trick

By pushing use of the “rom” filesystem further down in the system, use of the new file access mechanisms can be introduced and tested, with the only “unauthentic” aspect of the arrangement being that a parallel set of file access functions is being used instead of the conventional ones. The only thing left to do would be to change the C library to incorporate the new style of file access, probably by incorporating the client library internally, thus switching the C library away from its previous method of accessing files.

With the conventional file abstractions reimplemented, access to files would go via the virtual filesystem and hopefully end up encountering block devices that are able to serve up the needed data directly. And ultimately, we could end up switching back to using the Unix I/O manager with libext2fs.

Introducing the new IPC mechanisms at the C library level

Changing things so drastically would also force us to think about maintaining access to the “rom” filesystem through the revised architecture, at least at first, because it happens to provide a very convenient way of getting access to data for use as storage. We could try and implement storage hardware support in order to get round this problem, but that probably isn’t convenient – or would be a distraction – when running L4Re on Fiasco.OC-UX as a kind of hosted version of the software.

Indeed, tackling the C library is probably too much of a challenge at this early stage. Fortunately, there are plenty of other issues to be considered first, with the use of non-standard file access functions being only a minor inconvenience in the broader scheme of things. For instance, how are permissions and user identities to be managed? What about concurrent access to the filesystem? And what mechanisms would need to be provided for grafting filesystems onto a larger virtual filesystem hierarchy? I hope to try and discuss some of these things in future articles.

## Setting up Tor hidden service

Anyone can think of myriad reasons to run a Tor hidden service. Surely many unsavoury endeavours spring to mind but of course there are as many noble ones. There are also various pragmatic causes like circumventing lousy NATs. Me? I just wanted to play around with my router.

Configuring a hidden service is actually straightforward so to make things more interesting, this article will cover configuring a hidden service on a Turris Omnia router with the help of Linux Containers to maximise isolation of components. While some of the steps will be Omnia-specific, most translate easily to other systems, so this post may be applicable regardless of the distribution used.

## External drive

Turris Omnia uses an eMMC as its root device which, on account of being soldered onto the board, is hard to replace. To mitigate the risk of it wearing out, data can be saved onto external storage instead. The router comes with an mSATA slot and USB ports. Either can be used to attach a drive. In addition, an mPCIE SATA controller is included in the NAS version of the router and can be added to regular versions.

No matter how additional storage is attached, it can be mounted under the /srv directory, which is where many applications will store their data. This is a completely adequate solution but there’s also a more… exciting alternative: making the router boot from the external drive. The upside is that the eMMC will never wear out. The downside is that it won’t work with a storage device attached through a SATA controller.

First, access to the router’s serial console needs to be established. Any USB to UART converter (such as TTL-232R-RPI or something PL2303-based) should work. If the pinout of the converter isn’t documented, it’s usually enough to open it and look at the labels printed on its PCB. On Omnia’s side, the UART header is located between the LEDs and the brightness button. Starting from the side close to the LEDs, the pins are: ground, RX, TX and the usually unused +3.3V.

Once the connection to the serial console is established, configuring the router to boot from an external drive is as easy as grabbing Omnia’s medkit and following the official instructions. One thing to consider is that the /dev/sda1 name is unstable, so rather than the root=/dev/sda1 kernel argument advocated by the official method it’s better to use the partition UUID. For MBR partition tables, it is composed of the disk identifier and the partition number, both of which can be obtained from the output of fdisk -l.

# fdisk -l /dev/sda
Disk /dev/sda: 55.9 GiB, 60022480896 bytes, 117231408 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xe17eb372

Device     Boot Start       End   Sectors  Size Id Type
/dev/sda1        2048 117231407 117229360 55.9G 83 Linux


For example, if the disk identifier is e17eb372 (as shown in the listing above), root=PARTUUID=e17eb372-01 should be used instead of root=/dev/sda1 when setting the bootloader’s bootargs variable. If the time ever comes to replace the disk, the identifier of the new drive can be set to match the original using fdisk’s expert mode.
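The composition of the identifier can be sketched in a few lines of C. The helper name mbr_partuuid is hypothetical; the format used (eight hexadecimal digits of the disk identifier, a dash, and the partition number as two hexadecimal digits) is, to the best of my knowledge, the one Linux uses for MBR disks:

```c
#include <stdio.h>
#include <stddef.h>

/* Format an MBR partition UUID from the 32-bit disk identifier and
   the partition number, e.g. 0xe17eb372 and 1 give "e17eb372-01".
   Returns the value of snprintf (the length of the formatted text). */
int mbr_partuuid(char *out, size_t size, unsigned long disk_id, unsigned partition)
{
    return snprintf(out, size, "%08lx-%02x", disk_id & 0xffffffffUL, partition);
}
```

Note that the partition number is written in hexadecimal, so the tenth partition would end in -0a rather than -10.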

## Linux Containers

To improve isolation between components, it’s a good idea to take advantage of Linux Containers (LXC). On Omnia, the LXC utilities can be installed from the Updater section of the Foris interface. The installation procedure will differ on other systems, whose documentation should be consulted.

## Web server

Local network: 192.168.1.0/24; router (default gateway and DNS server): 192.168.1.1; web server container: 192.168.1.80

This article will use a web server as an example of hidden service’s back-end, but one should remember that anything accepting TCP connections can be used: SSH server, IRC bouncer and SOCKS proxy are all valid candidates.

First, a new container is needed. Because Alpine is a lightweight, security-oriented Linux distribution, it is a good choice for the container’s base image. To make things easier, it will be configured with a static IP of 192.168.1.80, with 192.168.1.1 as the default gateway and DNS server.

[turris]# lxc-create -n www -P /srv/lxc -t download -- \
-d Alpine -r Edge -a "$(uname -m)"
[turris]# cat >/srv/lxc/www/rootfs/etc/network/interfaces \
<<EOF
auto eth0
iface eth0 inet static
address 192.168.1.80
netmask 255.255.255.0
gateway 192.168.1.1
hostname \$(hostname)
EOF
[turris]# echo nameserver 192.168.1.1 \
>/srv/lxc/www/rootfs/etc/resolv.conf
[turris]# lxc-start -n www
[turris]# lxc-attach -n www
[www]# apk add lighttpd
[www]# rc-update add lighttpd
[www]# rc-service lighttpd start
[www]# exit


At this point, 192.168.1.80 should lead to an ‘It works’ web page. Since the HTTP server does not need to make any outgoing connections, the container can (and should) have its network traffic restricted with the following rules:

[turris]# opkg install iptables-mod-extra
[turris]# lxc-attach -n www
[www]# apk add iptables
# Allow incoming connections from LAN only
[www]# iptables -A INPUT -s 192.168.1.0/24 -j ACCEPT
[www]# iptables -A INPUT -m state --state NEW -j DROP
# Don’t allow any forwarding
[www]# iptables -P FORWARD DROP
# Let root resolve domain names
[www]# iptables -A OUTPUT -d 192.168.1.1 -p udp --dport 53 \
-m owner --uid-owner 0 -j ACCEPT
[www]# iptables -A OUTPUT -d 192.168.1.1 -p tcp --dport 53 \
-m owner --uid-owner 0 -j ACCEPT
# Don’t allow initiating connections to LAN
[www]# iptables -A OUTPUT -d 192.168.1.0/24 -m state --state NEW -j DROP
[www]# iptables -A OUTPUT -d 192.168.1.0/24 -j ACCEPT
# Allow only root to talk to the Internet
[www]# iptables -A OUTPUT -m owner --uid-owner 0 -j ACCEPT
[www]# iptables -P OUTPUT DROP
# Store and apply the rules on each boot
[www]# rc-update add iptables
[www]# /etc/init.d/iptables save
[www]# exit


## Tor

The next step is setting up Tor. Like before, it’s going to run inside a container to maximise isolation. However, this time the container will use a dynamic IP, since its address won’t be hard-coded anywhere.

[turris]# lxc-create -n tor -P /srv/lxc -t download -- \
-d Alpine -r Edge -a "$(uname -m)"
[turris]# lxc-start -n tor
[turris]# lxc-attach -n tor
[tor]# apk add tor logrotate
[tor]# cat >/etc/tor/torrc <<EOF
User tor
SOCKSPort 0
SOCKSPolicy reject *
ExitRelay 0
ExitPolicy reject *:*
Log notice file /var/log/tor/notices.log
DataDirectory /var/lib/tor
HiddenServiceDir /var/lib/tor/hidden-site
HiddenServiceVersion 3
HiddenServicePort 80 192.168.1.80:80
EOF
[tor]# rc-update add tor
[tor]# rc-service tor start
[tor]# exit


Once Tor is started, it’ll automatically generate keys for the hidden service and save its .onion address in the /var/lib/tor/hidden-site/hostname file (or /srv/lxc/tor/rootfs/var/lib/tor/hidden-site/hostname if accessing it from outside of the container). At this point everything should work. The web server should be accessible via its local address as well as through its .onion address.

To finish up with the container, firewall rules need to be created. Tor needs to be able to talk to the Internet and to the web server but does not need to connect to any other hosts on the local network nor accept any incoming connections. This can be codified with the following instructions:

[turris]# opkg install iptables-mod-extra
[turris]# lxc-attach -n tor
[tor]# apk add iptables
# Disallow incoming connections
[tor]# iptables -A INPUT -m state --state NEW -j DROP
# Don’t allow any forwarding
[tor]# iptables -P FORWARD DROP
# Allow talking to local DNS
[tor]# iptables -A OUTPUT -d 192.168.1.1 -p udp --dport 53 -j ACCEPT
[tor]# iptables -A OUTPUT -d 192.168.1.1 -p tcp --dport 53 -j ACCEPT
# and the web server that’s being hidden
[tor]# iptables -A OUTPUT -d 192.168.1.80 -p tcp --dport 80 -j ACCEPT
# but no other host in LAN.
[tor]# iptables -A OUTPUT -d 192.168.1.0/24 -j DROP
# Store and apply the rules on each boot
[tor]# rc-update add iptables
[tor]# /etc/init.d/iptables save
[tor]# exit


## Making containers start at boot

The final thing to do is make both containers start when the router boots.
Otherwise, the hidden service will stop working as soon as the host reboots. On Omnia this is done by editing the /etc/config/lxc-auto file to contain the following:

config container
option name www
config container
option name tor


## Security considerations

While setting up a hidden service is trivial, making it secure is another matter. It’s not inconceivable that some servers may be tricked into leaking information about their external IP. Perhaps an FTP server is made to make an active data connection over the Internet. Maybe an HTTP server displays its external address in error pages. Not to mention arbitrary command execution exploits which could be used to make simple requests over the Internet. Restricting the service back-end’s access to the Internet (as has been done in this article) or configuring it to only ever use Tor circuits is one defence.

It’s also important to keep in mind that Tor relays report all their bandwidth publicly. In other words, if a process providing a hidden service is also running as a relay, it is theoretically possible to locate the service by issuing requests to it and observing the transfer reported by the relay. As such, a Tor relay shouldn’t be run on an instance which is also providing a hidden service.

Security of the service is beyond the scope of this article (especially as in my case privacy is not of paramount importance), so the reader is encouraged to do their own due diligence.

### Thursday, 14 February 2019

## The world’s most advanced UNICs of Organizers

I recently began using Emacs Org mode, a tool for keeping notes, maintaining TODO lists, planning projects, and authoring documents with a fast and effective plain-text system. Since I am a cosplayer, I was looking for a replacement for Cosplanner, a non-free Android app. When I was still using Android, I once installed Cosplanner and found out that it has many nasty features. So I deleted my copy.
Unlike Cosplanner, Org mode uses a human-readable text format that you can read with any text editor. This allows the user to store an Org mode file in a git repository that can be synced between devices.

Emacs was written by Richard Stallman as part of the GNU Operating System. GNU is a Unix-compatible system that respects the users’ freedom. Today’s GNU mainly comes in the form of GNU/Linux distributions, but the Hurd (GNU’s kernel) still exists. The Guix System Distribution is one of those distributions; it is often called the Emacs of distros. There are many text editors, but Emacs is probably the world’s most advanced one.

## I Love Free Software Day 2019

Free Software is a substantial part of my life. I got introduced to it by my computer science teacher in middle school; however, back then I wasn’t paying that much attention to the ethics behind it and rather focused on the fact that it was gratis and new to me. Using GNU/Linux on a school computer wasn’t really fun for me, as the user interface was not really to my taste (I’m sorry, KDE). It was only when I got so annoyed by the fact that my copy of Windows XP was 32 bit only, and that I was supposed to pay the full price again for a 64 bit license, that I deleted Windows completely and installed Ubuntu on my computer – only to reinstall Windows a few weeks later, though. But the first contact was made.

Back then I was still mostly focused on cool features rather than on the meaning of free software. Someday, however, I watched the talk by Richard Stallman and started to read more about what software freedom really is. At this point I was learning how to use Blender on Ubuntu to create animations and only rarely booted into Windows. But when I did, it suddenly felt oddly wrong. I realized that I couldn’t truly trust my computer. This time I tried harder to get rid of Windows.

Someone once said that you only feel your shackles when you try to move. I think the same goes for free software.
Once you realize what free software is and what rights it grants you (what rights you really have), you start to feel uncomfortable if you’re suddenly denied those rights. And that’s why I love free software! It gives you back the control over your machine. It’s something that you can trust, as there are no secrets kept from you (except if the program is written in Haskell and uses monads :P).

My favorite free software projects for this year’s I Love Free Software Day are the document digitization and management tool paperwork, the alternative Mastodon/Pleroma interface Halcyon and the WordPress ActivityPub Plugin. These are projects that I discovered in 2018/2019 and that truly amazed me. I already wrote two blog posts about paperwork and the fediverse / the ActivityPub plugin earlier, so I’ll focus mainly on Halcyon today. Feel free to give those other posts a read though!

I’m a really big fan of the fediverse and Mastodon in particular, but I dislike Mastodon’s current interface (two complaints about user interfaces in one post? Mimimi…). In my opinion, Mastodon’s column interface doesn’t really give enough space to the content and is not very intuitive. Halcyon is a web client which acts as an alternative interface to your Mastodon/Pleroma account. Visually it closely resembles the Twitter UI, which I quite like. As a plus, it is way easier to get people to move from Twitter to the fediverse by providing them with a familiar interface.

There are some public instances of Halcyon available, which you can use to try out Halcyon for yourselves; however, in the long run I recommend that you self-host it, as you have to enter your account details in order to use it. Hosting it doesn’t take much more than a simple Raspberry Pi, as it’s really lightweight.

I know that a huge number of free software projects are developed by volunteers in their free time. Most of them don’t get any monetary compensation for their work and people often take this for granted.
Additionally, a lot of the feedback developers get from their users is when things don’t work out or break. (Not only) today is a chance to give some positive feedback and a huge Thank You to the developers of the software that makes your life easier!

Happy Hacking!

### Wednesday, 13 February 2019

## Some Attention to Detail

I spent some time recently looking at my Python-like language, Lichen, and its toolchain. Although my focus was on improving support for floating point numbers and arithmetic, of which more may need to be written in a future article, I ended up noticing a few things that needed correcting and had escaped my attention. One of these probably goes a long way to solving a mystery raised in a previous article.

The investigation into floating point support necessitated some scrutiny of the way floating point numbers are allocated when compiled Lichen programs are run. CPython – the C language implementation of a virtual machine for the Python language – has various strategies for reserving memory for floating point numbers, this not being particularly surprising given what it does for integers, as we previously saw. What bothered me was how much time was being spent allocating space for numbers needed to store computation results.

I spent quite a bit of time looking at the run-time support code for compiled programs, trying different strategies to “preallocate” number instances and other things, but it was when I was considering various other optimisation strategies and running generated programs in the GNU debugger (gdb) that I happened to notice something about the type definitions that are generated for instances.
For example, here is what a tuple instance type looks like:

typedef struct {
    const __table * table;
    __pos pos;
    __attr attrs[__csize___builtins___tuple_tuple];
} __obj___builtins___tuple_tuple;

And here is what it should look like:

typedef struct {
    const __table * table;
    __pos pos;
    __attr attrs[__isize___builtins___tuple_tuple];
} __obj___builtins___tuple_tuple;

Naturally, I will excuse you for not necessarily noticing the crucial difference, but it is the size of the attrs array, this defining the attributes that are available via each instance of the tuple type. And here, I had used a constant prefixed with “__csize” meaning class size, as opposed to “__isize” meaning instance size. With so many things to think about when finishing off my toolchain, I had accidentally presented the wrong kind of value to the code generating these type definitions.

So, what was going to happen was that instances were going to be given the wrong number of attributes: a potentially catastrophic fault! But it is in the case of types like the tuple where things get more interesting than that. Such types tend to have lots of methods associated with them, and these methods are, of course, stored as class attributes. Meanwhile, tuple instances are likely to have far fewer attributes, and even when the tuple data is considered, since tuples frequently have few elements, such instances are also likely to be far smaller than the size of the tuple class’s structure.

Indeed, the following definitions are more or less indicative of the sizes of the tuple class and of tuple instances:

__csize___builtins___tuple_tuple = 36
__isize___builtins___tuple_tuple = 2

And I had noticed this because, for some reason unknown to me at the time but obviously known to me now, floating point numbers were being allocated using far more space than I thought appropriate.
Here are some definitions of interest:

__csize___builtins___float_float = 43
__isize___builtins___float_float = 1

Evidently, something was very wrong until I noticed my simple mistake: that in the code generating the definitions for program types, I had accidentally used the wrong constant for instance attribute arrays. Fixing this meant that the memory allocator probably only needed to find 16 bytes or so, as opposed to maybe 186 bytes, for each number!

Returning to tuples, though, it becomes interesting to see what effect this fix has on the performance of the benchmark previously discussed. We had previously seen that a program using tuples was inexplicably far slower than one employing objects to represent the same data. But with this unnecessary allocation occurring, it seems possible that this might have been making some extra work for the allocator and garbage collector. Here is a table of measurements from running the benchmark before and after the fix had been applied:

| Program Version | Time | Maximum Memory Usage |
|---|---|---|
| Tuples | 24s | 122M |
| Objects | 15s | 54M |
| Tuples (fixed) | 17s | 30M |
| Objects (fixed) | 13s | 30M |

Although there is still a benefit to using objects to model data in Lichen as opposed to keeping such data in tuples, the benefit is not as pronounced as before, with the memory usage now clearly comparable as we would expect. With this fix applied, both versions of the benchmark are even faster than they were before, but it is especially gratifying that the object-based version is now ten times faster when compiled with the Lichen toolchain than the same program run by the CPython virtual machine.

### Tuesday, 12 February 2019

## Filesystem Abstractions for L4Re

In my previous posts, I discussed the possibility of using “real world” filesystems in L4Re, initially considering the nature of code to access an ext2-based filesystem using the library known as libext2fs, then getting some of that code working within L4Re itself.
Having previously investigated the nature of abstractions for providing filesystems and file objects to applications, it was inevitable that I would now want to incorporate libext2fs into those abstractions and to try and access files residing in an ext2 filesystem using those abstractions.

It should be remembered that L4Re already provides a framework for filesystem access, known as Vfs or “virtual file system”. This appears to be the way the standard file access functions are supported, with the “rom” part of the filesystem hierarchy being supported by a “namespace filesystem” library that understands the way that the deployed payload modules are made available as files. To support other kinds of filesystem, other libraries must apparently be registered and made available to programs.

Although I am sure that the developers of the existing Vfs framework are highly competent, I found the mechanisms difficult to follow and quite unlike what I expected, which was to see a clear separation between programs accessing files and other programs providing those files. Indeed, this is what one sees when looking at other systems such as Minix 3 and its virtual filesystem. I must also admit that I became tired of having to dig into the code to understand the abstractions in order to supplement the reference documentation for the Vfs framework in L4Re.

## An Alternative Framework

It might be too soon to label what I have done as a framework, but at the very least I needed to decide upon a few mechanisms to implement an alternative approach to providing file-like abstractions to programs within L4Re. There were a few essential characteristics to be incorporated:

• A way of requesting access to a named file
• The provision of objects maintaining the state of access to an opened file
• The transmission of file content to file readers and from file writers
• A way of cleaning up when programs are no longer accessing files

One characteristic that I did want to uphold in any solution was to make programs largely oblivious to the nature of the accessed filesystems. They would navigate a virtual filesystem hierarchy, just as one does in Unix-like systems, with certain directories acting as gateways to devices exposing potentially different filesystems with superficially similar semantics.

## Requesting File Access

In a system like L4Re, with notions of clients and servers already prevalent, it seems natural to support a mechanism for requesting access to files that sees a client – a program wanting to access a file – delegating the task of locating that file to a server. How the server performs this task may be as simple or as complicated as we wish, depending on what kind of architecture we choose to support. In an operating system with a “monolithic” kernel, like GNU/Linux, we also see such delegation occurring, with the kernel being the entity having to support the necessary wiring up of filesystems contributing to the virtual filesystem.

So, it makes sense to support an “open” system call just like in other operating systems. The difference here, however, is that since L4Re is a microkernel-based environment, both the caller and the target of the call are mere programs, with the kernel only getting involved to route the call or message between the programs concerned. We would first need to make sure that the program accessing files has a reference (known as a “capability”) to another program that provides a filesystem and can respond to this “open” message.
This wiring up of programs is a task for the system’s configuration file, as featured in some of my previous articles.

We may now consider what the filesystem-providing program or filesystem “server” needs to do when receiving an “open” message. Let us consider the failure to find the requested file: the filesystem server would, in such an event, probably just return a response indicating failure without any real need to do anything else. It is in the case of a successful lookup that the response to the caller or client needs some more consideration: the server could indicate success, but what is the client going to do with such information? And how should the server then facilitate further access to the file itself?

## Providing Objects for File Access

It becomes gradually clearer that the filesystem server will need to allocate some resources for the client to conduct its activities, to hold information read from the filesystem itself and to hold data sent for writing back to the opened file. The server could manage this within a single abstraction and support a range of different operations, accommodating not only requests to open files but also operations on the opened files themselves. However, this might make the abstraction complicated and also raise issues around things like concurrency.

What if this server object ends up being so busy that while waiting for reading or writing operations to complete on a file for one program, it leaves other programs queuing up to ask about or gain access to other files? It all starts to sound like another kind of abstraction would be beneficial for access to specific files for specific clients. Consequently, we end up with an arrangement like this:

Accessing a filesystem and a resource

When a filesystem server receives an “open” message and locates a file, it allocates a separate object to act as a contact point for subsequent access to that file. This leaves the filesystem object free to service other requests, with these separate resource objects dealing with the needs of the programs wanting to read from and write to each individual file.

The underlying mechanisms by which a separate resource object is created and exposed are as follows:

1. The instantiation of an object in the filesystem server program holding the details of the accessed file.
2. The creation of a new thread of execution in which the object will run. This permits it to handle incoming messages concurrently with the filesystem object.
3. The creation of an “IPC gate” for the thread. This effectively exposes the object to the wider environment as what often appears to be known as a “kernel object” (rather confusingly, but it simply means that the kernel is aware of it and has to do some housekeeping for it).

Once activated, the thread created for the resource is dedicated to listening for incoming messages and handling them, invoking methods on the resource object as a proxy for the client sending those messages to achieve the same effect.

## Transmitting File Content

Although we have looked at how files manifest themselves and may be referenced, the matter of obtaining their contents has not been examined too closely so far. A program might be able to obtain a reference to a resource object and to send it messages and receive responses, but this is not likely to be sufficient for transferring content to and from the file. The reason for this is that the messages sent between programs – or processes, since this is what we usually call programs that are running – are deliberately limited in size.

Thus, another way of exchanging this data is needed. In a situation where we are reading from a file, what we would most likely want to see is a read operation populate some memory for us in our process.
Indeed, in a system like GNU/Linux, I imagine that the Linux kernel shuttles the file data from the filesystem module responsible to an area of memory that it has reserved and exposed to the process. In a microkernel-based system, things have to be done more “collaboratively”.

The answer, it would seem to me, is to have dedicated memory that is shared between processes. Fortunately, and arguably unsurprisingly, L4Re provides an abstraction known as a “dataspace” that provides the foundation for such sharing. My approach, then, involves requesting a dataspace to act as a conduit for data, making the dataspace available to the file-accessing client and the file-providing server object, and then having a protocol to notify each other about data being sent in each direction.

I considered whether it would be most appropriate for the client to request the memory or whether the server should do so, eventually concluding that the client could decide how much space it would want as a buffer, handing this over to the server to use to whatever extent it could. A benefit of doing things this way is that the client may communicate initialisation details when it contacts the server, and so it becomes possible to transfer a filesystem path – the location of a file from the root of the filesystem hierarchy – without it being limited to the size of an interprocess message.

Opening a file using a path written to shared memory

So, the “open” message references the newly-created dataspace, and the filesystem server reads the path written to the dataspace’s memory so that it may use it to locate the requested file. The dataspace is not retained by the filesystem object but is instead passed to the resource object which will then share the memory with the application or client. As described above, a reference to the resource object is returned in the response to the “open” message.

It is worthwhile to consider the act of reading from a file exposed in this way. Although both client (the application in the above diagram) and server (resource object) should be able to access the shared “buffer”, it would not be a good idea to let them do so freely. Instead, their roles should be defined by the protocol employed for communication with one another. For a simple synchronous approach it would look like this:

Reading data from a resource via a shared buffer

Here, upon the application or client invoking the “read” operation (in other words, sending the “read” message) on the resource object, the resource is able to take control of the buffer, obtaining data from the file and writing it to the buffer memory. When it is done, its reply or response needs to indicate the updated state of the buffer so that the client will know how much data there is available, potentially amongst other things of interest.

## Cleaning Up

Many of us will be familiar with the workflow of opening, reading and writing, and closing files. This final activity is essential not only for our own programs but also for the system, so that it does not tie up resources for activities that are no longer taking place. In the filesystem server, for the resource object, a “close” operation can be provided that causes the allocated memory to be freed and the resource object to be discarded.

However, merely providing a “close” operation does not guarantee that a program would use it, and we would not want a situation where a program exits or crashes without having invoked this operation, leaving the server holding resources that it cannot safely discard. We therefore need a way of cleaning up after a program regardless of whether it sees the need to do so itself.

In my earliest experiments with L4Re on the MIPS Creator CI20, I had previously encountered the use of interrupt request (IRQ) objects, in that case signalling hardware-initiated events such as the pressing of physical switches.
But I also knew that the IRQ abstraction is employed more widely in L4Re to allow programs to participate in activities that would normally be the responsibility of the kernel in a monolithic architecture. It made me wonder whether there might be interrupts communicating the termination of a process that could then be used to clean up afterwards.

One area of interest is that concerning the “IPC gate” mentioned above. This provides the channel through which messages are delivered to a particular running program, and up to this point, we have considered how a resource object has its own IPC gate for the delivery of messages intended for it. But it turns out that we can also enable notifications with regard to the status of the IPC gate via the same mechanism.

By creating an IRQ object and associating it with a thread as the “deletion IRQ”, when the kernel decides that the IPC gate is no longer needed, this IRQ will be delivered. And the kernel will make this decision when nothing in the system needs to use the IPC gate any more. Since the IPC gate was only created to service messages from a single client, it is precisely when that client terminates that the kernel will realise that the IPC gate has no other users.

Resource deletion upon the termination of a client

To enable this to actually work, however, a little trick is required: the server must indicate that it is ready to dispose of the IPC gate whenever necessary, doing so by decreasing the “reference count” which tracks how many things in the system are using the IPC gate. So this is what happens:

1. The IPC gate is created for the resource thread, and its details are passed to the client, exposing the resource object.
2. An IRQ object is bound to the thread and associated with the IPC gate deletion event.
3. The server decreases its reference count, relinquishing the IPC gate and allowing its eventual destruction.
4. The client and server communicate as desired.
5. Upon the client terminating, the kernel disassociates the client from the IPC gate, decreasing the reference count.
6. The kernel notices that the reference count is zero and sends an IRQ telling the server about the impending IPC gate deletion.
7. The resource thread in the server deallocates the resource object and terminates.
8. The IPC gate is deleted.

Using the “gate label”, the thread handling communications for the resource object is able to distinguish between the interrupt condition and normal messages from the client. Consequently, it is able to invoke the appropriate cleaning up routine, to discard the resource object, and to terminate the thread.

Hopefully, with this approach, resource objects will no longer be hanging around long after their clients have disappeared.

## Other Approaches

Another approach to providing file content did also occur to me, and I wondered whether this might have been a component of the “namespace filesystem” in L4Re. One technique for accessing files involves mapping the entire file into memory using a “mmap” function. This could be supported by requesting a dataspace of a suitable size, but only choosing to populate a region of it initially.

The file-accessing program would attempt to access the memory associated with the file, and upon straying outside the populated region, some kind of “fault” would occur. A filesystem server would have the job of handling this fault, fetching more data, allocating more memory pages, mapping them into the file’s memory area, and disposing of unwanted pages, potentially writing modified pages to the appropriate parts of the file.

In effect, the filesystem server would act as a pager, as far as I can tell, and I believe it to be the case that Moe – the root task – acts in such a way to provide the “rom” files from the deployed payload modules.
Currently, I don’t find it particularly obvious from the documentation how I might implement a pager, and I imagine that if I choose to support such things, I will end up having to trawl the code for hints on how it might be done.

## Client Library Functions

To present a relatively convenient interface to programs wanting to use files, some client library functions need to be provided. The intention with these is to support the traditional C library paradigms and for these functions to behave like those that C programmers are generally familiar with. This means performing interprocess communications using the “open”, “read”, “write”, “close” and other messages when necessary, hiding the act of sending such messages from the library user.

The details of such a client library are probably best left to another article. With some kind of mechanism in place for accessing files, it becomes a matter of experimentation to see what the demands of the different operations are, and how they may be tuned to reduce the need for interactions with server objects, hopefully allowing file-accessing programs to operate efficiently.

The next article on this topic is likely to consider the integration of libext2fs with this effort, along with the library functionality required to exercise and test it. And it will hopefully be able to report some real experiences of accessing ext2-resident files in relatively understandable programs.

### Monday, 11 February 2019

## FSFE Planet has been refurbished

If you are reading these lines, you are already accessing the brand-new planet of the FSFE. While Björn, Coordinator of Team Germany, largely improved the design in late 2017, we tackled many underlying issues this time. So what has changed under the hood?

• The whole system runs in a Docker container now, with all code accessible on our Git. Yes, Docker has drawbacks, but in this case it eases maintenance for our volunteers and makes contributions to design and code very simple.
• The old planet ran on a very old Debian server which had issues with modern TLS versions. This basically erased a few blogs from the planet whose webservers do not support older encryption standards.
• The design has been improved once more. It now more closely aligns with the design of our main page fsfe.org and feels more native to use and browse.
• Many blogs which were not accessible any more have been removed, and those which redirected to other URLs have been updated accordingly.

So with the migration to the new system you will probably find a few new blogs and unread posts in your RSS feeds now. So please do not be confused about it but look forward to even more useful and interesting bits from the FSFE community!

On this occasion I would like to thank Michael and Vincent for their contributions to the code, and the useful feedback from various people in the FSFE’s community. If you have ideas how to further improve our planet, please open an issue in the Git repository or write an email to us.

### Saturday, 09 February 2019

## Java: String↔char[]

Do you recall when I decided to abuse Go’s run-time and play with string↔[]byte conversion? Fun times… I wonder if we could do the same to Java?

To remind ourselves of the ‘problem’, strings in Java are immutable but because Java has no concept of ownership or const keyword (can we move the industry to Rust already?) to make good on that promise, the Java run-time has to make a defensive copy each time a new string is created or when a string’s characters are returned.

Alas, do not despair! There is another way (exception handling elided for brevity):

```java
private static Field getValueField() {
    final Field field = String.class.getDeclaredField("value");
    field.setAccessible(true);

    /* Test that it works. */
    final char[] chars = new char[]{'F', 'o', 'o'};
    final String string = new String();
    field.set(string, chars);
    if (string.equals("Foo") && field.get(string) == chars) {
        return field;
    }
    throw new UnsupportedOperationException(
        "UnsafeString not supported by the run-time");
}

private final static Field valueField = getValueField();

public static String fromChars(final char[] chars) {
    final String string = new String();
    valueField.set(string, chars);
    return string;
}

public static char[] toChars(final String string) {
    return (char[]) valueField.get(string);
}
```

However. There is a twist…

## Benchmarks

Benchmarks shouldn’t surprise anyone:

| Argument length | UnsafeString::fromChars [ns] | String::new [ns] | Ratio (safe ÷ unsafe) |
|----------------:|-----------------------------:|-----------------:|----------------------:|
| 0               | 13.92                        | 4.97             | 0.37✕                 |
| 1               | 13.98                        | 8.07             | 0.58✕                 |
| 3               | 14.07                        | 8.07             | 0.57✕                 |
| 4               | 14.06                        | 8.09             | 0.58✕                 |
| 10              | 14.15                        | 9.25             | 0.65✕                 |
| 33              | 14.23                        | 12.54            | 0.88✕                 |
| 100             | 14.12                        | 29.68            | 2.10✕                 |
| 10000           | 14.17                        | 2937.98          | 207.33✕               |
| 1000000         | 14.04                        | 319440.00        | 22754.73✕             |

| Argument length | UnsafeString::toChars [ns] | String::toCharArray [ns] | Ratio (safe ÷ unsafe) |
|----------------:|---------------------------:|-------------------------:|----------------------:|
| 0               | 5.79                       | 4.64                     | 0.80✕                 |
| 1               | 5.13                       | 9.17                     | 1.79✕                 |
| 3               | 5.57                       | 9.08                     | 1.63✕                 |
| 4               | 5.13                       | 9.13                     | 1.78✕                 |
| 10              | 5.67                       | 10.47                    | 1.85✕                 |
| 33              | 5.49                       | 13.03                    | 2.37✕                 |
| 100             | 5.11                       | 29.38                    | 5.75✕                 |
| 10000           | 5.12                       | 2950.88                  | 575.79✕               |
| 1000000         | 5.15                       | 318074.00                | 61728.38✕             |

The unsafe variant takes roughly the same amount of time regardless of the size of the argument while the safe variant scales linearly. Interestingly, because reflection is slow, the safe call is faster for short strings.

The code including tests and benchmarks can be found in java-unsafe-string repository.

If the benchmarks aren’t surprising, what’s up with the twist then?

## Java 6

While we’re on the subject of messing with Java’s String object it might be good to mention the above code won’t work in Java 6 and earlier versions. Until Java 7u6, String::substring created objects which shared the character array with the ‘parent’ string.
This had some advantages – the operation took constant time and constant memory – but could lead to memory leaks (if the base string got garbage collected its entire contents would remain in memory even if the substring needed just a small portion of it) and complicated the String class (by requiring offset and length fields).

In the end, the implementation has been changed and strings now own the entire character array. Interestingly, the ‘trigger’ for the change was the introduction of a (now removed) new hashing algorithm for strings. Whatever the case, code presented here won’t work before Java 7u6.

But wait, this is still not the twist I’ve promised. ;)

## Java 9

The above benchmarks were run on Java 8 and by now probably everyone and their dog knows that this particular version’s support has ended. Let’s jump to the next LTS version, Java 11:

```
javac com/mina86/unsafe/*.java &&
echo && java -version && echo &&
java com.mina86.unsafe.UnsafeStringBenchmark

openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment (build 11.0.2+9-Debian-3)
OpenJDK 64-Bit Server VM (build 11.0.2+9-Debian-3, mixed mode, sharing)

Testing safe implementation: ........................... done, all ok
+   safe::fromChars/0      : 1194602409 ops in 1731473175 ns: 1.44941 ns/op
+   safe::fromChars/1      :  204060009 ops in 1622419993 ns: 7.95070 ns/op
+   safe::fromChars/3      :  312337323 ops in 2857745803 ns: 9.14955 ns/op
+   safe::fromChars/4      :  124336092 ops in 2170864835 ns: 17.4597 ns/op
+   safe::fromChars/10     :  306122448 ops in 2816903678 ns: 9.20189 ns/op
+   safe::fromChars/33     :  172483182 ops in 1914933095 ns: 11.1021 ns/op
+   safe::fromChars/100    :  103099869 ops in 2107079434 ns: 20.4373 ns/op
+   safe::fromChars/10000  :     661688 ops in 1031572901 ns: 1559.00 ns/op
+   safe::fromChars/1000000:       4397 ops in 1002248806 ns: 227939 ns/op
+     safe::toChars/0      :  280731006 ops in 2171809870 ns: 7.73627 ns/op
+     safe::toChars/1      :  273448179 ops in 2172255240 ns: 7.94394 ns/op
+     safe::toChars/3      :  284117814 ops in 2760800696 ns: 9.71710 ns/op
+     safe::toChars/4      :  240143619 ops in 2666941237 ns: 11.1056 ns/op
+     safe::toChars/10     :  234594930 ops in 2264769324 ns: 9.65396 ns/op
+     safe::toChars/33     :  205747203 ops in 2952933911 ns: 14.3522 ns/op
+     safe::toChars/100    :   94298106 ops in 2873368834 ns: 30.4711 ns/op
+     safe::toChars/10000  :     357551 ops in 1046061057 ns: 2925.63 ns/op
+     safe::toChars/1000000:       9012 ops in 2813949290 ns: 312245 ns/op
```


So far so good. The times are a bit more noisy but creation of an empty string seems to have been optimised. Let’s see how the unsafe version compares.

```
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.mina86.unsafe.UnsafeStringImpl (file:/home/mpn/code/unsafe-str/) to field java.lang.String.value
WARNING: Please consider reporting this to the maintainers of com.mina86.unsafe.UnsafeStringImpl
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
java.lang.IllegalArgumentException: Can not set final [B field java.lang.String.value to [C
    at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:167)
    at java.base/jdk.internal.reflect.UnsafeFieldAccessorImpl.throwSetIllegalArgumentException(UnsafeFieldAccessorImpl.java:171)
    at java.base/jdk.internal.reflect.UnsafeQualifiedObjectFieldAccessorImpl.set(UnsafeQualifiedObjectFieldAccessorImpl.java:83)
    at java.base/java.lang.reflect.Field.set(Field.java:780)
    at com.mina86.unsafe.UnsafeStringImpl.makeUnsafeMaybe(UnsafeStringImpl.java:19)
    at com.mina86.unsafe.UnsafeStringBenchmark.main(UnsafeStringBenchmark.java:15)
Testing unsafe implementation: unsupported by the run-time
```


How’s that for a twist? Uh? I overhyped the twist, you say? Well… Dumbledore dies!

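On a serious note though, yes, starting with Java 9, Oracle started locking down internal APIs, making some low-level optimisations no longer possible. The exception above also shows a second change: the `value` field is now a `byte[]` (`[B`) rather than a `char[]` (`[C`), so the trick fails even where reflective access is still permitted. As you move from Java 8, remember to check any libraries which achieve high performance by messing around with Java’s internals.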
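A library relying on this kind of trick could probe the field's declared type up front instead of waiting for an exception. The following is only a sketch: the class and method names are made up for illustration, and it assumes nothing beyond what the stack trace shows, namely that `String` has a field called `value` whose type differs between run-times.

```java
import java.lang.reflect.Field;

public class StringLayoutCheck {
    /**
     * Returns true only when String keeps its characters in a char[] field
     * named "value" (the layout the unsafe trick depends on).
     * Only the declared type is inspected; setAccessible is never called.
     */
    static boolean hasCharArrayValueField() {
        try {
            Field field = String.class.getDeclaredField("value");
            return field.getType() == char[].class;
        } catch (NoSuchFieldException | SecurityException e) {
            // Unknown layout or restricted reflection: treat as unsupported.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("char[] value field present: " + hasCharArrayValueField());
    }
}
```

A caller would fall back to `new String(chars)` and `toCharArray()` whenever the check returns false.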

## Privacy-preserving monitoring of an anonymity network (FOSDEM 2019)

This is a transcript of a talk I gave at FOSDEM 2019 in the Monitoring and Observability devroom about the work of Tor Metrics.

Producing this transcript was more work than I had anticipated it would be, and I’ve done this in my free time, so if you find it useful then please do let me know; otherwise I probably won’t be doing this again.

I’ll start off by letting you know who I am. Generally this is a different audience for me but I’m hoping that there are some things that we can share here. I work for Tor Project. I work in a team that is currently formed of two people on monitoring the health of the Tor network and performing privacy-preserving measurement of it. Before Tor, I worked on active Internet measurement in an academic environment but I’ve been volunteering with Tor Project since 2015. If you want to contact me afterwards, my email address, or if you want to follow me on the Fediverse there is my WebFinger ID.

So what is Tor? I guess most people have heard of Tor but maybe they don’t know so much about it. Tor is quite a few things, it’s a community of people. We have a paid staff of approximately 47, the number keeps going up but 47 last time I looked. We also have hundreds of volunteer developers that contribute code and we also have relay operators that help to run the Tor network, academics and a lot of people involved in organising the community locally. We are registered in the US as a non-profit organisation and the two main things that really come out of Tor Project are the open source software that runs the Tor network and the network itself which is open for anyone to use.

Currently there are an estimated average 2,000,000 users per day. This is estimated and I’ll get to why we don’t have exact numbers.

Most people when they are using Tor will use Tor Browser. This is a bundle of Firefox and a Tor client set up in such a way that it is easy for users to use safely.

When you are using Tor Browser your traffic is proxied through three relays. With a VPN there is only one server in the middle and that server can see either side. It knows who you are and where you are going, so it can spy on you just as your ISP could before. The first step in setting up a Tor connection is that the client needs to know where all of those relays are so it downloads a list of all the relays from a directory server. We’re going to call that directory server Dave. Our user Alice talks to Dave to get a list of the relays that are available.

In the second step, the Tor client forms a circuit through the relays and connects finally to the website that Alice would like to talk to, in this case Bob.

If Alice later decides they want to talk to Jane, they will form a different path through the relays.

We know a lot about these relays. Because the relays need to be public knowledge for people to be able to use them, we can count them quite well. Over time we can see how many relays there are that are announcing themselves and we also have the bridges which are a separate topic but these are special purpose relays.

Because we have to connect to the relays we know their IP addresses and we know if they have IPv4 or IPv6 addresses so as we want to get better IPv6 support in the Tor network we can track this and see how the network is evolving.

Because we have the IP addresses we can combine those IP addresses with GeoIP databases and that can tell us what country those relays are in with some degree of accuracy. Recently we have written up a blog post about monitoring the diversity of the Tor network. The Tor network is not very useful if all of the relays are in the same datacenter.

We also perform active measurement of these relays so we really analyse these relays because this is where we put a lot of the trust in the Tor network. It is distributed between multiple relays but if all of the relays are malicious then the network is not very useful. We make sure that we’re monitoring its diversity and the relays come in different sizes so we want to know are the big relays spread out or are there just a lot of small relays inflating the absolute counts of relays in a location.

When we look at these two graphs, we can see that the number of relays in Russia is about 250 at the moment but when we look at the top 5 by the actual bandwidth they contribute to the network they drop off and Sweden takes Russia’s place in the top 5 contributing around 4% of the capacity.
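To make the difference between the two views concrete, here is a small sketch that computes both a per-country relay count and a per-country share of total bandwidth. The `Relay` type, the country codes and the bandwidth figures are placeholders invented for illustration; this is not Tor Metrics code or data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RelayDiversity {
    /** Hypothetical minimal relay record: a country code and advertised bandwidth. */
    static final class Relay {
        final String country;
        final long bandwidth;
        Relay(String country, long bandwidth) {
            this.country = country;
            this.bandwidth = bandwidth;
        }
    }

    /** Number of relays per country. */
    static Map<String, Integer> countByCountry(List<Relay> relays) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Relay r : relays) {
            counts.merge(r.country, 1, Integer::sum);
        }
        return counts;
    }

    /** Fraction of total bandwidth contributed per country (empty if total is zero). */
    static Map<String, Double> capacityShareByCountry(List<Relay> relays) {
        long total = 0;
        for (Relay r : relays) {
            total += r.bandwidth;
        }
        Map<String, Double> shares = new TreeMap<>();
        if (total == 0) {
            return shares;
        }
        for (Relay r : relays) {
            shares.merge(r.country, (double) r.bandwidth / total, Double::sum);
        }
        return shares;
    }

    public static void main(String[] args) {
        // Synthetic example: "aa" has the most relays, "bb" the most capacity.
        List<Relay> relays = Arrays.asList(
            new Relay("aa", 10), new Relay("aa", 10), new Relay("aa", 10),
            new Relay("bb", 70));
        System.out.println(countByCountry(relays));
        System.out.println(capacityShareByCountry(relays));
    }
}
```

Ranking by the second map rather than the first is what moves a country with many small relays down the list.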

The Tor Metrics team, as I mentioned we are two people, and we care about measuring and analysing things in the Tor network. There are 3 or 4 regular contributors and then occasionally people will come along with patches or perform a one-off analysis of our data.

We use this data for lots of different use cases. One of which is detecting censorship so if websites are blocked in a country, people may turn to Tor in order to access those websites. In other cases, Tor itself might be censored and then we see a drop in Tor users and then we also see a rise in the use of the special purpose bridge relays that can be used to circumvent censorship. We can interpret the data in that way.

We can detect attacks against the network. If we suddenly see a huge rise in the number of relays then we can suspect that OK maybe there is something malicious going on here and we can deal with that. We can evaluate effects on how performance changes when we make changes to the software. We have recently made changes to an internal scheduler and the idea there is to reduce congestion at relays and from our metrics we can say we have a good idea that this is working.

Probably one of the more important aspects is being able to take this data and make the case for a more private and secure Internet, not just from a position of I think we should do this, I think it’s the right thing, but here is data and here are facts that cannot be so easily disputed.

We only handle public non-sensitive data. Each analysis goes through a rigorous review and discussion process before publication.

As you might imagine, the goals of a privacy and anonymity network don’t lend themselves to easy data gathering and extensive monitoring of the network. If you are interested in doing research on Tor or attempting to collect data through Tor, the Research Safety Board can offer advice on how to do that safely. Often this is used by academics that want to study Tor but also the Metrics Team has used it on occasion where we want to get second opinions on deploying new measurements.

What we try and do is follow three key principles: data minimisation, source aggregation, and transparency.

The first one of these is quite simple and I think with GDPR probably is something people need to think about more even if you don’t have an anonymity network. Having large amounts of data that you don’t have an active use for is a liability and something to be avoided. Given a dataset and given an infinite amount of time that dataset is going to get leaked. The probability increases as you go along. We want to make sure that we are collecting as little detail as possible to answer the questions that we have.

When we collect data we want to aggregate it as soon as we can to make sure that sensitive data exists for as little time as possible. This means usually in the Tor relays themselves before they even report information back to Tor Metrics. They will be aggregating data and then we will aggregate the aggregates. This can also include adding noise, binning values. All of these things can help to protect the individual.

And then being as transparent as possible about our processes so that our users are not surprised when they find out we are doing something, relay operators are not surprised, and academics have a chance to say whoa that’s not good maybe you should think about this.

The example that I’m going to talk about is counting unique users. Users of the Tor network would not expect that we are storing their IP address or anything like this. They come to Tor because they want the anonymity properties. The easy way, as in traditional web analytics, is to keep a list of the IP addresses and count up the unique ones, and then you have an idea of the unique users. You could do this and combine it with a GeoIP database and get unique users per country and these things. We can’t do this.

So we measure indirectly and in 2010 we produced a technical report on a number of different ways we could do this.

It comes back to Alice talking to Dave. Because every client needs to have a complete view of the entire Tor network, we know that each client will fetch the directory approximately 10 times a day. By measuring how many directory fetches there are we can get an idea of the number of concurrent users there are of the Tor network.

Relays don’t store IP addresses at all, they count the number of directory requests and then those directory requests are reported to a central location. We don’t know how long an average session is so we can’t say we had so many unique users but we can say concurrently we had so many users on average. We get to see trends but we don’t get the exact number.
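The arithmetic behind this indirect estimate can be sketched in a few lines of Python. The figure of 10 fetches per client per day comes from the talk; the per-relay request counts are made up for the example, and the sketch ignores any corrections the real pipeline may apply.

```python
def estimate_concurrent_users(directory_requests_per_day: int,
                              fetches_per_client_per_day: float = 10.0) -> float:
    """Estimate the average number of concurrent users from directory requests."""
    return directory_requests_per_day / fetches_per_client_per_day


# Hypothetical daily directory request counts reported by three relays.
relay_reports = [7_500_000, 9_000_000, 3_500_000]
total_requests = sum(relay_reports)

print(estimate_concurrent_users(total_requests))  # 2000000.0
```

Only request totals are needed, so no relay ever has to remember which address asked for the directory.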

So here is what our graph looks like. At the moment we have on average 2,000,000 concurrent Tor users. The first peak may have been an attempted attack, as with the second peak. Often things happen and we don’t have full context for them but we can see when things are going wrong and we can also see when things are back to normal afterwards.

This is in a class of problems called the count-distinct problem and these are our methods from 2010 but since then there has been other work in this space.

One example is HyperLogLog. I’m not going to explain this in detail but I’m going to give a high-level overview. Imagine you have a bitfield and you initialise all of these bits to zero. You take an IP address, you take a hash of the IP address, and you look for the position of the leftmost one. How many zeros were there at the start of that string? Say it was 3, you set the third bit in your bitfield. At the end you have a series of ones and zeros and you can get from this to an estimate of the total number that there are.

For any given item there is a 50% chance that its hash sets the first bit, a 25% chance that it sets the second bit, and so on, so the pattern of set bits tells you roughly how many distinct items you have seen. There is a very complicated proof in the paper that I don’t have time to go through here but this is one example that actually turns out to be quite accurate for counting unique things.
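The bitfield idea described above can be sketched in Python. Note the assumption: the description in the talk is closest to the older Flajolet–Martin probabilistic counting scheme, which HyperLogLog refines, so that is what this sketch implements (with several bitfields averaged to reduce variance). It is not HyperLogLog itself, and the class name, bucket count and hash choice are illustrative.

```python
import hashlib

PHI = 0.77351    # correction constant from Flajolet and Martin's analysis
HASH_BITS = 32   # number of hash bits inspected for leading zeros


class BitSketch:
    """Estimate a distinct count without storing any of the items."""

    def __init__(self, buckets: int = 64):
        self.buckets = buckets
        self.bitmaps = [0] * buckets   # one small bitfield per bucket

    def add(self, item: str) -> None:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        bucket = h % self.buckets
        rest = (h // self.buckets) & 0xFFFFFFFF
        # Count leading zeros in the 32-bit value; each extra zero is half as likely.
        leading_zeros = HASH_BITS - rest.bit_length()
        position = min(leading_zeros, HASH_BITS - 1)
        self.bitmaps[bucket] |= 1 << position

    def estimate(self) -> float:
        if not any(self.bitmaps):
            return 0.0
        total = 0
        for bitmap in self.bitmaps:
            r = 0
            while bitmap & (1 << r):   # index of the lowest unset bit
                r += 1
            total += r
        return self.buckets / PHI * 2 ** (total / self.buckets)


sketch = BitSketch()
for i in range(10000):
    sketch.add(f"10.{i // 65536}.{(i // 256) % 256}.{i % 256}")
print(round(sketch.estimate()))
```

Adding the same address twice sets the same bit again, so duplicates do not change the state, and the state itself is a few hundred bits from which no individual address can be read back.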

This was designed for very large datasets where you don’t have enough RAM to keep everything in memory. We have a variant on this problem where even keeping 2 IP addresses in memory would, for us, be a very large dataset. We can use this to avoid storing even small datasets.

Private Set-Union Cardinality is another example. In this one you can look at distributed databases and find unique counts within those. Unfortunately this currently requires far too much RAM to actually do the computation for us to use this but over time these methods are evolving and should become feasible, hopefully soon.

And then moving on from just count-distinct, the aggregation of counters. We have counters such as how much bandwidth has been used in the Tor network. We want to aggregate these but we don’t want to release the individual relay counts. We are looking at using a method called PrivCount that allows us to get the aggregate total bandwidth used while keeping the individual relay bandwidth counters secret.
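One common way to obtain a total without revealing the individual inputs is additive secret sharing, sketched below in Python. This is an illustration of the general idea under stated assumptions, not a description of the actual PrivCount protocol; the names `split_into_shares` and `aggregate`, the modulus and the counter values are all hypothetical.

```python
import secrets

MODULUS = 2**61 - 1   # all arithmetic is done modulo a large number


def split_into_shares(value: int, n_shares: int) -> list[int]:
    """Split value into random shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_shares - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def aggregate(relay_counters: list[int], n_keepers: int = 3) -> int:
    """Each keeper sums the shares it receives; only the grand total is meaningful."""
    keeper_totals = [0] * n_keepers
    for counter in relay_counters:
        for k, share in enumerate(split_into_shares(counter, n_keepers)):
            keeper_totals[k] = (keeper_totals[k] + share) % MODULUS
    return sum(keeper_totals) % MODULUS


print(aggregate([120, 45, 300]))  # 465
```

Each keeper only ever sees values that look random, so an individual relay counter is exposed only if all keepers collude.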

And then there are similar schemes to this: RAPPOR and PROCHLO from Google, and Prio, which Mozilla have written a blog post about, are similar technologies. All of the links are in the slides and on the page in the FOSDEM schedule so don’t worry about writing these down.

Finally, I’m looking at putting together some guidelines for performing safe measurement on the Internet. This is targeted primarily at academics but also if people wanted to apply this to analytics platforms or monitoring of anything that has users and you want to respect those users’ privacy then there could be some techniques in here that are applicable.

Ok. So that’s all I have if there are any questions?

Q: I have a question about how many users have to be honest so that the network stays secure and private, or relays?

A: At the moment when we are collecting statistics we can see – so as I showed earlier the active measurement – we know how much bandwidth a relay can cope with and then we do some load balancing so we have an idea of what fraction of traffic should go to each relay and if one relay is expecting a certain level of traffic and it has wildly different statistics to another relay then we can say that relay is cheating. There isn’t really any incentive to do this and it’s something we can detect quite easily but we are also working on more robust metrics going forward to avoid this being a point where it could be attacked.

Q: A few days ago I heard that with Tor Metrics you are between 2 and 8 million users but you don’t really know in detail what the real numbers are? Can you talk about the variance and which number is more accurate?

A: The 8 million number comes from the PrivCount paper and they did a small study where they looked at unique IP addresses over a day, whereas we look at concurrent users. There are two different measurements. What we can say is that we know for certain that there are between 2 million and 25 million unique users per day but we’re not sure where in there we fall and 8 million is a reasonable-ish number but also they measured IP addresses and some countries use a lot of NAT so it could be more than 8 million. It’s tricky but we see the trends.

Q: Your presentation actually implies that you are able to collect more private data than you are doing. It says that the only thing preventing you from collecting private user data is the team’s good will and good intentions. Have I got it wrong? Are there any possibilities for the Tor Project team to collect some private user data?

A: Tor Project does not run the relays. We write the code but there are individual relay operators that run the relays and if we were to release code that suddenly started collecting lots of private data people would realise and they wouldn’t run that code. There is a step between the development and it being deployed. I think that it’s possible other people could write that code and then run those relays but if they started to run enough relays that it looked suspicious then people would ask questions there too, so there is a distributed trust model with the relays.

Q: You talk about privacy preserving monitoring, but also a couple of years ago we learned that the NSA was able to monitor relays and learn which user was connecting to relays, so is there also research to make it so Tor users cannot be targeted for using a relay and can never be monitored?

A: Yes, there is! There is a lot of research in this area and one of them is through obfuscation techniques where you can make your traffic look like something else. They are called pluggable transports and they can make your traffic look like all sorts of things.

## KDE Applications 19.04 Schedule finalized

It is available at the usual place https://community.kde.org/Schedules/Applications/19.04_Release_Schedule

Dependency freeze is March 14 and Feature Freeze a week after that, so make sure you start finishing your stuff!

## Using ext2 Filesystems with L4Re

Previously, I described my initial investigations into libext2fs and the development of programs to access and populate ext2/3/4 filesystems. With a program written and now successfully using libext2fs in my normal GNU/Linux environment, the next step appeared to be the task of getting this library to work within the L4Re system. The following steps were envisaged:

1. Figuring out the code that would be needed, this hopefully being supportable within L4Re.
2. Introducing the software as a package within L4Re.
3. Discovering the configuration required to build the code for L4Re.
4. Actually generating a library file.
5. Testing the library with a program.

This process is not properly completed in that I do not yet have a good way of integrating with the L4Re configuration and using its details to configure the libext2fs code. I felt somewhat lazy with regard to reconciling the use of autotools with the rather different approach taken to build L4Re, which is somewhat reminiscent of things like Buildroot and OpenWrt in certain respects.

So, instead, I built the Debian package from source in my normal environment, grabbed the config.h file that was produced, and proceeded to use it with a vastly simplified Makefile arrangement, also in my normal environment, until I was comfortable with building the library. Indeed, this exercise of simplified building also let me consider which portions of the libext2fs distribution would really be needed for my purposes. I did not really fancy having to struggle to build files that would ultimately be superfluous.

Still, as I noted, this work isn’t finished. However, it is useful to document what I have done so far so that I can subsequently describe other, more definitive, work.

## Making a Package

With a library that seemed to work with the archiving program, written to populate filesystems for eventual deployment, I then set about formulating this simplified library distribution as a package within L4Re. This involves a few things:

• Structuring the files so that the build system may process them.
• Persuading the build system to install things in places for other packages to find.
• Formulating the appropriate definitions to build the source files (and thus producing the right compiler and linker invocations).

Here are some notes about the results.

### The Package Structure

Currently, I have the following arrangement inside the pkg/libext2fs directory:

include
include/libblkid
include/libe2p
include/libet
include/libext2fs
include/libsupport
include/libuuid
lib
lib/libblkid
lib/libe2p
lib/libet
lib/libext2fs
lib/libsupport
lib/libuuid

To follow L4Re conventions, public header files have been moved into the include hierarchy. This breaks assumptions in the code, with header files being referenced without a prefix (like “ext2fs”, “et”, “e2p”, and so on) in some places, but being referenced with such a prefix in others. The original build system for the code gets away with this by using the “ext2fs” and other prefixes as the directory names containing the code for the different libraries. It then indicates the parent “lib” directory of these directories as the place to start looking for headers.

But I thought it worthwhile to try and map out the header usage and distinguish between public and private headers. At the very least, it helps me to establish the relationships between the different components involved. And I may end up splitting the different components into their own packages, requiring some formalisation of their interactions.
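Mapping out header usage of this kind lends itself to a small script. The following Python sketch is one possible way of doing it, not the method actually used for this work: it scans C source text for include directives and separates those using a directory prefix from those that do not. The function name and the sample source are made up for illustration.

```python
import re
from collections import defaultdict

# Matches: #include <name> or #include "name", allowing whitespace around '#'.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*([<"])([^>"]+)[>"]', re.MULTILINE)


def classify_includes(source_text: str) -> dict[str, list[str]]:
    """Group included headers by whether they carry a directory prefix."""
    usage = defaultdict(set)
    for _, header in INCLUDE_RE.findall(source_text):
        kind = "prefixed" if "/" in header else "unprefixed"
        usage[kind].add(header)
    return {kind: sorted(headers) for kind, headers in usage.items()}


# Illustrative input only; a real run would read the files under lib/.
sample = '''
#include <stdio.h>
#include "ext2fs.h"
#include "ext2fs/ext2_fs.h"
#include <et/com_err.h>
'''

print(classify_includes(sample))
```

Running something like this over each library directory gives a quick picture of which headers are referenced in both styles and therefore need attention when they are moved.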

Meanwhile, I defined a Control file to indicate what the package provides:

provides: libblkid libe2p libet libext2fs libsupport libuuid

This appears to be used in dependency resolution, causing the package to be built if another package requires one of the named entities in its own Control file.

In each include subdirectory (such as include/libext2fs) is a Makefile indicating a couple of things, the following being used for libext2fs:

PKGNAME = libext2fs
CONTRIB_HEADERS = 1

The effect of this is to install the headers into a include/contrib/libext2fs directory in the build output.

In the corresponding lib subdirectory (which is lib/libext2fs), the following seems to be needed:

CONTRIB_INCDIR = libext2fs

Hopefully, with this, other packages can depend on libext2fs and have the headers made available to them by an include statement like this:

#include <ext2fs/ext2fs.h>

(The ext2fs prefix is provided by a directory inside include/libext2fs.)

Otherwise, headers may end up being put in a special “l4” hierarchy, and then code would need changing to look something like this:

#include <l4/ext2fs/ext2fs.h>

So, avoiding this and having the original naming seems to be the benefit of the “contrib” settings, as far as I can tell.

### Defining Build Files

The Makefile in each specific lib subdirectory employs the usual L4Re build system definitions:

TARGET          = libext2fs.a libext2fs.so
PC_FILENAME     = libext2fs

The latter of these is used to identify the build products so that the appropriate compiler and linker options can be retrieved by the build system when this library is required by another. Here, PC is short for “package config” but the notion of “package” is different from that otherwise used in this article: it just refers to the specific library being built in this case.

An important aspect related to “package config” involves the requirements or dependencies of this library. These are specified as follows for libext2fs:

REQUIRES_LIBS   = libet libe2p

We saw these things in the Control file. By indicating these other libraries, the compiler and linker options to find and use these other libraries will be brought in when something else requires libext2fs. This should help to prevent build failures caused by missing headers or libraries, and it should also permit more concise declarations of requirements by allowing those declarations to omit libet and libe2p in this case.

Meanwhile, the actual source files are listed using a SRC_C definition, and the PRIVATE_INCDIR definition lists the different paths to be used to search for header files within this package. Moving the header files around complicates this latter definition substantially.

There are other complications with libext2fs, notably the building of a tool that generates a file to be used when building the library itself. I will try and return to this matter at some point and figure out a way of doing this within the build system. Such generation of binaries for use in build processes can be problematic, particularly if there is some kind of assumption that the build system is the same as the target system, but such assumptions are probably not being made here.

## Building the Library

Fortunately, the build system mostly takes care of everything else, and a command like this should see the package being built and libraries produced:

make O=mybuild S=pkg/libext2fs

The “S” option is a real time saver, and I wish I had made more use of it before. Use of the “V” option can be helpful in debugging command options, since the normal output is abridged:

make O=mybuild S=pkg/libext2fs V=1

I will admit that since certain header files are not provided by L4Re, a degree of editing of the config.h file was required. Things like HAVE_LINUX_FD_H, indicating the availability of Linux-specific headers, needed to be removed.

## Testing the Library

An appropriate program for testing the library is really not much different from one used in a GNU/Linux environment. Indeed, I just took some code from my existing program that lists a directory inside a filesystem image. Since L4Re should provide enough of a POSIX-like environment to support such unambitious programs, practically no changes were needed and no special header files were included.

A suitable Makefile is needed, of course, but the examples package in L4Re provides plenty of guidance. The most important part is this, however:

REQUIRES_LIBS   = libext2fs

A Control file requiring libext2fs is actually not necessary for an example in the examples hierarchy, it would seem, but such a file would otherwise be advisable. The above library requirements pull in the necessary compiler and linker flags from the “package config” universe. (It also means that the libext2fs headers are augmented by the libe2p and libet headers, as defined in the required libraries for libext2fs itself.)

As always, deploying requires a suitable configuration description and a list of modules to be deployed. The former looks like this:

local L4 = require("L4");

local l = L4.default_loader;

l:startv({
    log = { "ext2fstest", "g" },
  },
  "rom/ex_ext2fstest", "rom/ext2fstest.fs", "/");

The interesting part is right at the end: a program called ex_ext2fstest is run with two arguments: the name of a file containing a filesystem image, and the directory inside that image that we want the program to show us. Here, we will be using the built-in “rom” filesystem in L4Re to serve up the data that we will be decoding with libext2fs in the program. In effect, we use one filesystem to bootstrap access to another!

Since the “rom” filesystem is merely a way of exposing modules as files, the filesystem image therefore needs to be made available as a module in the module list provided in the conf/modules.list file, the appropriate section starting off like this:

entry ext2fstest
module ext2fstest.cfg
module ext2fstest.fs
module l4re
module ned
module ex_ext2fstest
# plus lots of library modules

All these experiments are being conducted with L4Re running on the UX configuration of Fiasco.OC, meaning that the system runs on top of GNU/Linux: a sort of “user mode L4”. Running the set of modules for the above test is a matter of running something like this:

make O=mybuild ux E=ext2fstest

This produces a lot of output and then some “logged” output for the test program:

ext2fste| Opened rom/ext2fstest.fs.
ext2fste| /
ext2fste| drwxr-xr-x-       0     0        1024 .
ext2fste| drwxr-xr-x-       0     0        1024 ..
ext2fste| drwx-------       0     0       12288 lost+found
ext2fste| -rw-r--r---    1000  1000       11449 e2access.c
ext2fste| -rw-r--r---    1000  1000        1768 file.c
ext2fste| -rw-r--r---    1000  1000        1221 format.c
ext2fste| -rw-r--r---    1000  1000        6504 image.c
ext2fste| -rw-r--r---    1000  1000        1510 path.c

It really isn’t much to look at, but this indicates that we have managed to access an ext2 filesystem within L4Re using a program that calls the libext2fs library functions. If nothing else, the possibility of porting a library to L4Re and using it has been demonstrated.

But we want to do more than that, of course. The next step is to provide access to an ext2 filesystem via a general interface that hides the specific nature of the filesystem, one that separates the work into a different program from those wanting to access files. To do so involves integrating this effort into my existing filesystem framework, then attempting to re-use a generic file-accessing program to obtain its data from ext2-resident files. Such activities will probably form the basis of the next article on this topic.

## Calculating sRGB↔XYZ matrix

I’ve recently found myself in need of an sRGB↔XYZ transformation matrix expressed to the maximum possible precision. Sources on the Internet typically limit the precision to just a few decimal places so I've decided to do the calculations by myself.

What we’re looking for is a 3-by-3 matrix $$M$$ which, when multiplied by red, green and blue coördinates of a colour, produces its XYZ coördinates. In other words, a change of basis matrix from a space whose basis vectors are sRGB’s primary colours: $$M = \begin{bmatrix} X_r & X_g & X_b \\ Y_r & Y_g & Y_b \\ Z_r & Z_g & Z_b \end{bmatrix}$$

## Derivation

sRGB primary colours are defined in the IEC 61966 standard (and also the Rec. 709 document, which is 170 francs cheaper) as a pair of x and y values (i.e. chromaticity coördinates). Converting them to XYZ space is simple: $$X = x Y / y$$ and $$Z = (1 - x - y) Y / y,$$ but leaves luminosity (the Y value) undefined.

| Colour | $$\langle x, y\rangle$$ | $$\langle X, Y, Z\rangle$$ |
| --- | --- | --- |
| Red | $$\langle 0.64, 0.33\rangle$$ | $$\langle 64 Y_r / 33, \; Y_r, \; Y_r / 11\rangle$$ |
| Green | $$\langle 0.30, 0.60\rangle$$ | $$\langle Y_g / 2, \; Y_g, \; Y_g / 6\rangle$$ |
| Blue | $$\langle 0.15, 0.06\rangle$$ | $$\langle 5 Y_b / 2, \; Y_b, \; 79 Y_b / 6\rangle$$ |
| White (D65) | $$\langle 0.31271, 0.32902\rangle$$ | $$\langle 31271 Y_w / 32902, \; Y_w, \; 35827 Y_w / 32902\rangle$$ |
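The entries in the table can be checked with exact rational arithmetic. This short Python sketch applies the two conversion formulas with $$Y$$ factored out, so each result is the triple $$\langle X/Y, 1, Z/Y \rangle$$; the helper name `xy_to_XYZ_ratios` is made up for this illustration.

```python
from fractions import Fraction


def xy_to_XYZ_ratios(x: Fraction, y: Fraction) -> tuple:
    """Return (X/Y, 1, Z/Y) for a chromaticity <x, y>."""
    return (x / y, Fraction(1), (1 - x - y) / y)


red = xy_to_XYZ_ratios(Fraction(64, 100), Fraction(33, 100))
green = xy_to_XYZ_ratios(Fraction(30, 100), Fraction(60, 100))
blue = xy_to_XYZ_ratios(Fraction(15, 100), Fraction(6, 100))
white = xy_to_XYZ_ratios(Fraction(31271, 100000), Fraction(32902, 100000))

for name, ratios in (("Red", red), ("Green", green),
                     ("Blue", blue), ("White", white)):
    print(name, *ratios)
```

Because `Fraction` never rounds, the printed ratios match the coefficients in the table exactly.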

That’s where reference white point comes into play. Its coördinates in linear RGB space, $$\langle 1, 1, 1 \rangle,$$ can be plugged into the change of basis formula to yield the following equation: $$\begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} = \begin{bmatrix} X_r & X_g & X_b \\ Y_r & Y_g & Y_b \\ Z_r & Z_g & Z_b \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$

For each colour $$c$$ (including white), $$X_c$$ and $$Z_c$$ can be expressed as a product of a known quantity and $$Y_c$$ (see table above). Furthermore, by definition of a white point, $$Y_w = 1.$$ At this point luminosities of the primary colours are the only unknowns. To isolate them, let’s define $$X'_c = X_c / Y_c$$ and $$Z'_c = Z_c / Y_c$$ and see where that leads us: \begin{align} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} &= \begin{bmatrix} X_r & X_g & X_b \\ Y_r & Y_g & Y_b \\ Z_r & Z_g & Z_b \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} X'_r Y_r & X'_g Y_g & X'_b Y_b \\ Y_r & Y_g & Y_b \\ Z'_r Y_r & Z'_g Y_g & Z'_b Y_b \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} X'_r & X'_g & X'_b \\ 1 & 1 & 1 \\ Z'_r & Z'_g & Z'_b \end{bmatrix} \begin{bmatrix} Y_r & 0 & 0 \\ 0 & Y_g & 0 \\ 0 & 0 & Y_b \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \\ &= \begin{bmatrix} X'_r & X'_g & X'_b \\ 1 & 1 & 1 \\ Z'_r & Z'_g & Z'_b \end{bmatrix} \begin{bmatrix} Y_r \\ Y_g \\ Y_b \end{bmatrix} \\ \\ \begin{bmatrix} Y_r \\ Y_g \\ Y_b \end{bmatrix} &= \begin{bmatrix} X'_r & X'_g & X'_b \\ 1 & 1 & 1 \\ Z'_r & Z'_g & Z'_b \end{bmatrix}^{-1} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} \end{align}

All quantities on the right-hand side are known, therefore $$[Y_r Y_g Y_b]^T$$ can be computed. Let’s tidy things up into a final formula.

## Final formula

Given chromaticity of primary colours of an RGB space ($$\langle x_r, y_r \rangle,$$ $$\langle x_g, y_g \rangle$$ and $$\langle x_b, y_b \rangle$$) and its reference white point ($$\langle x_w, y_w \rangle$$), the matrix for converting linear RGB coördinates to XYZ is: $$M = \begin{bmatrix} X'_r Y_r & X'_g Y_g & X'_b Y_b \\ Y_r & Y_g & Y_b \\ Z'_r Y_r & Z'_g Y_g & Z'_b Y_b \end{bmatrix}$$

which can also be written as $$M = M' \times \mathrm{diag}(Y_r, Y_g, Y_b)$$ where: \begin{align} & M' = \begin{bmatrix} X'_r & X'_g & X'_b \\ 1 & 1 & 1 \\ Z'_r & Z'_g & Z'_b \end{bmatrix}\!\!, \\ \\ & \left. \begin{array}{l} X'_c = x_c / y_c \\ Z'_c = (1 - x_c - y_c) / y_c \end{array} \right\} \textrm{ for each colour } c, \\ \\ & \begin{bmatrix} Y_r \\ Y_g \\ Y_b \end{bmatrix} = M'^{-1} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} \textrm{ and} \\ \\ & \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} = \begin{bmatrix} x_w / y_w \\ 1 \\ (1 - x_w - y_w) / y_w \end{bmatrix}\!\!. \end{align}

Matrix converting coördinates in the opposite direction is the inverse of $$M,$$ i.e. $$M^{-1}$$.

## Implementation

Having the theoretical part covered, it’s time to put the equations into practice, which brings up the question of which language to use. With unlimited-precision integer and rational number arithmetic already implemented, Python is a particularly good choice as it will allow all calculations to be done without rounding. Implementation begins with an overview of the algorithm:

import collections
import fractions

# A chromaticity is a pair of CIE xy coordinates.
Chromaticity = collections.namedtuple('Chromaticity', 'x y')

def calculate_rgb_matrix(primaries, white):
    # Rows of M': X'_c = x/y, 1 and Z'_c = (1 - x - y)/y for each primary c.
    M_prime = (tuple(c.x / c.y             for c in primaries),
               tuple(1                     for _ in primaries),
               tuple((1 - c.x - c.y) / c.y for c in primaries))
    W = (white.x / white.y, 1, (1 - white.x - white.y) / white.y)
    Y = mul_matrix_by_column(inverse_3x3_matrix(M_prime), W)
    return mul_matrix_by_diag(M_prime, Y)


The function first constructs $$M'$$ matrix and $$W = [X_w Y_w Z_w]^T$$ column which are used to calculate $$[Y_r Y_g Y_b]^T$$ using $$M'^{-1} W$$ formula. With that computed, the function returns $$M' \times \mathrm{diag}(Y_r, Y_g, Y_b)$$ which is the transform matrix.

All operations on matrices are delegated to separate functions. Since the matrices the code deals with are small there is no need to optimise any of the algorithms and instead the most straightforward matrix multiplication algorithms are chosen:

def mul_matrix_by_column(matrix, column):
    return tuple(
        sum(row[i] * column[i] for i in range(len(row)))
        for row in matrix)

def mul_matrix_by_diag(matrix, column):
    return tuple(
        tuple(row[c] * column[c] for c in range(len(column)))
        for row in matrix)


Only the function inverting a 3-by-3 matrix is somewhat more complex:

def inverse_3x3_matrix(matrix):
    def cofactor(row, col):
        minor = [matrix[r][c]
                 for r in (0, 1, 2) if r != row
                 for c in (0, 1, 2) if c != col]
        a, b, c, d = minor
        det_minor = a * d - b * c
        return det_minor if (row ^ col) & 1 == 0 else -det_minor

    comatrix = tuple(
        tuple(cofactor(row, col) for col in (0, 1, 2))
        for row in (0, 1, 2))
    det = sum(matrix[0][col] * comatrix[0][col] for col in (0, 1, 2))
    return tuple(
        tuple(comatrix[col][row] / det for col in (0, 1, 2))
        for row in (0, 1, 2))


It first constructs the matrix of cofactors of the input (i.e. the comatrix). Because the function’s argument is always a 3-by-3 matrix, each minor’s determinant can be trivially calculated using the $$\bigl|\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\bigr| = a d - b c$$ formula. Once the comatrix is constructed, calculating the determinant of the input matrix and its inverse becomes just a matter of executing a few loops.

The above code works for any RGB system. To get the result for sRGB, the chromaticities of its primaries and white point need to be passed:

def calculate_srgb_matrix():
    primaries = (Chromaticity(fractions.Fraction(64, 100),
                              fractions.Fraction(33, 100)),
                 Chromaticity(fractions.Fraction(30, 100),
                              fractions.Fraction(60, 100)),
                 Chromaticity(fractions.Fraction(15, 100),
                              fractions.Fraction( 6, 100)))
    white = Chromaticity(fractions.Fraction(31271, 100000),
                         fractions.Fraction(32902, 100000))
    return calculate_rgb_matrix(primaries, white)


The full implementation with other bells and whistles can be found inside the ansi_colours repository.
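As an independent cross-check of the derivation, the luminosities can also be obtained by solving $$M' Y = W$$ directly rather than inverting $$M'$$. The Python sketch below swaps in Gauss-Jordan elimination over `Fraction` values for the cofactor method; it is a self-contained illustration, not part of the repository mentioned above, and the helper names are made up.

```python
from fractions import Fraction


def solve_3x3(A, b):
    """Solve A·y = b exactly using Gauss-Jordan elimination over Fractions."""
    rows = [list(map(Fraction, row)) + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(3):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, 3) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(3):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * p for a, p in zip(rows[r], rows[col])]
    return [row[3] for row in rows]


def ratios(x, y):
    """Return (X', 1, Z') for a chromaticity <x, y>."""
    return (x / y, Fraction(1), (1 - x - y) / y)


primaries = [(Fraction(64, 100), Fraction(33, 100)),
             (Fraction(30, 100), Fraction(60, 100)),
             (Fraction(15, 100), Fraction(6, 100))]
white = (Fraction(31271, 100000), Fraction(32902, 100000))

columns = [ratios(x, y) for x, y in primaries]            # columns of M'
M_prime = [[columns[c][r] for c in range(3)] for r in range(3)]
W = ratios(*white)
Y = solve_3x3(M_prime, W)                                 # luminosities
M = [[M_prime[r][c] * Y[c] for c in range(3)] for r in range(3)]

for row in M:
    print([float(v) for v in row])
```

Two properties follow from the derivation and make handy sanity checks: each row of $$M$$ sums to the corresponding white point coördinate, and the luminosities sum to exactly one.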

## Brussels Day 1 and 2

Day one and two of my stay in Brussels are over. I really enjoyed the discussions I had at the XMPP Standards Foundation Summit which was held in the impressive Cisco office building in Diegem. It’s always nice to meet all the faces behind those ominous nicknames that you only interact with through text chats for the rest of the year. Getting to know them personally is always exciting.

A lot of work has been done to improve the XMPP ecosystem and the protocols that make up its skeleton. For me it was the first time ever to hold a presentation in English, which – in the end – did not turn out as bad as I expected – I guess.

I love how highly international the XSF Summit and FOSDEM events are. As people from all over the world we get together and even though we are working on different projects and systems, we all have very similar goals. It’s refreshing to see a different mind set and hear some different positions and arguments.

I’ve got the feeling that this post is turning into some sort of humanitarian advertisement and sleep is a scarce commodity, so I’m going to bed now to get a snatch.

## Filesystem Familiarisation

I previously noted that accessing filesystems would be a component in my work with microkernel-based systems, and towards the end of last year I began an exercise in developing a simple “toy” filesystem that could hold file-like entities. Combining this with some L4Re-based components that implement seemingly reasonable mechanisms for providing access to files, I was able to write simple test programs that open and access these files.

The starting point for all this was the observation that a normal system file – that is, something stored in the filesystem in my GNU/Linux environment – can be treated like an archive containing multiple files and therefore be regarded as providing a filesystem itself. Such a file can then be embedded in a payload providing an L4Re system by specifying it as a “module” in conf/modules.list for a particular payload entry:

module image_root.fs

Since L4Re provides a rudimentary “rom” filesystem that exposes the modules embedded in the payload, I could open this “toy” filesystem module as a file within L4Re using the normal file access functions.

fp = fopen("rom/image_root.fs", "r");

And with that, I could then use my own functions to access the files stored within. Some additional effort went into exposing file access via interprocess communication, which forms the basis of those mechanisms mentioned above, those mechanisms being needed if such filesystems are to be generally usable in the broader environment rather than by just a single program.

## Preparing Filesystems

The first step in any such work is surely to devise how a filesystem is to be represented. Then, code must be written to access the filesystem, firstly to write files and directories to it, and then to be able to perform the necessary task of reading that file and directory information back out. At some point, an actual filesystem image needs to be prepared, and here it helps a lot if a convenient tool can be developed to speed up testing and further development.

I won’t dwell on the “toy” representation I used, mostly because it was merely chosen to let me explore the mechanisms and interfaces to be provided as L4Re components. The intention was always to switch to a “real world” filesystem and to use that instead. But in order to avoid being overwhelmed with learning about existing filesystems alongside learning about L4Re and developing file access mechanisms, I chose some very simple representations that I thought might resemble “real world” filesystems sufficiently enough to make the exercise realistic.

With the basic proof of concept somewhat validated, my attentions have now turned to “real world” filesystems, and here some interesting observations can be made about tools and libraries. If you were to ask someone about how they might prepare a filesystem, particularly a GNU/Linux user, it would be unsurprising to me if they suggested preparing a file…

dd if=/dev/zero of=image_root.fs bs=1024 count=1 seek=$SIZE_IN_KB

…then a filesystem in the file…

/sbin/mkfs.ext2 image_root.fs

…and then mounting it as follows:

sudo mount image_root.fs $MOUNTPOINT

Here, an ext2 filesystem is prepared in a normal system file, and then the operating system is asked to mount the filesystem and to expose it via a mountpoint, this being a directory in the general hierarchy of files and filesystems. But this last step requires special privileges and for the kernel to get involved, and yet all we are doing is accessing a file with the data inside it stored in a particular way. So why is there not a more straightforward, unprivileged way of writing data to that file in the required format?

Indeed, other projects of mine have needed to initialise filesystems, and such mounting operations have been a necessary aspect of those, given the apparent shortage of other methods. It really seemed that filesystems and kernel mechanisms were bound to each other, requiring us to always get the kernel involved. But it turns out that there are other solutions.

## A History Lesson

I am reminded of the mtools suite of programs for accessing floppy disks. Once upon a time, when I was in my first year of university studies, practically all of our class’s programming was performed on a collection of DECstations. Although networked, each of these also provided a floppy drive capable of supporting 2.88MB disks: an uncommon sight, for me at least, with the availability of media and compatibility concerns dictating the use of 720KB and 1.44MB disks instead.

Presumably, within the Ultrix environment we were using, normal users were granted access to the floppy drive when logged in. With a disk inserted, mtools could then be used to access the disk as one big file, interpreting the contents and presenting the user with a view onto files and directories. Of course, mtools exposes a DOS-like interface to the disk, with DOS-like commands providing DOS-like output, and it does not attempt to integrate the contents of a disk within the general Unix filesystem hierarchy.

Indeed, the mechanisms of integrating such foreign data into the general filesystem hierarchy are denied to mere programs, this being a motivation for pursuing alternative operating system architectures like GNU Hurd which support such integration. But the point here is that filesystems – in this example, DOS-based filesystems on floppy disks – can readily be interpreted with the appropriate tools and without “operator” privileges.

## Decoding Filesystem Data

Since filesystems are really just data structures encoded in storage, there should really be no magic involved in decoding and accessing them. After all, the code in the Linux kernel and in other operating system kernels has to do just that, and these things are just programs that happen to run under certain special conditions. So it would make sense if some of the knowledge encoded in these kernels had been extracted and made available as library code for other purposes. After all, it might come in useful elsewhere.

Fortunately, it is likely that such library code is already installed on your system, at least if you are using the ext2 family of filesystems. A search for some common utilities can be informative in this respect. Here is a query being issued for the appropriate filesystem checking utility on a Debian system:

$ dpkg -S e2fsck
e2fsprogs: /usr/share/man/man5/e2fsck.conf.5.gz
e2fsprogs: /sbin/e2fsck
e2fsprogs: /usr/share/man/man8/e2fsck.8.gz

And for the filesystem initialisation utility mentioned above:

$ dpkg -S mkfs.ext2
e2fsprogs: /sbin/mkfs.ext2
e2fsprogs: /usr/share/man/man8/mkfs.ext2.8.gz

The e2fsprogs package itself depends on a package called libext2fs2 – or e2fslibs on earlier distribution versions – and ultimately one discovers that these tools and their libraries are provided by a software distribution, e2fsprogs, whose aim is to provide programs and libraries for general access to the ext2/3/4 filesystem format. So it turns out to be possible and indeed feasible to write programs accessing filesystems without needing to make use of code residing in some kernel or other.
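To illustrate the point that a filesystem is just a data structure stored in a file, here is a minimal Python sketch that reads a few fields from an ext2 superblock without any kernel involvement. It is not how libext2fs does things, only a hand-rolled illustration: the function name is hypothetical, only three fields are decoded, and the image used in the demonstration is a synthetic one built in memory rather than a real filesystem.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # the superblock starts 1024 bytes into the image
EXT2_MAGIC = 0xEF53        # magic number shared by ext2, ext3 and ext4


def read_superblock(image: bytes) -> dict:
    """Decode a few little-endian superblock fields (hypothetical helper)."""
    raw = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(raw) < 58:
        raise ValueError("image too small to contain a superblock")
    # Offsets 0 and 4: inode count and block count (low 32 bits).
    inodes_count, blocks_count = struct.unpack_from("<II", raw, 0)
    # Offset 24: block size expressed as a shift applied to 1024.
    (log_block_size,) = struct.unpack_from("<I", raw, 24)
    # Offset 56: the 16-bit magic number.
    (magic,) = struct.unpack_from("<H", raw, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("not an ext2/3/4 filesystem (bad magic)")
    return {
        "inodes": inodes_count,
        "blocks": blocks_count,
        "block_size": 1024 << log_block_size,
    }


# Demonstration on a synthetic image, so that no real filesystem is needed.
image = bytearray(2048)
struct.pack_into("<II", image, SUPERBLOCK_OFFSET, 128, 1024)
struct.pack_into("<I", image, SUPERBLOCK_OFFSET + 24, 0)
struct.pack_into("<H", image, SUPERBLOCK_OFFSET + 56, EXT2_MAGIC)
info = read_superblock(bytes(image))
```

For a real image, something like `read_superblock(open("image_root.fs", "rb").read(2048))` should work in the same way. Everything beyond the superblock (group descriptors, inodes, directories) is where a library like libext2fs earns its keep.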

## Tooling Up

Had I bothered to investigate further, I might have discovered another useful package. Running one or both of the following commands on a Debian system lets us see which other packages make use of the library functionality of e2fsprogs:

apt-cache rdepends e2fslibs
apt-cache rdepends libext2fs2

Amongst those listed is e2tools which offers a suite of commands resembling those provided by mtools, albeit with a Unix flavour instead of a DOS flavour. Investigating this, I discovered that these tools inherit somewhat from the utilities provided by e2fsprogs, particularly the debugfs utility.

However, investigating e2fsprogs by myself gave me a chance to become familiar with the details of libext2fs and how the different utilities managed to use it. Since it is not always obvious to me how the library should be used, and I find myself missing some good documentation for it, the more program code I can find to demonstrate its use, the better.

For my purposes, accessing individual files and directories is not particularly interesting: I really just want to treat an ext2 filesystem like an archive when preparing my L4Re payload; it is only within L4Re that I actually want to access individual things. Outside L4Re, having an equivalent to the tar command, but with the output being a filesystem image instead of a tar file, would be most useful for me. For example:

e2archive --create image_root.fs $ROOTFS

Currently, this can be made to populate a filesystem for eventual deployment, although the breadth of support for the filesystem features is rather limited. It is possible that I might adopt e2tools as the basis of this archiving program, given that it is merely a shell script that calls another program. Then again, it might be useful to gain direct experience with libext2fs for my other activities.

## Future Directions

And so, in the GNU/Linux environment, the creation of such archives has been the focus of my experiments. Meanwhile, I need to develop library functions to support filesystem operations within L4Re, which means writing code to support things like file descriptor abstractions and the appropriate functions for accessing and manipulating files and directories. The basics of some of this is already done for the “toy” filesystem, but it will be a matter of figuring out which libext2fs functions and abstractions need to be used to achieve the same thing for ext2 and its derivatives.

Hopefully, once I can demonstrate file access via the same interprocess communications mechanisms, I can then make a start in replacing the existing conventional file access functions with versions that use my mechanisms instead of those provided in L4Re. This will most likely involve work on the C library support in L4Re, which is a daunting prospect, but some familiarity with that is probably beneficial if a more ambitious project to replace the C library is to be undertaken.

But if I can just manage to get the dynamic linker to be able to read shared libraries from an ext2 filesystem, then a rather satisfying milestone will have been reached. And this will then motivate work to support storage devices on various hardware platforms of interest, permitting the hosting of filesystems and giving those systems some potential as L4Re-based general-purpose computing devices, too.
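As a point of comparison for the archiving idea discussed earlier in this article, here is a rough Python sketch of populating an ext2 image from a directory without mounting anything. It is not the e2archive program, and it takes a different route from the one described above: it relies on the `-d` option of mke2fs, which, as far as I am aware, is only present in reasonably recent e2fsprogs releases. The function names are hypothetical.

```python
import subprocess


def create_sparse_image(path: str, size_in_kb: int) -> None:
    """Create a sparse file of the given size to hold the filesystem."""
    with open(path, "wb") as image:
        image.truncate(size_in_kb * 1024)


def mke2fs_command(image: str, rootfs: str) -> list:
    """Build the command that copies the rootfs directory into the image.

    -F accepts a regular file as the target; -d names the directory whose
    contents become the root of the new filesystem (assumes an e2fsprogs
    version that supports this option).
    """
    return ["mke2fs", "-F", "-t", "ext2", "-d", rootfs, image]


def create_archive(image: str, rootfs: str, size_in_kb: int) -> None:
    """Unprivileged equivalent of preparing, mounting and copying."""
    create_sparse_image(image, size_in_kb)
    subprocess.run(mke2fs_command(image, rootfs), check=True)
```

Calling `create_archive("image_root.fs", "rootfs", 65536)` would then be expected to leave a populated 64MB image behind, with no special privileges and no mountpoint involved.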
### Monday, 28 January 2019

## WordPress Anti Spam Measures using Fail2ban

I recently got really excited when I noticed that the number of page views on my blog had suddenly sky-rocketed from around 70 to over 300! What brought me back down to earth was the fact that I had also received around 120 spam comments on that single day. Luckily, all of those were reliably caught by Antispam Bee.

Still, it would be nice to have accurate statistics about page views, and those stupid spam requests distort the number of views. Also, I’d like to fight spam tooth and nail, so simply filtering out the comments is not enough for me. That’s why I did some research and found out about the plugin WP Fail2Ban Redux, which allows logging of spammed comments for integration with the famous fail2ban tool.

The plugin does not come with a settings page, so any settings and options have to be defined in wp-config.php. In my case it was sufficient to just add the following setting:

/path/to/wordpress/wp-config.php

define('ANTISPAM_BEE_LOG_FILE', '/var/log/spam.log');

Now, whenever Antispam Bee classifies a comment as spam, the IP of the author is logged in the given log file. All I need now is to configure fail2ban to read host names from that file and to swing that ban hammer!

/etc/fail2ban/filter.d/antispambee.conf

[INCLUDES]
# Read common prefixes. If any customizations available -- read them from
# common.local
before = common.conf

[Definition]
_daemon = wp
failregex = ^%(__prefix_line)s comment for post.* from host=<HOST> marked as spam$

/etc/fail2ban/jail.local

[antispambee]
enabled = true
filter = antispambee
logpath = /var/log/spam.log
bantime = 21600
maxretry = 1
port = http,https

Now whenever a spammer leaves a “comment” on my blog, its IP is written in the spam.log file where it is picked up by fail2ban, which results in a 6 hour ban for that IP.

Update:

## Some Ideas for 2019

Well, after my last article moaning about having wishes and goals while ignoring the preconditions for, and contributing factors in, the realisation of such wishes and goals, I thought I might as well be constructive and post some ideas I could imagine working on this year. It would be a bonus to get paid to work on such things, but I don’t hold out too much hope in that regard.

In a way, this is to make up for not writing an article summarising what I managed to look at in 2018. But then again, it can be a bit wearing to have to read through people’s catalogues of work even if I do try and make my own more approachable and not just list tons of work items, which is what one tends to see on a monthly basis in other channels.

In any case, 2018 saw a fair amount of personal focus on the L4Re ecosystem, as one can tell from looking at my article history. Having dabbled with L4Re and Fiasco.OC a bit in 2017 with the MIPS Creator CI20, I finally confronted certain aspects of the software and got it working on various devices, which had been something of an ambition for at least a couple of years. I also got back into looking at PIC32 hardware and software experiments, tidying up and building on earlier work, and I keep nudging along my Python-like language and toolchain, Lichen.

Anyway, here are a few ideas I have been having for supporting a general strategy of building flexible, sustainable and secure computing environments that respect the end-user. Such respect not being limited to software freedom, but also extending to things like privacy, affordability and longevity that are often disregarded in the narrow focus on only one set of end-user rights.

## Building a General-Purpose System with L4Re

Apart from writing unfinished articles about supporting hardware devices on the Ben NanoNote and Letux 400, I also spent some time last year considering the mechanisms supporting filesystems in L4Re. For an outsider like myself, the picture isn’t particularly clear, but the mechanisms don’t really seem particularly well documented, and I am not convinced that the filesystem support is what people might expect from a microkernel-based system.

Like L4Re’s device support, the way filesystems are made available to tasks appears to use libraries extensively, whereas I would expect more use of individual programs, with interprocess messages and shared memory being employed to move the data around. To evaluate my expectations, I have been writing programs that operate in such a way, employing a “toy” filesystem to test the viability of such an approach. The plan is to make use of libext2fs to expose ext2/3/4 filesystems to L4Re components, then to try and replace the existing file access mechanisms with ones that access these file-serving components.

It is unfortunate that systems like these no longer seem to get much attention from the wider Free Software community. There was once a project to port GNU Hurd to L4-family microkernels, but with the state of the art having seemingly been regarded as insufficient or inappropriate, the focus drifted off onto other things, and now there doesn’t seem to be much appetite to resume such work. Even the existing Hurd implementation doesn’t get that much interest these days, either. And yet there are plenty of technical, social and practical reasons for not just settling for using the Linux kernel and systems based on it for every last application and device.

## Extending Hardware Support within L4Re

It is all very well developing filesystem support, but there also has to be support for the things that provide the storage on which those filesystems reside. This is something I didn’t bother to look at when getting L4Re working on various devices because the priority was to have something to show, meaning that the display had to work, along with testing and demonstrating other well-understood peripherals, with the keyboard or keypad being something that could be supported with relative ease and then used to demonstrate other aspects of the solution.

It seems perverse that one must implement support for SD or microSD card storage all over again when the software being run is already being loaded from such storage, but this is rather like the way that “live CD” versions of GNU/Linux would load an environment directly from a CD, yet an installed version of such an environment might not have the capability to access the CD drive. Still, this is an unavoidable step along the path to make a practical system.

It might also be nice to make the CI20 support a bit better. Although a device notionally supported by L4Re, various missing pieces of hardware support needed to be added, and the HDMI output capability remains unavailable. Here, the mystery hardware left undocumented by the datasheet happens to be used in other chipsets and has been supported in the Linux kernel for many of them for quite some time. Hopefully, the exercise will not be too frustrating.

Another device that might be a good candidate for L4Re is the Efika MX Smartbook. Although having a modest specification by today’s bloated-Web and pointless-eye-candy standards, it has a nice keyboard (with a more sensible layout than the irritating HP Elitebook, as I recently discovered) and is several times more powerful than the Letux 400. My brother, David, has already investigated certain aspects of the hardware, and it might make the basis of a nice portable system. And since support in Linux and other more commonly-used technologies has been left to rot, why not look into developing a more lasting alternative?

## Reviving Mail-Based Communication

It is tiresome to see the continuing neglect of the health of e-mail, despite it being used as the bedrock of the Internet’s activities, while proprietary and predatory social media platforms enjoy undeserved attention and promotion in mass media and in society at large. Governmental and corporate negligence mean that the average person is subjected to an array of undesirable, disturbing and generally unwanted communications from chancers and criminals through their e-mail accounts which, if it had ever happened to the same degree with postal mail, would have seen people routinely arrested and convicted for their activities.

It is tempting to think that “they” do not want “us” to have a more reliable form of mail-based communication because that would involve things like encryption and digital signatures. Even when these things are deemed necessary, they always seem to be rolled out as part of a special service that hosts “our” encryption and signing keys for us, through which we must go to access our messages. It is, I suppose, yet another example of the abandonment of common infrastructure: that when something better is needed, effort and attention is siphoned off from the “commons” and ploughed into something separate that might make someone a bit of money.

There are certainly challenges involved in making e-mail better, with any discussion of managing identities, vouching for and recognising correspondents, and the management of keys most likely to lead to dispute about the “best” way of doing things. But in the end, we probably find ourselves pursuing perfect solutions that do everything whilst proprietary competitors just focus on doing a handful of things effectively. So I envisage turning this around and evaluating whether a more limited form of mail-based communication can be done in the way that most people would need.

I did look fairly recently at a selection of different projects seeking to advise and support people on providing their own e-mail infrastructure. That is perhaps worth an article in its own right. And on the subject of mail-based communication, I hope to look a bit more at imip-agent again after neglecting it for so long.

## Sustaining a Python Alternative

One motivation for developing my Python-like language and toolchain, Lichen, was to explore ways in which Python might have been developed to better suit my own needs and preferences. Although I still use Python extensively, I remain mindful of the need to write conservative, uncomplicated code without the kind of needless tricks and flourishes that Python’s expanding feature set can tempt the developer with, and thus I almost always consider the possibility of being able to use the Lichen toolchain with my projects one day.

Lichen may still be a proof of concept, but there has been work done on gradually pushing it towards being genuinely usable. I spent some time considering the way floating point numbers might be represented, and this raised some interesting issues around how they might be stored within instances. Like the tuple performance optimisations, I hope to introduce floating point support into the established feature set of Lichen and hopefully offer decent-enough performance, with the latter aspect being yet another path of investigation this year.

## Documenting and Publishing

A continuing worry I have is whether I should still be running MoinMoin sites or even sites derived from MoinMoin “export dumps” for published information that is merely static. I have spent some time developing a parsing and formatting tool to generate static content from Moin content, thus avoiding running Moin altogether and also avoiding having to run a script acting as a URL-preserving front-end to exported Moin content (which is unfortunately required because of how the “export dump” seems to work).

Currently, this tool supports HTML and Moin output, the latter to test the parsing activity, with Graphviz content rendered as inline SVG or in other supported formats (although inline SVG is really what you want). Some macros are supported, but I need to add support for others, like the search macros which are useful for generating page listings. Themes are also supported, but I need to make sure that things like tables of contents – already supported with a macro – can be styled appropriately.

Already, I can generate the bulk of my existing project documentation, and the aim here is to be able to focus on improving that documentation, particularly for things like Lichen that really need explanations to be written before I need to start reviewing the code from scratch as if I were a total newcomer to the work. I have also considered using this tool as the basis for a decentralised wiki solution, but that can probably wait for a while given how many other things I have said I want to do!

## Anything More?

There are probably other things that are worth looking at, but these are perhaps the ones I feel are most pressing. It could be said that pursuing all these at once would spread me and my efforts very thin, but I tend to cycle through projects periodically and hope that I can catch up with what I was previously doing, hence the mention above of documenting my work.

I wonder how much other people think about the year ahead, whether it is a productive and ultimately rewarding exercise to state aspirations and goals in this kind of way. New Year’s resolutions are a familiar practice, of course, but here I make no promises!

Nevertheless, a belated Happy New Year to anyone still reading! I hope we can all pursue our objectives enthusiastically over the year ahead and make a real and positive difference to computing, its users and to our societies.

## Using Terraform and cloud-init on Hetzner

Nowadays, with the help of modern tools, we manage our infrastructure as code. This approach is very useful because we can have an immutable design for our infra by declaring the state we would like our infra to be in. It also provides us with flexibility and a more generic way of handling our infra as lego bricks, especially when scaling.

UPDATE: 2019.01.22

## Hetzner

We need to create an API access token within a new project in the hetzner cloud console.

Copy this token, and with that in place we can continue with terraform.
For the purposes of this article, I am going to use 01234567890 as the API token.
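With a token in hand, one way to check that it works before involving terraform is to make an authenticated request to the Hetzner Cloud API. The following is a minimal Python sketch using only the standard library. The helper names are hypothetical, the token is the placeholder used in this article, and the sketch assumes that each listing endpoint returns a JSON object keyed by the resource name, with a `name` field on every entry.

```python
import json
import urllib.request

API_BASE = "https://api.hetzner.cloud/v1"


def build_request(path: str, token: str) -> urllib.request.Request:
    """Prepare a GET request carrying the token as a Bearer credential."""
    return urllib.request.Request(
        f"{API_BASE}/{path}",
        headers={"Authorization": f"Bearer {token}"},
    )


def list_names(path: str, token: str) -> list:
    """Fetch a listing such as "locations" and return each entry's name."""
    with urllib.request.urlopen(build_request(path, token)) as response:
        document = json.load(response)
    return [entry["name"] for entry in document[path]]


# Building the request does not touch the network; list_names() does.
request = build_request("locations", "01234567890")
```

Calling `list_names("locations", token)` with a real token should then return the location names, and an invalid token should make `urlopen` raise an HTTP error instead.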

## Install Terraform

The latest terraform version at the time of writing this blog post is v0.11.11:

$ curl -sL https://releases.hashicorp.com/terraform/0.11.11/terraform_0.11.11_linux_amd64.zip | bsdtar -xf- && chmod +x terraform
$ sudo mv terraform /usr/local/bin/

and verify it

$ terraform version
Terraform v0.11.11

## Terraform Provider for Hetzner Cloud

To use the hetzner cloud via terraform, we need the terraform-provider-hcloud plugin.

hcloud is part of the terraform providers repository, so the first time we initialize our project, terraform will download this plugin locally.

Initializing provider plugins...
- Checking for available provider plugins on https://releases.hashicorp.com...
- Downloading plugin for provider "hcloud" (1.7.0)...
...
* provider.hcloud: version = "~> 1.7"

### Compile hcloud

If you like, you can always build hcloud from the source code. There are notes on how to build the plugin here: Terraform Hetzner Cloud provider.

### GitLab CI

Or you can even download the artifact from my gitlab-ci repo.

### Plugin directory

You will find the terraform hcloud plugin under your current directory:

./.terraform/plugins/linux_amd64/terraform-provider-hcloud_v1.7.0_x4

I prefer to keep the tf plugins in one central place under my home directory:

$ mkdir -pv ~/.terraform.d/plugins/linux_amd64/
$ mv ./.terraform/plugins/linux_amd64/terraform-provider-hcloud_v1.7.0_x4 ~/.terraform.d/plugins/linux_amd64/terraform-provider-hcloud

or if you choose the artifact from gitlab:

$ curl -sL -o ~/.terraform.d/plugins/linux_amd64/terraform-provider-hcloud https://gitlab.com/ebal/terraform-provider-hcloud-ci/-/jobs/artifacts/master/raw/bin/terraform-provider-hcloud?job=run-build

That said, when working with multiple terraform projects you may be in a position that you need different versions of the same tf-plugin. In that case it is better to have them under your current working directory/project instead of your home directory. Perhaps one project needs v1.2.3 and another v4.5.6 of the same tf-plugin.

## Hetzner Cloud API

Here are a few examples of how to use the Hetzner Cloud API:

$ export -p API_TOKEN="01234567890"

$ curl -sH "Authorization: Bearer $API_TOKEN" https://api.hetzner.cloud/v1/datacenters | jq -r .datacenters[].name
fsn1-dc8
nbg1-dc3
hel1-dc2
fsn1-dc14

$ curl -sH "Authorization: Bearer $API_TOKEN" https://api.hetzner.cloud/v1/locations | jq -r .locations[].name
fsn1
nbg1
hel1

$ curl -sH "Authorization: Bearer $API_TOKEN" https://api.hetzner.cloud/v1/images | jq -r .images[].name
ubuntu-16.04
debian-9
centos-7
fedora-27
ubuntu-18.04
fedora-28

## hetzner.tf

At this point, we are ready to write our terraform file.
It can be as simple as this (CentOS 7):

# Set the variable value in *.tfvars file
# or using -var="hcloud_token=..." CLI option
variable "hcloud_token" {}

# Configure the Hetzner Cloud Provider
provider "hcloud" {
token = "${var.hcloud_token}"
}

# Create a new server running centos
resource "hcloud_server" "node1" {
name = "node1"
image = "centos-7"
server_type = "cx11"
}

### Project_Ebal

or a more complex config: Ubuntu 18.04 LTS

# Project_Ebal
variable "hcloud_token" {}

# Configure the Hetzner Cloud Provider
provider "hcloud" {
token = "${var.hcloud_token}"
}

# Create a new server running Ubuntu 18.04
resource "hcloud_server" "ebal_project" {
name = "ebal_project"
image = "ubuntu-18.04"
server_type = "cx11"
location = "nbg1"
}

### Repository Structure

Although in this blog post we have a small and simple example of using hetzner cloud with terraform, on larger projects it is usually best to have separate terraform files for variables, code and output. For more info, you can take a look here: VCS Repository Structure - Workspaces

├── variables.tf
├── main.tf
├── outputs.tf

### Cloud-init

Using cloud-init with hetzner is very simple.
We just need to add this declaration

user_data = "${file("user-data.yml")}"

to the terraform file.
So our previous tf is now this:

# Project_Ebal
variable "hcloud_token" {}

# Configure the Hetzner Cloud Provider
provider "hcloud" {
token = "${var.hcloud_token}"
}

# Create a new server running Ubuntu 18.04
resource "hcloud_server" "ebal_project" {
name = "ebal_project"
image = "ubuntu-18.04"
server_type = "cx11"
location = "nbg1"
user_data = "${file("user-data.yml")}"
}

To get the IP address of the virtual machine, I would also like to have an output declaration:

output "ipv4_address" {
value = "${hcloud_server.ebal_project.ipv4_address}"
}

## Cloud-init

You will find more notes on cloud-init in a previous blog post: Cloud-init with CentOS 7.

Below is an example of user-data.yml:

#cloud-config

disable_root: true
ssh_pwauth: no

users:
  - name: ubuntu
    ssh_import_id:
      - gh:ebal
    shell: /bin/bash
    sudo: ALL=(ALL) NOPASSWD:ALL

# Set TimeZone
timezone: Europe/Athens

# Install packages
packages:
  - mlocate
  - vim
  - figlet

# Update/Upgrade & Reboot if necessary
package_update: true
package_upgrade: true
package_reboot_if_required: true

# Run commands on first boot
runcmd:
  - figlet Project_Ebal > /etc/motd
  - updatedb

## Terraform

The first thing to do with terraform is to initialize our environment.

### Init

$ terraform init

Initializing provider plugins...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

### Plan

Of course, it is not necessary to run plan and then plan again with -out.
You can skip this step; it is here only for documentation purposes.

$ terraform plan

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create

Terraform will perform the following actions:

+ hcloud_server.ebal_project
id:            <computed>
backup_window: <computed>
backups:       "false"
datacenter:    <computed>
image:         "ubuntu-18.04"
ipv4_address:  <computed>
ipv6_address:  <computed>
ipv6_network:  <computed>
keep_disk:     "false"
location:      "nbg1"
name:          "ebal_project"
server_type:   "cx11"
status:        <computed>
user_data:     "sk6134s+ys+wVdGITc+zWhbONYw="

Plan: 1 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

Note: You didn't specify an "-out" parameter to save this plan, so Terraform
can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.

### Out

$ terraform plan -out terraform.tfplan


Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
+ create

Terraform will perform the following actions:

+ hcloud_server.ebal_project
id:            <computed>
backup_window: <computed>
backups:       "false"
datacenter:    <computed>
image:         "ubuntu-18.04"
ipv6_network:  <computed>
keep_disk:     "false"
location:      "nbg1"
name:          "ebal_project"
server_type:   "cx11"
status:        <computed>
user_data:     "sk6134s+ys+wVdGITc+zWhbONYw="

Plan: 1 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

This plan was saved to: terraform.tfplan

To perform exactly these actions, run the following command to apply:
terraform apply "terraform.tfplan"

$ terraform apply "terraform.tfplan"

hcloud_server.ebal_project: Creating...
backup_window: "" => "<computed>"
backups:       "" => "false"
datacenter:    "" => "<computed>"
image:         "" => "ubuntu-18.04"
ipv4_address:  "" => "<computed>"
ipv6_address:  "" => "<computed>"
ipv6_network:  "" => "<computed>"
keep_disk:     "" => "false"
location:      "" => "nbg1"
name:          "" => "ebal_project"
server_type:   "" => "cx11"
status:        "" => "<computed>"
user_data:     "" => "sk6134s+ys+wVdGITc+zWhbONYw="
hcloud_server.ebal_project: Still creating... (10s elapsed)
hcloud_server.ebal_project: Still creating... (20s elapsed)
hcloud_server.ebal_project: Creation complete after 23s (ID: 1676988)

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

Outputs:

ipv4_address = 1.2.3.4

## SSH and verify cloud-init

$ ssh 1.2.3.4 -l ubuntu

Welcome to Ubuntu 18.04.1 LTS (GNU/Linux 4.15.0-43-generic x86_64)

* Documentation:  https://help.ubuntu.com
* Management:     https://landscape.canonical.com

System information as of Fri Jan 18 12:17:14 EET 2019

Usage of /:   9.7% of 18.72GB   Users logged in:     0
Memory usage: 8%                IP address for eth0: 1.2.3.4
Swap usage:   0%

0 packages can be updated.


## Destroy

Be careful: without a specific terraform out plan, terraform will destroy everything it manages within your working directory/project. So it is always good practice to explicitly destroy a specific resource/tfplan.

$ terraform destroy

should better be:

$ terraform destroy -out terraform.tfplan

hcloud_server.ebal_project: Refreshing state... (ID: 1676988)

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
- destroy

Terraform will perform the following actions:

- hcloud_server.ebal_project

Plan: 0 to add, 0 to change, 1 to destroy.

Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.

Enter a value: yes

hcloud_server.ebal_project: Destroying... (ID: 1676988)
hcloud_server.ebal_project: Destruction complete after 1s

Destroy complete! Resources: 1 destroyed.

That’s it!

### Sunday, 20 January 2019

I’ve recently obtained the official “Certified Kubernetes Administrator” certification, and some people have written to me on LinkedIn asking for opinions about how to pass the exam. For this reason I’ve decided to share my experience and write down some advice and useful resources. Every CKA candidate must sign an NDA, so I can’t share specific details.

If you don’t have specific knowledge about container technologies and Docker, read the O’Reilly book “Docker: Up & Running”. The exam doesn’t require specific knowledge about Docker but, since Kubernetes is an orchestrator for containers, this is a necessary foundation.

IMHO the best book about k8s is “Kubernetes in Action”. It covers a lot of aspects of this software and it’s very detailed. You don’t need to read it from beginning to end: some chapters of “part 3” are not required by the exam.

In parallel with your reading, you need to practise, so open an account with a cloud provider that offers managed Kubernetes clusters. By now there are a lot of providers that support it; the one that is considered the best for k8s is Google Cloud Platform. Sign up for an account, enter your credit card details and you will get $300 of free credit to be used within 1 year.

First, practise with clusters already created by the provider and managed with GKE. You need to understand well how to work with Pod, Deployment, Service, ConfigMap etc… and you need to get very comfortable with the command line tool kubectl, since this is the principal CLI for k8s.

After that period, learn how to install a cluster: create 3 VMs (you can use the g1-small with the preemptible feature so that you consume little credit), connect via ssh and create a cluster formed of 1 master and 2 workers with kubeadm. Connect to the servers, check the k8s systemd units and where the logs are located, and… break it! A part of the exam will be about how to debug a broken cluster. After you have fixed the cluster, destroy the virtual machines, create another 3 machines and break it again in a different way!

During the exam you will be allowed to use only the official documentation, so during your tests get accustomed to using https://kubernetes.io/docs and avoid using StackOverflow & co.

The environment cka-practice-environment is a good test; you can easily use it with docker-compose. The advice in this document is useful, and this “commented curriculum” is useful as a recap before the exam.

At the end, go through “Kubernetes the Hard Way” to learn how to configure a cluster by hand and understand the details of every component.

When you are ready, schedule the exam. It’s a practical exam that will last 3 hours at most. Like every Linux Foundation/CNCF exam, it’s a remote one: you need a computer with Chrome and a particular extension installed, and this extension will share your webcam, desktop and microphone.
When the exam starts, you get a Linux console with kubectl already configured for 6 clusters. At the start of each of the 24 questions you are given the kubectl config use-context command that connects you to the correct cluster. The exam consists of 24 questions and a minimum score of 74% is required to pass it. The three hours are enough to complete all the exercises; the exam is rigorous but can be passed by anyone who has prepared with commitment. Good luck!

### Friday, 18 January 2019

## Quick Note: Backdoor in ES File Explorer

ES File Explorer is a popular file explorer app for Android. Even though it is proprietary, I must admit that I too came into contact with it some years ago. As Techcrunch reports, a security researcher has now detected a backdoor in the app, which allows users on the same local area network as the victim to access the contents of the phone.

This example shows how important it is to have free software, which can be audited by everyone.

### Thursday, 17 January 2019

## Unified Encrypted Payload Elements for XMPP

Requirements on encryption change from time to time. New technologies pop up and crypto protocols get replaced by new ones. There are also different use-cases that require different encryption techniques.

For that reason there are a number of encryption protocols specified for XMPP, amongst them OMEMO and OpenPGP for XMPP.

Most crypto protocols have in common that they aim to encrypt certain parts of the message that is being sent, so that only the recipient(s) can read the encrypted content.

OMEMO is currently only capable of encrypting the message’s body. For that reason the body of the message is encrypted and stored in a <payload/> element, which is added to the message. This is inconvenient, as it makes OMEMO quite inflexible.
The protocol cannot be used to secure arbitrary extension elements, which might contain sensitive content as well.

<message to='juliet@capulet.lit' from='romeo@montague.lit' id='send1'>
  <encrypted xmlns='eu.siacs.conversations.axolotl'>
    <header>...</header>
    <!-- the payload contains the encrypted content of the body -->
    <payload>BASE64ENCODED</payload>
  </encrypted>
</message>

The modern OpenPGP for XMPP XEP also uses <payload/> elements, but to transport arbitrary extension elements. The difference is that in OpenPGP for XMPP, the payload elements contain the actual payload as plaintext. Those <payload/> elements are embedded in either a <crypt/> or <signcrypt/> element, depending on whether or not the message will be signed, and then passed through OpenPGP encryption. The resulting ciphertext is then appended to the message element in the form of an <openpgp/> element.

<signcrypt xmlns='urn:xmpp:openpgp:0'>
  <to jid='juliet@example.org'/>
  <time stamp='...'/>
  <rpad>...</rpad>
  <payload>
    <body xmlns='jabber:client'>
      This is a secret message.
    </body>
  </payload>
</signcrypt>

<!-- The above element is passed to OpenPGP and the resulting ciphertext is included in the actual message as an <openpgp/> element -->

<message to='juliet@example.org'>
  <openpgp xmlns='urn:xmpp:openpgp:0'>
    BASE64_OPENPGP_MESSAGE
  </openpgp>
</message>

Upon receiving a message containing an <openpgp/> element, the receiver decrypts its content, performs some verification checks and then replaces the <openpgp/> element of the message with the extension elements contained in the <payload/> element. That way the original, unencrypted message is reconstructed.

The benefit of this technique is that the <payload/> element can in fact contain any number of arbitrary extension elements. This makes OpenPGP for XMPP’s take on encrypting message content far more flexible.

A logical next step would be to take OpenPGP for XMPP’s <payload/> elements and move them to a new XEP, which specifies their use in a unified way.
This can then be used by OMEMO and any other encryption protocol as well. The motivation behind this is that it would broaden the scope of encryption to cover more parts of the message, like read markers and other metadata.

It could also become easier to implement end-to-end encryption in other scenarios such as Jingle file transfer. Even though there is Jingle Encrypted Transports, this protocol only protects the stream itself and leaves metadata such as filename, size etc. in the clear. A unified <encrypted/> element would make it easier to encrypt such metadata and could be the better approach to the problem.

### Wednesday, 16 January 2019

## A Solution for Authoritative DNS

I’ve been thinking about improving my DNS setup. So many things will use e-mail verification as a backup authentication measure that it is starting to show as a real weak point. An Ars Technica article earlier this year talked about how “[f]ederal authorities and private researchers are alerting companies to a wave of domain hijacking attacks that use relatively novel techniques to compromise targets at an almost unprecedented scale.”

The two attacks mentioned in that article, changing the nameserver and changing records, are something that DNSSEC could protect against. Records wouldn’t even have to be changed on my chosen nameservers: a BGP hijack could simply hand the queries for records on my domain to another server, which would then reply with whatever it chooses.

After thinking for a while, my requirements come down to:

• Offline DNSSEC signing
• Support for storing signing keys on an HSM (YubiKey)
• Version control
• No requirement to run any Internet-facing infrastructure myself

After some searching I discovered GooDNS, a “good” DNS hosting provider. They have an interesting setup that looks to fit all of my requirements.
If you’re coming from a more traditional arrangement with either a self-hosted name server or a web panel then this might seem weird, but if you’ve done a little “infrastructure as code” then maybe it is not so weird.

The initial setup must be completed via the web interface. You’ll need to have a hardware security module (HSM) for providing a time-based one-time password (TOTP), an SSH key and optionally a GPG key as part of the registration. You will need the TOTP to make any changes via the web interface, the SSH key will be used to interact with the git service, and the GPG key will be used for any email correspondence, including recovery in the case that you lose your TOTP HSM or password.

You must validate your domain before it will be served from the GooDNS servers. There are two options for this, one for new domains and one “zero-downtime” option that is more complex but may be desirable if your domain is already live. For new domains you can simply update your nameservers at the registrar to validate your domain. For existing domains you can add a TXT record to the current DNS setup that will be validated by GooDNS, which allows the domain to be configured fully before switching the nameservers. Once the domain is validated, you will not need to use the web interface again unless updating contact, security or billing details.

All the DNS configuration is managed in a single git repository. There are three branches in the repository: “master”, “staging” and “production”. These are just the default branches; you can create other branches if you like. The only two that GooDNS will use are the “staging” and “production” branches.

GooDNS provides a script that you can install at /usr/local/bin/git-dns (or elsewhere in your path) which provides some simple helper commands for working with the git repository. The script is extremely readable, so it’s easy enough to understand and to write your own scripts if you find yourself needing something a little different.
When you clone your git repository you’ll find one text file on the master branch for each of your configured zones:

irl@computer$ git clone git@goodns.net:irl.git
Cloning into 'irl1'...
remote: Enumerating objects: 3, done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 3
Receiving objects: 100% (3/3), 22.55 KiB | 11.28 MiB/s, done.
Resolving deltas: 100% (1/1), done.
irl@computer$ ls
irl1.net learmonth.me
irl@computer$ cat irl1.net
@ IN SOA ns1.irl1.net. hostmaster.irl1.net. (
_SERIAL_
28800
7200
864000
86400
)

@           IN      NS        ns1.goodns.net.
@           IN      NS        ns2.goodns.net.
@           IN      NS        ns3.goodns.net.


In the backend GooDNS is using OpenBSD 6.4 servers with nsd(8). This means that the zone files use the same syntax. If you don’t know what this means then that is fine as the documentation has loads of examples in it that should help you to configure all the record types you might need. If a record type is not yet supported by nsd(8), you can always specify the record manually and it will work just fine.

One thing you might note here is that the string _SERIAL_ appears instead of a serial number. The git-dns script will replace this with a serial number when you are ready to publish the zone file.
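How git-dns picks the serial number isn’t shown here, so the following is only a minimal sketch of one way such a substitution could work, assuming the common date-based YYYYMMDDnn convention. The function names `next_serial` and `render_zone` are made up for illustration and are not part of any real tool.

```python
from datetime import date

PLACEHOLDER = "_SERIAL_"  # the marker that appears in the zone file above


def next_serial(previous: int, today: date) -> int:
    """Return a YYYYMMDDnn serial that is strictly greater than `previous`.

    Assumption: date-based serials; the real tool might just as well use a
    plain counter or a Unix timestamp.
    """
    base = int(today.strftime("%Y%m%d")) * 100
    return max(base, previous + 1)


def render_zone(template: str, serial: int) -> str:
    """Replace the placeholder with the serial, failing loudly if it is absent."""
    if PLACEHOLDER not in template:
        raise ValueError("zone template has no _SERIAL_ placeholder")
    return template.replace(PLACEHOLDER, str(serial))


template = """@ IN SOA ns1.irl1.net. hostmaster.irl1.net. (
_SERIAL_
28800
7200
864000
86400
)
"""

# Hypothetical previous serial from the day before.
serial = next_serial(previous=2019013101, today=date(2019, 2, 1))
zone = render_zone(template, serial)
print(serial)  # 2019020100
```

Taking the maximum of the date-based value and `previous + 1` keeps the serial increasing even when a zone is published more than once on the same day, which is what secondaries rely on to notice a change.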

I’ll assume that you already have your GPG key and SSH key set up, so now let’s set up the DNSSEC signing key. For this, we will use one of the four slots of the YubiKey. You could use either 9a or 9e, but here I’ll use 9e as 9a is already the SSH key for me.

To set up the token, we will need the yubico-piv-tool. Be extremely careful when following these steps especially if you are using a production device. Try to understand the commands before pasting them into the terminal.

First, make sure the slot is empty. You should get an output similar to the following one:

irl@computer$ yubico-piv-tool -s 9e -a status
CHUID: ...
CCC: No data available
PIN tries left: 10

Now we will use git-dns to create our key signing key (KSK):

irl@computer$ git dns kskinit --yubikey-neo
Successfully generated a new private key.
Successfully generated a new self signed certificate.
Found YubiKey NEO.
Slots available:
(1) 9a - Not empty
(2) 9e - Empty
Which slot to use for DNSSEC signing key? 2
Successfully imported a new certificate.
CHUID:  ...
CCC:    No data available
Slot 9e:
Algorithm:  ECCP256
Subject DN: CN=irl1.net
Issuer DN:  CN=irl1.net
Fingerprint:    97dda8a441a401102328ab6ed4483f08bc3b4e4c91abee8a6e144a6bb07a674c
Not Before: Feb 01 13:10:10 2019 GMT
Not After:  Feb 01 13:10:10 2021 GMT
PIN tries left: 10


We can see the public key for this new KSK:

irl@computer$ git dns pubkeys
irl1.net. DNSKEY 257 3 13 UgGYfiNse1qT4GIojG0VGcHByLWqByiafQ8Yt7/Eit2hCPYYcyiE+TX8HP8al/SzCnaA8nOpAkqFgPCI26ydqw==

Next we will create a zone signing key (ZSK). These are stored in the keys/ folder of your git repository but are not version controlled. You can optionally encrypt these with GnuPG (thus requiring the YubiKey to sign zones) but I’ve not done that here. Operations using slot 9e do not require the PIN, so leaving the YubiKey connected to the computer is pretty much the same as leaving the KSK on the disk. Maybe a future YubiKey will not have this restriction or will add more slots.

irl@computer$ git dns zskinit
Created ./keys/
Successfully generated a new private key.
irl@computer$ git dns pubkeys
irl1.net. DNSKEY 257 3 13 UgGYfiNse1qT4GIojG0VGcHByLWqByiafQ8Yt7/Eit2hCPYYcyiE+TX8HP8al/SzCnaA8nOpAkqFgPCI26ydqw==
irl1.net. DNSKEY 256 3 13 kS7DoH7fxDsuH8o1vkvNkRcMRfTbhLqAZdaT2SRdxjRwZSCThxxpZ3S750anoPHV048FFpDrS8Jof08D2Gqj9w==

Now we can go to our domain registrar and add DS records to the registry for our domain using the public keys. First though, we should actually sign the zone. To create a signed zone:

irl@computer$ git dns signall
Signing irl1.net...
Signing learmonth.me...
[production 51da0f0] Signed all zone files at 2019-02-01 13:28:02
2 files changed, 6 insertions(+), 0 deletions(-)


You’ll notice that all the zones were signed although we only created one set of keys. Setups where you have one shared KSK and an individual ZSK per zone are possible, but they provide questionable additional security. Reducing the number of keys required for DNSSEC helps to keep them all under control.
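Coming back to the DS records mentioned earlier: a registrar will often compute them from the DNSKEY for you, but it can help to see what goes into one. The sketch below derives a SHA-256 DS record (digest type 2) and the key tag in the way RFC 4034 describes. The helper names are my own, and the public key is a made-up placeholder rather than one of the keys shown above.

```python
import base64
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode a domain name in canonical (lowercase) DNS wire format."""
    wire = b""
    for label in name.rstrip(".").split("."):
        if label:  # skip the empty label produced by the root name
            raw = label.lower().encode("ascii")
            wire += bytes([len(raw)]) + raw
    return wire + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, key_b64: str) -> bytes:
    """Build DNSKEY RDATA: flags (2 bytes), protocol, algorithm, public key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + base64.b64decode(key_b64)


def key_tag(rdata: bytes) -> int:
    """Key tag checksum over the DNSKEY RDATA (RFC 4034, Appendix B)."""
    acc = 0
    for i, byte in enumerate(rdata):
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_record(owner: str, flags: int, protocol: int, algorithm: int, key_b64: str) -> str:
    """Return a DS record in presentation format using a SHA-256 digest."""
    rdata = dnskey_rdata(flags, protocol, algorithm, key_b64)
    digest = hashlib.sha256(name_to_wire(owner) + rdata).hexdigest().upper()
    return f"{owner} IN DS {key_tag(rdata)} {algorithm} 2 {digest}"


# Placeholder 64-byte key standing in for a real ECDSA P-256 public key.
placeholder_key = base64.b64encode(bytes(range(64))).decode("ascii")
record = ds_record("irl1.net.", 257, 3, 13, placeholder_key)
print(record)
```

The digest covers the owner name as well as the key material, so the same key published under two zones yields two different DS records.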

To make these changes live, all that is needed is to push the production branch. To keep things tidy, and to keep a backup of your sources, you can push the master branch too. git-dns provides a helper function for this:

irl@computer$ git dns push
Pushing master...done
Pushing production...done
Pushing staging...done

If I now edit a zone file on the master branch and want to try out the zone before making it live, all I need to do is:

irl@computer$ git dns signall --staging
Signing irl1.net...
Signing learmonth.me...
[staging 72ea1fc] Signed all zone files at 2019-02-01 13:30:12
2 files changed, 8 insertions(+), 0 deletions(-)
irl@computer$ git dns push
Pushing master...done
Pushing production...done
Pushing staging...done


If I now use the staging resolver or look up records at irl1.net.staging.goodns.net then I’ll see the zone live. The staging resolver is a really cool idea for development and testing. They give you a couple of unique IPv6 addresses just for you that will serve your staging zone files and act as a resolver for everything else. You just have to plug these into your staging environment and everything is ready to go. In the future they are planning to allow you to have more than one staging environment too.

All that is left to do is ensure that your zone signatures stay fresh. This is easy to achieve with a cron job:

0 3 * * * /usr/local/bin/git-dns cron --repository=/srv/dns/irl1.net --quiet


I monitor the records independently and disable the mail output from this command but you might want to drop the --quiet if you’d like to get mails from cron on errors/warnings.
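What “fresh” means here is that the RRSIG records in the zone have not run too close to their expiration time. As a rough sketch of the kind of check a nightly job might make, assuming a safety margin of seven days (my own arbitrary choice) and the YYYYMMDDHHmmSS timestamp form used in RRSIG presentation format:

```python
from datetime import datetime, timedelta, timezone


def parse_rrsig_time(stamp: str) -> datetime:
    """Parse the YYYYMMDDHHmmSS form used for RRSIG inception/expiration (UTC)."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def needs_resigning(expiration: str, now: datetime,
                    margin: timedelta = timedelta(days=7)) -> bool:
    """True when the signature expires within `margin` of `now`.

    The seven-day default is an assumption for illustration only.
    """
    return parse_rrsig_time(expiration) - now < margin


now = datetime(2019, 2, 1, 3, 0, tzinfo=timezone.utc)
print(needs_resigning("20190301132802", now))  # False: about 28 days left
print(needs_resigning("20190205132802", now))  # True: about 4 days left
```

Re-signing well before expiry leaves room for a failed cron run or two before resolvers start rejecting the zone.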

On the GooDNS blog they talk about adding an Onion service for the git server in the future so that they do not have logs that could show the location of your DNSSEC signing keys, which allows you to have even greater protection. They already support performing the git push via Tor but the addition of the Onion service would make it faster and more reliable.

Unfortunately, GooDNS is entirely fictional and you can’t actually manage your DNS in this way, but wouldn’t it be nice? This post has drawn inspiration from the following:

## An Absence of Strategy?

I keep starting articles but not finishing them. However, after responding to some correspondence recently, where I got into a minor rant about a particular topic, I thought about starting this article and more or less airing the rant for a wider audience. I don’t intend to be negative here, so even if this sounds like me having a moan about how things are, I really do want to see positive and constructive things happen to remedy what I see as deficiencies in the way people go about promoting and supporting Free Software.

The original topic of the correspondence was my brother’s article about submitting “apps” to F-Droid, the Free Software application repository for Android, which somehow got misattributed to me in the FSFE newsletter. As anyone who knows both of us can imagine, it is not particularly unusual that people mix us up, but it does still surprise me how people can be fluid about other people’s names and assume that two people with the same family name are the same person.

Eventually, the correction was made, for which I am grateful, and it must be said that I do also appreciate the effort that goes into writing the newsletter. Having previously had the task of doing some of the Fellowship interviews, I know that such things require more work than people might think, largely go either unnoticed or unremarked, and as a participant in the process it can be easy to wonder afterwards if it was worth the bother. I do actually follow the FSFE Planet and the discussion mailing list, so I’d like to think that I keep up with what other people do, but the newsletter must have some value to those who don’t want to follow a range of channels.

## A Rant about Free Software on Mobile

Well, it wasn’t as much of a rant as it was a moan about how there doesn’t seem to be a coherent strategy about Free Software on mobile devices. The FSFE has had some kind of campaign about Android for quite some time. What it amounts to is promoting Free Software applications and Free Software distributions on phones.

This probably isn’t significantly different from the activities promoted by the FSF, whose Defective By Design campaign features a gift guide promoting phones that run Free Software. The FSF also promotes and funds the Replicant project, more of which below.

For all I know, the situation about getting Free Software applications onto a phone probably isn’t all that dire, assuming that Google and phone vendors don’t try and prevent users from installing software that isn’t delivered via Google Play or other officially-sanctioned channels. Or as the Android documentation puts it:

Of course, this is rather reminiscent of the “bad old days” where some people could copy things on and off their phone using Bluetooth (or for those with particularly long memories, infrared communication) whereas others had those features disabled by their provider. So, while some people get to enjoy the benefits of Free Software, others are denied them: another case of divide-and-rule in action, I suppose.

But it is the situation about Free Software distributions, more specifically having a Free Software operating system with Free Software device drivers and Free Software firmware, that is most worrying. The FSFE campaign points people to the two enduring initiatives for putting a different kind of Android distribution on phones: Replicant and LineageOS (previously CyanogenMod).

While LineageOS seems to try and support as many devices as possible, it relies on non-free software to support device hardware. Meanwhile, Replicant employs only Free Software and is therefore limited in which devices it may support and to what extent those devices’ features will function.

Although I can’t really muster much enthusiasm for Android and its derivatives, I don’t think there is anything wrong with trying to provide a completely Free Software distribution of that software. Certainly, there will always be challenges with the evolution of the upstream code, being steered this way and that by its corporate parent for maximum corporate benefit, but this isn’t really much different to clinging on to the pace of change (and churn) in an openly-developed project like the Linux kernel.

But ultimately, these initiatives will always be reacting to what other people, specifically those working for large companies, have decided to do. It will always be about chasing the latest release of the upstream software and making it acceptable for a Free Software audience. And it will always be about seeing whether the latest devices can be supported or not and then trying to figure out how. And this is where most people start to wonder why things always have to be like this, spurring my rant.

## For the Long Term

To be fair to the FSFE’s Android-related campaign, the advice given does give people some concrete activities to consider: it isn’t simply the “go out and write code” battle cry that sometimes drifts through the air after an acrimonious episode where nobody can agree with each other. Helping F-Droid get more applications published, writing more Free Software applications, helping the operating system projects with their efforts: these are all worthwhile things for people to do.

But we only need look at the FSF’s ethical gift guide to see the severe challenges being faced over the longer term. For yet another year, the only offerings are older, refurbished Samsung smartphones, the most recent being the Galaxy S3 and Galaxy Note 2 from 2012. Now, there is nothing wrong with older hardware or refurbished devices. After all, I have written about older and much less powerful hardware than that which I believe still should have a place in modern life.

Nor should people regard the price of such refurbished units as particularly expensive. Of course you can buy a new phone with better specifications for the same or even less, but that new phone won’t be running only Free Software. Yes, there is always the Fairphone whose creators’ grip on Free Software, software longevity and other matters that weren’t confronted in the beginning is supposedly now rather better, although the hardware drivers remain non-free, so it isn’t really comparable, either.

Here, it is illustrative to consider community-originating efforts to develop smartphones, particularly since there is a perception that such efforts eventually end up pitching “expensive” and “underpowered” devices that “aren’t competitive”. There is obviously a collection of such initiatives ongoing at any given point, but ignoring random crowdfunding campaigns and corporate publicity stunts, we might usefully focus on some more familiar projects: the GTA04 successor to the Openmoko FreeRunner and the Neo900 successor to the Nokia N900.

Both of these projects are in a not-easily-explained state. The GTA04 device was made in a number of incrementally refined versions and sold primarily to people who already had a FreeRunner into which they could install the new hardware. However, difficulties with the last hardware revision meant that it was no longer viable as a product, with the cost of overcoming production problems being rather likely to exceed any money otherwise made on the units.

Meanwhile, the Neo900 project is effectively stalled having experienced several setbacks, notably the freezing of funds by PayPal for no really good reason, and difficulties in finding and retaining qualified people to do things like board layout. Although there are aspirations to get to completion, perhaps with some modification of the original goals, the path to completion remains obscure and uncertain. It is certainly hard to sustain my initial enthusiasm for the project, even if I do sympathise deeply with the struggles and difficulties of those trying to deliver something that they want to see succeed perhaps more than anyone else.

The future is not necessarily entirely bleak for these projects, though. Experience from the GTA04 effort may have been beneficial to the development of the Pyra handheld computer, whose own genesis has been troubled at times and yet forthrightly communicated in an honourably transparent fashion by its initiator, and the CPU board for that device may end up as the basis for a new product known as the GTA15. Given the common architectural heritage of the GTA04, N900 and Neo900, it would not be completely inconceivable that if some kind of way forward could be found for the Neo900, GTA15 might be some kind of contributing element to that.

What these projects should illustrate, however, is that the foundations of a Free Software mobile device are difficult to prepare, subject to sudden and potentially catastrophic setbacks or outright failure, and they require persistence and plenty of resources, not least of the financial kind but also the dedication of people with the right competence and experience. Sadly, these projects never get the attention, the recognition, or the generosity they deserve.

## Seeing the Bigger Picture

If we care about being able to support all the different elements of our phones with Free Software, instead of crossing out items in the specification list, sacrificing functionality because nobody knows how it works without signing a non-disclosure agreement and then only being allowed to release a binary blob for loading into specific Linux kernel versions, then we need to be there at the start, when the phone gets designed, and play a role in making sure that everything can be supported by Free Software. Spending time and effort on figuring out someone else’s hardware is not time and effort well spent.

Indeed, from the moment a proprietary product gets into the hands of developers, the clock is ticking. Already, given the pace of product development and shipment, the product is heading for obsolescence and its manufacturer will be tooling up to make its successor. Even if downstream developers work quickly and get as much of the hardware supported as they can, there will only be a certain period before the product becomes difficult to obtain. And then the process of catch-up starts all over again with the next product.

Of course, product variations always used to happen with desktop and laptop computers. One day you’d get a laptop with one chipset and the next day the “same” laptop would contain something else. The only thing that eased the pain involved was broad hardware support for these kinds of devices, and even then there would be chipsets notorious for their lack of support in things like the Linux kernel.

Such pitfalls cultivated demands for products that could run Free Software and be supplied with such software instead of the usual proprietary products bundled as a consequence of Microsoft’s anticompetitive and coercive business practices. It was no longer enough to accept that we might buy a computer with bundled software and “install over it”, that this might emphasise our Free Software credentials. Credible advocates of Free Software have sought to identify vendors offering systems that are either already supplied with Free Software or that come without any preinstalled software at all, in both cases being fully capable of supporting a Free Software distribution.

But we now find ourselves in the absurd situation with mobile devices where remedial measures comparable to “install over it” are almost the only things people can suggest, that there really aren’t any mobile device vendors who can offer a bare, supportable device or one that is, say, already running Replicant and offering access to all of the device’s features. And although refurbished devices are sold that may run Replicant well enough, we lack another essential guarantee that may not have been so important in the past, one that community-originating hardware projects might be able to help with.

In being involved with the design of these devices, we can seek to dictate how long they remain viable. Instead of having a large corporation decide that now is the time for you to buy their next device and that the product you bought and liked is now deliberately unavailable, we can seek to keep making devices as long as they have a role and have people wanting to keep using them.

If something runs a Free Software distribution well enough, and if that device can still be made, it becomes a safe choice and something we can recommend to others. At last, we get some kind of certainty in a world whose stream of continual change is often fad-driven, exploitative and needless.

## The Strategic Vacuum

So it seems obvious to me that if people want Free Software on phones, they need to cultivate the efforts to make that Free Software viable, which means cultivating sustainable hardware design and actually promoting and supporting the projects pursuing it. Otherwise, it is like trying to plant an orchard without paying attention to the soil, cursing that the trees will not grow whilst being oblivious to the fact that the ground is concrete.

And this goes beyond this particular domain. Free Software advocacy is all well and good, but there needs to be practical action that goes beyond pitching in and nudging things towards success. It is wonderful that collective effort and collaboration can take small endeavours and grow them into solutions that are useful for many, but it can be too much to expect everything to just coalesce as if by magic, that people and projects will just discover each other, work together, strengthen each other and multiply the benefits.

There needs to be a strategy, for people to identify real-world problems that are not being addressed by Free Software, to identify the projects that might be applied to those problems, and to propose actual solutions. In the absence of Free Software, proprietary and exploitative solutions are able to stake out their territory, entrench themselves and to thrive.

Here, I struggle to see which Free Software organisations have both the breadth of scope and the depth of focus to make a difference. Developer-driven organisations like Debian and KDE have a lot going on, and they deliver non-trivial software systems, and yet sustaining something like a viable mobile platform (encompassing hardware and software) is seemingly beyond their reach. Neither has a coherent answer to other significant challenges of our time, such as the dominance of proprietary (and destructive) communications platforms.

Meanwhile, advocacy-driven organisations like the FSFE merely give advice to people about how they might help Free Software. The FSF arguably combines this with actual development through the GNU project, and there are those lists of urgent activities, but one cannot help but have the impression that the urgency is reactive and not the product of strategic vision (beyond the goal of the proliferation of Free Software, of course). I would like to think that the Software Freedom Conservancy might combine things more effectively, but it perhaps remains more of an umbrella organisation with a similar emphasis on broad Free Software adoption.

I would like to think that the step of getting involved in Free Software is but the first step towards fairer, kinder, more transparent, more productive and more sustainable societies. Traces of such visions can be seen in the communications of various organisations and yet they largely hold back from suggesting what the next steps might be. And yet, I think, we will in future really need to take those steps in order to protect ourselves, our societies, and the things we care about.

In general, we notice the absence of strategy; in specific cases, we notice it keenly. Which organisations are willing and able to fill this strategic void coherently and decisively, to offer concrete solutions as opposed to suggestions, to have something definite to work towards, and to direct effort and resources into actually realising such goals? Surely, in this age of hoping for the best, those organisations will be the ones that deserve our support the most.

## Open Call for Humanitarian Design Challenges

Do you have a brilliant idea for an innovative product for the beneficiaries of your mission-driven organization or for your colleagues in the field? Or do they use an existing product that needs serious improvement?

Then here is your chance: have a team of Industrial Product Design students create that for you!

### What you get

We set up a collaboration with Rotterdam University of Applied Sciences to work on practical challenges in Humanitarian Development, Aid and Disaster Relief:

• Duration: February 2019 until May 2019
• Value: a team of students works 8 full-time weeks on your idea
• Result: a working prototype of the solution

### What you bring in

• An idea for a physical product (may have a digital component, but not digital-only)
• A clear, practical and concise challenge description (format and examples available)
• 3 (online) meetings between February and March with users, stakeholders and/or challenge owner to provide context information and answer practical questions
• If the challenge owner or a knowledgeable field worker is in the Netherlands: a physical meeting in Rotterdam with the team at the start would be helpful

All designs and documentation of the solution will be freely published online as Open Source, to the benefit of you, users and other stakeholders, future (student) teams and anyone interested.

Previous programs resulted in, amongst others, solutions for Field Ready: a dust mask created from plastic waste for use in the air-polluted city of Kathmandu, an ear/nose syringe, bricks made from PET bottles, a hydroponics system, a 3D-printable sharps bottle box and a blood warmer.

### Interested?

Contact me for more information and to receive the challenge description format and examples.

This description is also available in PDF.

– Diderik

## Join the Fediverse!

Federated Networks are AWESOME! When I first learned about the concept of federation when I started using Jabber/XMPP, I was blown away. I could set up my own private chat server on a Raspberry Pi and still be able to communicate with people from the internet. I did not rely on external service providers and instead could run my service on my own hardware.

About a year ago or so I learned about ActivityPub, another federated protocol, which allows users to share their thoughts, post links, videos and other content. Mastodon is probably the most prominent service that uses ActivityPub to create a Twitter-like microblogging platform.

But there are other examples like PeerTube, a YouTube-like video platform which allows users to upload, view and share videos with each other. Pleroma allows users to create longer posts than Mastodon, and Plume can be used to create whole blogs. PixelFed aims to recreate the Instagram experience and Prismo is a federated Reddit alternative.

But the best thing about ActivityPub: All those services federate not only per service, but also across each other. For instance, you can follow PeerTube creators from your Mastodon account!

And now the icing on the cake: You can now also follow this particular blog! It is traveling the fediverse under the handle @vanitasvitae@blog.jabberhead.tk

Matthias Pfefferle wrote a WordPress plugin that teaches your WordPress blog to talk to other services using the ActivityPub protocol. That makes all my blog posts available in, and a part of, the fediverse. You can even comment on the posts from within Mastodon, for example!

In my opinion, the internet depends too heavily on centralized services. Having decentralized services that are united in federation is an awesome way to take back control.

## Okular: PDF Signature + Certificate support has landed

As of a few minutes ago, I merged the code from Chinmoy Ranjan Pradhan's GSoC project to support showing PDF Signatures and Certificates in Okular.

Signature handling is a big step for us, but it's also very complex, so I expect it to have bugs and things that can be improved; testers are more than welcome.

Compiling is a bit "hard" since it requires poppler 0.73, which was released a few days ago.

But thanks to flatpak, there's no need to compile it: you can run the KDE Okular Nightly on your system to try it.

flatpak remote-add --if-not-exists kdeapps --from https://distribute.kde.org/kdeapps.flatpakrepo
flatpak install kdeapps org.kde.okular

Note: if you have Okular installed from another flatpak repo (for example flathub), this will switch you to the KDE Nightlies; you may want to switch back after testing.

And then you can try the Adobe sample PDF

And you should get stuff like this

Just a quick hint: Mike Kuketz released a blog post about how you can use Blokada to block ads and trackers on your Android device. In his post, he explains how Blokada uses a private VPN to block DNS requests to known tracker/ad sites and recommends a set of rules to configure the app for the best experience.

He also briefly mentions F-Droid and gives some arguments for why you should get your apps from there instead of the Play Store.

The blog post is written in German and is available on kuketz-blog.de.

## sjfonts 2.1 released

More than 11 years after sjfonts 2.0.2 was released, today I'm announcing sjfonts 2.1.

It contains two enhancements contributed by Yuri Chornoivan:
* Delphine font now has the Euro sign
* Steve font now has "basic" Cyrillic characters

If by any chance your distribution is packaging them, update!

https://sourceforge.net/projects/sjfonts/files/sjfonts/sjfonts-2.1/

Yes, it's on sourceforge ;)

## Building c-base @ 35C3 with Flowhub

The 35th Chaos Communication Congress is now over, and it is time to write about how we built the software side of the c-base assembly there.

## c-base at 35C3

The Chaos Communication Congress is a major fixture of the European security and free software scene, with thousands of attendees. As always, the “mother of all hackerspaces” had a big presence there, with a custom booth that we spent nearly two weeks constructing.

This year’s theme was “Refreshing Memories”, and accordingly we brought various elements of the history of the c-base space station to the event. On the hardware side we had things like a scale model of the c-base antenna, as well as vintage arcade machines and various artifacts from over the years.

With software, we utilized the existing IoT infrastructure at c-base to control lights, sound, and drive videos and other information to a set of information displays. All of course powered by Flowhub.

This was a quite full-stack development effort, involving microcontroller firmware programming, server-side NoFlo and MsgFlo development, and front-end infoscreen web design. We also did quite a bit of devopsing with Travis CI, Docker, and docker-compose.

### Local MsgFlo setup

The first step in bringing c-base’s IoT setup to Congress was to prepare a “portable” version of the environment: an MQTT broker, MsgFlo, some components, and a graph with any on-premise c-base hardware or service dependencies removed. As this was for a CCC event, we decided to call it c3-flo (in comparison to the c-flo that we run at c-base).

We already have a quite nice setup where our various systems get built and tested on Travis, and uploaded to Docker Hub’s cbase namespace. Some repositories weren’t yet integrated, and so the first step was to Dockerize them.

To make the local setup simple to manage, we decided to go with a single docker-compose environment that would start all systems needed. This would be easy to run on any x86 machine, and provide us with a quite comprehensive set of features from the IoT parts to NASA’s Open MCT dashboard.

Of course we kept adding to the system throughout 35C3, but in the end the graph looked like the following:

### WiFi setup

To make our setup more portable, we decided to bring a local instance of the “c-base botnet” WiFi to Congress. This way all of our IoT devices could work at 35C3 with the exact same firmware and networking setup as they do at c-base.

Normally Congress doesn’t recommend running your own access point. But there are guidelines available on how to do it properly if needed. As it happens, out of this year’s 558 unofficial access points, the c-base one was the only one conforming to the guidelines (commentary around the 25 minute mark).

### Info displays

Like any station, c-base has a set of info screens showing various announcements, timelines, and statistics. These are built with Raspberry Pi 3s running Chrome in Kiosk Mode, with a single-page webapp that connects to our MsgFlo infrastructure over WebSockets with msgflo-browser.

Each screen has a customized rotation of different pages to show, and via MQTT we can send URLs to announce events like members arriving at c-base or a space launch livestream.

For 35C3 we built a new set of pages tailored for the Congress experience:

• Tweaked version of the normal c-base events view showing current and upcoming talks
• Video player to rotate various videos from the history of c-base
• Photo slideshow with a nice set of pictures from c-base
• Countdown screen for some event (c-base crash, teardown of the assembly at the end of Congress)

## Crashing c-base

The highlight of the whole assembly was a re-enactment of the c-base crash from billions of years ago. Triggered by a dropped bottle of space soda, this was an experience incorporating video, lights, and audio that we ran several times every day of the conference.

The c-base crash animation was managed by a NoFlo graph integrated into our MsgFlo setup with the standard noflo-runtime-msgflo tool. With this we could trigger the “crash” with an MQTT message (sent by a physical button), and run a timed sequence of actions on lights, a sound system, and our info screens.

### Timeline manager

There were some new components that we had to build for this purpose. The most important was a Timeline component that was upstreamed as part of the noflo-tween animation library.

With this you can define a multi-tracked timeline as JSON or YAML, with actions triggered on each track on their appropriate second. With MsgFlo this meant we could send timed commands to different devices and create a coordinated experience.

For example, our animation started by showing a short video on all info screens. When the bottle fell in the video, we triggered the appropriate soundtrack, and switched the lights through various animation modes. After the video ended, we switched to a “countdown to crash” screen, and turned all lights to a red alert mode.

After the crash happened, everything went dark for a few seconds, before the c-base assembly was returned into its normal state.
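The sequence described above can be sketched as a small multi-track timeline. This is only an illustration of the idea, assuming a made-up JSON layout (`at`, `action`), invented track names and invented timings; it is not the actual schema of the noflo-tween Timeline component, and the real timings of the crash animation are not given here.

```python
import json

# Hypothetical timeline layout: one list of {"at": second, "action": name} per track.
# Track names, action names and all timings are invented for illustration.
TIMELINE_JSON = """
{
  "screens": [
    {"at": 0, "action": "play-video"},
    {"at": 40, "action": "show-countdown"}
  ],
  "sound": [
    {"at": 12, "action": "play-soundtrack"}
  ],
  "lights": [
    {"at": 12, "action": "animate"},
    {"at": 40, "action": "red-alert"},
    {"at": 60, "action": "off"}
  ]
}
"""


def flatten(timeline):
    """Merge all tracks into one list of (second, track, action), ordered by time."""
    events = [
        (entry["at"], track, entry["action"])
        for track, entries in timeline.items()
        for entry in entries
    ]
    return sorted(events)


def due(events, second):
    """Return the (track, action) pairs scheduled for exactly this second."""
    return [(track, action) for at, track, action in events if at == second]


EVENTS = flatten(json.loads(TIMELINE_JSON))

if __name__ == "__main__":
    for at, track, action in EVENTS:
        print(f"{at:>3}s  {track:<8} {action}")
```

A runner would step through the seconds and hand each due action to the device behind its track, which in our case meant sending a message over MsgFlo.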

### Controlling McLighting

All LED strips we used at 35C3 were run using the McLighting firmware. By default it allows switching between different light modes with a simple WebSocket API.

For our requirements, we wanted the capability to send new commands to the lights with minimal latency and, at the end, to be able to restore the lights to whatever mode they had before the crash started.

The component we built for this is available in noflo-mclighting. The only thing you need is to run the NoFlo graph in the same network as the LED strips, and to send the WebSocket addresses of your LED strips to the component. After that you can control them with normal NoFlo packets.
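The "restore the previous mode" requirement boils down to remembering the last mode sent to each strip. The following is a minimal sketch of that bookkeeping, not the noflo-mclighting code: the class, the mode names and the addresses are hypothetical, and the transport is a plain callback rather than a real WebSocket connection.

```python
class LightController:
    """Remembers the last mode sent to each strip so it can be restored later."""

    def __init__(self, send):
        # send(address, mode) is supplied by the caller; how the command
        # actually reaches the strip is outside the scope of this sketch.
        self._send = send
        self._current = {}
        self._saved = {}

    def set_mode(self, address, mode):
        """Send a mode to one strip and record it as that strip's current mode."""
        self._current[address] = mode
        self._send(address, mode)

    def snapshot(self):
        """Save the current mode of every known strip."""
        self._saved = dict(self._current)

    def restore(self):
        """Re-send the modes saved by the last snapshot."""
        for address, mode in self._saved.items():
            self.set_mode(address, mode)


# Demonstration with a fake transport that only records what was sent.
sent = []
controller = LightController(lambda address, mode: sent.append((address, mode)))
controller.set_mode("ws://strip-1.example", "rainbow")  # hypothetical address and mode
controller.snapshot()                                   # before the crash starts
controller.set_mode("ws://strip-1.example", "red-alert")
controller.restore()                                    # after the crash ends
```

Keeping the state on the controlling side means the strips themselves never need to be queried for their current mode.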

## Finally

The whole setup took a couple of days to get right, especially regarding timings and tweaking the light modes. But, it was great! You can see a video of it below:

And if you’re interested in experimenting with this stuff, check out the “portable c-base IoT setup” at https://github.com/c-base/c3-flo.

## 2018 and 2019

2018 is over and 2019 starts. This is a great opportunity to look back, reflect and to try to look into the future. I predict that 2019 will be a very good year for privacy, open source and decentralized cloud software. Maybe even the mainstream breakthrough of federated and decentralized internet services!

Let me explain why:

The mainstream opinion about centralized services started to change in 2018 and I think this trend will continue in 2019. More and more people see the issue with large, centralized data silos that control more and more of our private lives, democratic processes and society as a whole. Some examples from 2018 where bad news hit the press include:

• The never ending list of Facebook scandals: Wired
• Amazon Alexa is listening to private conversations and is leaking the data: Heise and BusinessInsider
• Dropbox is leaking private data: TechTarget
• Google Plus is insecure and will shut down: CNBC

This year, Europe introduced the GDPR to regulate the collection of private data. I believe it is a good start, and I think we ultimately need rules as described in the User Data Manifesto.

I expected that people in the US and Asia wouldn’t take the GDPR seriously and would make fun of Europeans’ tendency to ‘over-regulate’. So I was surprised to see that the GDPR was widely praised as a step in the right direction. People in Asia and the US are already asking for similar regulations in their markets, and California has already announced its own variant of the GDPR with the California Consumer Privacy Act.

This clearly shows that the world is changing. People realize more and more that extensive centralized data collection is a problem. This is an opportunity for open source and decentralized and federated alternatives to enter the mainstream.

At Nextcloud we have become widely recognized as one of the major alternatives. And this year was big for us, with three big releases introducing new technologies the world needs going forward. Let me name just a few:

• End-to-end Encryption. In 2018 Nextcloud launched support for fully end-to-end encrypted file sync and share.
• Nextcloud Talk. At the beginning of 2018 we launched Nextcloud Talk as a fully integrated, self-hosted, open source and decentralized chat and audio/video call solution.
• Just a few weeks ago we launched Social with ActivityPub support to integrate with Mastodon and other projects of the Fediverse.
• Simple Signup. In summer we launched the Simple Signup feature to make it possible for new users to sign up at one of the Nextcloud providers directly from the Mobile and Desktop apps.
• We launched our unique Video Verification feature to become the most secure file share platform.
• In summer we announced the initiative to ship Nextcloud preinstalled on millions of NEC routers, something that will take off in 2019. You might have seen the prototype devices on social media.
• This fall we launched the Nextcloud Include program with funding from the Reinhard von König Preis for innovation. I’m happy we run this project together with my old friends from KDE.

In 2018 I traveled to more events and countries than ever before. It’s great to see how the Nextcloud community is growing all over the globe. On the company and business side we also have good news. The Nextcloud company is growing nicely in all areas. There will be separate news about this soon.

Of course it’s the mission of Nextcloud to not do everything alone. This is why we launched a lot of integration projects in 2018. For example with Rocket.Chat, Moodle, StorJ, Mastodon and others. I’m really happy to see that other open source and decentralization projects do as well as Nextcloud.

I think 2019 could be the year where open source, federated and self-hosted technology hits mainstream, taking on the proprietary, centralized data silos keeping people’s personal information hostage. Society becoming more critical about data collection will fuel this development.

If you want to make a difference, then join Nextcloud or one of the other projects that develop open source, decentralized and federated solutions. I think 2019 is the year where we can win the internet back!

## Cultural Techniques

CC BY-SA 3.0, via wikimedia/Paulis

Recently I looked up the German word “Kulturtechnik” on Wikipedia, which translates to “cultural techniques” in English. Surprisingly there is no English Wikipedia article for it, so I have to quote the German one. This section attracted my attention the most:

[Cultural techniques] require one or more prerequisites: mastery of reading, writing and arithmetic, the ability to represent things pictorially, analytical skills, the application of cultural-historical knowledge, or the linking of different methods.

The development of cultural techniques is not the achievement of individuals but of groups, arising in a socio-cultural context. All of the prerequisites mentioned therefore always require social interaction and participation in society. [translated from the German]

According to this description, cultural techniques are not capabilities of individuals but achievements of a collective, made in a socio-cultural context; they always require social interaction and participation. This means that reading, writing or math are not cultural techniques by themselves, but collaborative writing would qualify as such, for example with a wiki or any other collaboration platform.

## Free Software as a cultural technique

When talking about Free Software, I have argued in the past that software is a new cultural technique. My argument typically ran along the lines that software is everywhere and shapes our world. Software changed the way we live, learn, work, communicate, participate in society and share our culture. I think that’s still true, but this Wikipedia article added an important aspect for me. With the distinction between the tools and what we achieve collectively with them, I think we can argue that software alone is not a cultural technique, but Free Software is.

By definition, Free Software is a licensing model for software. A software license that gives the users the freedom to use, study, share and improve the software makes it Free Software. These days Free Software influences all areas of our lives. Cars, airplanes, cash terminals, pay machines, the internet, televisions, smart phones; I could continue the list indefinitely, and none of them would be possible without Free Software. The freedom given by the license and the influence it has on all areas of our lives changed the way we develop software. A new development model was established and is used for most Free Software these days. The most successful Free Software is developed by open communities with a strong focus on collaboration and participation. This model is embraced by individuals and large organizations. According to the 2017 Linux Kernel Report, 4300 developers from over 500 companies have contributed to the kernel, with an impressive list of large companies. Everyone works on the same code, often in parallel. People discuss the changes proposed by each other and improve them together until they are ready to be released. The community is not limited to the code; the corresponding design, documentation, artwork and translations are created in a similarly collaborative way. People exchange ideas in online forums and real-time chats, and meet at conferences. All this happens in a transparent, socio-cultural context, open for everyone to join.

But it is not only the way Free Software is built; the usage of the software also fits the definition of cultural techniques. From a user perspective, Free Software fosters collaboration and participation in many ways. It can be shared freely, so it encourages collaboration in its area of use. For example, pupils can exchange the software they use to do their homework or to prepare their presentations. This teaches a culture of collaboration and makes sure that everyone has the same possibilities to participate. Different departments in organizations can exchange software and give it to as many employees as needed without worrying that the maximum number of users allowed by the license has already been exceeded.

In a world defined by software, access to software decides who has access to our culture and to our communication tools, and what possibilities we have in education and at work. Free Software makes sure that everyone has the same possibilities to participate in today’s society. It fosters collaboration and participation, in contrast to proprietary software, which divides people and makes sure that everyone is on their own.

All this proves to me that Free Software is the latest cultural technique. As such it requires special attention from policy makers and society. I think it is in all our interest to protect and foster this new cultural technique.