Benki Lazy Chat

August 16, 2025

Matthias #

Zig is a strange point in the design space

The Zig programming language is quite strange. I am unsure whether I should like it.

The language is small. There are very few primitives, including for abstraction. On the other hand, one of them is a somewhat Lisp-like compile-time metaprogramming mechanism called comptime, which is very general.

The standard library is small, too. It is very much the opposite of a batteries-included approach. You have to write a lot of code yourself. On the other hand, the standard library is also designed to be complete enough that you can get by without linking against your platform’s C library.

Zig wants to implement everything itself and does not want to rely on C. On the other hand, importing and using C code is easier and more natural in Zig than in most languages, arguably more natural than in C++ – C++ does not really help you manage C resources whereas Zig provides defer.

Along similar lines, even though Zig wants to be independent of C, even though its standard library wraps syscalls directly (on kernels with a stable syscall API such as Linux), it takes care of building all the C code that you may need, including when cross-compiling to a different target architecture and OS. In theory you can use Zig as a cross-compilation-capable build system and dependency manager for pure C code bases.

There is no compile-time safety. There is neither lifetime tracking nor particularly rich types. On the other hand, when you run your program in debug mode there is no undefined behavior, only crashes, and memory leaks are detected and printed to the console.

The type system is limited. There is no built-in polymorphism, neither universal (read: dynamic dispatch) nor ad-hoc (read: static dispatch). On the other hand, you have tagged unions that look and behave like algebraic data types, you are encouraged to use comptime functions to generate types from templates, and the standard library uses explicit vtable structs under the hood in some places.

Zig is not object-oriented. There is no encapsulation enforcement, and because there is no universal polymorphism there are no interfaces and no inheritance. Yet structs and unions can have methods, and there is a self type that is useful when you create types at comptime.

It seems to me that Zig improves upon C in every respect in which it differs from it save for the number of platforms that it is available for. But the design space for languages that fulfill this criterion is big. It is not clear to me whether Zig is the best point explored in it so far.

Except for the name. There is no doubt that the name is good.

February 15, 2025

Matthias #

How to use Yomitan on Android

Install and open Microsoft Edge Canary.
On the browser’s about page, which you can find in the settings menu, tap the build number 5 times to enable developer mode.
In the developer mode settings, select extension install by ID and input idelnfbbmikgfiejhgmddlbkfgiifnnn.

The extension ID is found as the last part of the URI of the Yomitan extension page in the Microsoft Edge add-on store.

You can combine Yomitan with ッツ Ebook Reader to read ePub books with dictionary support. This makes extra sense when the Android device is a Boox ebook reader.

January 18, 2025

Matthias #

Apple’s virtualization framework and APFS compression do not like each other

The setting: You are running a virtualized GNU/Linux instance on an Apple computer. You are in an APFS folder shared with the guest by way of the virtiofs support built into Apple’s virtualization framework.

Compress a file (any compressible file, but it helps if it has at least a few non-zero bytes in it or the effect will not be particularly impressive) on the host:

host$  afsctool -vc -T LZFSE example.txt

Make a copy of the file using GNU cp inside the guest:

guest$  cp example.txt copy.txt

Observe that the copy contains only zeros:

guest$  od copy.txt
0000000 000000 000000 000000 000000 000000 000000 000000 000000
*
0016020 000000 000000 000000 000000
0016027

Digging deeper using strace, observe that in the cp run, lseek(3, 0, SEEK_DATA) fails with ENXIO (which indicates that the file is sparse and consists of a single big hole) when the correct return value would be 0:

openat(AT_FDCWD, "example.txt", O_RDONLY) = 3
...
lseek(3, 0, SEEK_DATA) = -1 ENXIO (No such device or address)

The assumptions that GNU cp makes seem to be in line with the specifications of the system calls involved. I infer that there must be a bug somewhere between Linux’s virtiofs driver, Apple’s virtualization framework, and APFS.

October 29, 2023

Matthias #

Dealing with the population crisis

Here is a set of political positions that, even though they go together naturally, in combination will offend everyone, both left and right:

Most of the developed world aligned with the West, including Western Europe and the United States, suffers from a fertility crisis. This is bad. We need more people if we want to (1) stay at the top of the world order rather than drop to the bottom and be dominated by bigger players and (2) keep our national economies going and continue to produce more prosperity.
Therefore, we need more immigration. This is emphatically not limited to highly skilled workers, but includes low-skill, less productive immigration, too. After all, we need both more production capacity and a bigger market, and each additional person helps with both (unless they are very old or sick, but those tend not to be the people who like to migrate anyway).
At the same time we have to reward people for producing more offspring. This very likely means that significantly more resources need to be diverted from singles and single-child families to families with 2 or more children, so much so that not starting a family early on is taxed substantially. This will be a regression in both personal freedom and gender equality, but lead to a much-needed recovery of fertility.

People as different as Robin Hanson and Matthew Yglesias have written about the topic – not necessarily in the way I summarized it above – but I have yet to see a party platform that incorporates all of it in combination.

September 24, 2023

Matthias #

Always fun dealing with Spring Boot

I just released version 6.1.0 of Quarkus Google Cloud JSON Logging, the somewhat inaccurately named library for logging in Google Cloud JSON format for JBoss Log Manager, a drop-in replacement for the LogManager from java.util.logging.

The original motivation for creating this release was that Quarkus in version 3.4 switched away from its custom JBoss Log Manager Embedded fork of JBoss Log Manager, rendering my library’s DefaultEmbeddedConfigurator class useless. But as it turns out, the Quarkus extension never used it, so nothing was actually broken by the change. Yay.

Instead I noticed that, somewhat unrelatedly, Spring Boot 3, which the library also supports (I mentioned it is somewhat inaccurately named, yes?), broke the way JBoss Logging (not to be confused with JBoss Log Manager) interacts with everything, especially JBoss Log Manager Embedded, leading logs from its logging facade to be swallowed and consequently never printed.

The good news is that migrating away from JBoss Log Manager Embedded fixes the problem. The real JBoss Log Manager even has another nice feature: You can configure it using a logging.properties file, so a custom configurator factory is not, in fact, needed. You can just create your logging.properties file and set it to use my DefaultConsoleHandler as the handler for your log messages. Or if you prefer, you can inject your own LogContextConfigurator using the standard ServiceLoader mechanism and leave the logging.properties file empty. Great!

Except! Spring Boot likes to make things extra fun. If you use the JavaLoggingSystem, which is what you want to do when you use JBoss Log Manager as your log sink (remember, it is a drop-in replacement for the standard LogManager from java.util.logging), Spring Boot first initializes the root logger, then sets its log level to SEVERE, and finally it asks it to load the configuration file that you supplied. But JBoss Log Manager loads its configuration (or your custom LogContextConfigurator if you supplied one) when it is initialized, so loading it another time means it loads it twice. And that puts you in a pickle because:

If you leave the configuration file empty and configure the message handler some other way, the log level stays at SEVERE and you lose all logs.
If you configure a logger in the configuration file and set its level and message handler, the handler is added twice, so you get double the logs.
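The second failure mode is easy to reproduce in isolation. The following is a minimal sketch using plain java.util.logging, not JBoss Log Manager or Spring Boot; the class, the CountingHandler, and the configure method are made up for illustration. It only shows why applying the same handler-adding configuration twice doubles every log record.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class DoubleHandlerDemo {

  // Hypothetical stand-in for a console handler: it only counts the records it receives.
  static final class CountingHandler extends Handler {
    private final AtomicInteger counter;

    CountingHandler(AtomicInteger counter) {
      this.counter = counter;
    }

    @Override
    public void publish(LogRecord record) {
      if (isLoggable(record)) {
        counter.incrementAndGet();
      }
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}
  }

  // Stand-in for "loading the configuration": sets a level and adds a handler.
  // Nothing here removes a handler that an earlier load already installed.
  static void configure(Logger logger, AtomicInteger counter) {
    logger.setLevel(Level.INFO);
    logger.addHandler(new CountingHandler(counter));
  }

  // Loads the configuration the given number of times, logs once,
  // and returns how many times the single message was handled.
  static int recordsSeenAfterConfiguring(int times) {
    AtomicInteger counter = new AtomicInteger();
    Logger logger = Logger.getAnonymousLogger();
    logger.setUseParentHandlers(false);  // keep the root logger's handlers out of the picture
    for (int i = 0; i < times; i++) {
      configure(logger, counter);
    }
    logger.info("hello");
    return counter.get();
  }

  public static void main(String[] args) {
    System.out.println("configured once:  " + recordsSeenAfterConfiguring(1));  // 1
    System.out.println("configured twice: " + recordsSeenAfterConfiguring(2));  // 2
  }
}
```

A configurator that replaces the handler list instead of appending to it sidesteps the duplication, which is the approach described next.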

So in the end I still had to make the change that I had originally set out to do, which was to add a new DefaultConfiguratorFactory class to replace the old DefaultEmbeddedConfigurator and enable everyone to migrate away from JBoss Log Manager Embedded. With my DefaultConfiguratorFactory, the configuration file still gets loaded twice, but the custom configurator takes care of fixing the list of handlers to a single Google-Cloud-JSON-enabled ConsoleHandler.

Check out the readme in case you want to get in on the fun.

February 4, 2023

Matthias #

I find it striking that AI-generated art characteristically draws human hands with too few or deformed fingers, which is similar to how counting your own fingers is a common (and presumably effective) trick to determine whether you are dreaming. What parallels between human cognition and simulator machines does this hint at?

November 16, 2022

Matthias #

NTRU in OpenSSH

OpenSSH has supported post-quantum cryptography for several years now. sntrup761x25519-sha512@openssh.com (hybrid Streamlined NTRU Prime and X25519) was introduced in OpenSSH 8.5. It was made the default key exchange mechanism in OpenSSH 9.0.

The reason it is a hybrid mechanism is that the security of NTRU is not as well-established as that of X25519 and other classical methods.

Note that this is for key exchange, not authentication. But considering that post-quantum cryptanalysis is a future concern rather than one situated in the present, that is what is worth focusing on right now. After all, someone who can break your key exchange in the future can record your encrypted exchanges now and decode them later. But they cannot travel back into the past to impersonate you in the present.

August 25, 2022

Matthias #

Rent controls

The problem

Rent controls sound nice at first. High housing costs hit poor people especially hard. Funding the building of cheap public housing, which would alleviate exploding rents, doesn’t work long-term because in order to stay funded it needs to make a profit and so can’t stay cheap for long. So instead you just go and make a law that caps housing rents. Problem solved.

Except it isn’t. High rents are the result of scarcity. By capping rents:

You disincentivize the building of new housing by restricting profits, which further restricts supply in the long run.
You subsidize high earners who live in apartments that are much too large for them, restricting supply even in the short term.

This latter part is a particularly painful realization when (as I do) you identify as relatively pro redistribution. It would be much preferable to subsidize low earners in a more targeted way.

Alternative: direct subsidies

Instead of restricting supply and subsidizing everyone indiscriminately, the solution, then, is to deregulate supply and pay direct subsidies to low earners to enable them to pay the higher rent.

Benefits:

This increases profits for people who build new apartments, which is what you want.
It nudges people living in apartments that are too large for them to move to smaller apartments.
By freeing larger apartments and enabling a more targeted subsidy, it benefits poor families, who can now move to apartments that are big enough for them.

Raising the money needed through Georgism

The obvious question the direct subsidy approach raises is where to get the money from.

The answer is a land value tax: tax the land (not the buildings built on it!) a fair amount to match its profit-making potential.

Here is why it is a good idea:

It incentivizes people to build more apartments onto a given parcel of land.
It strongly disincentivizes single-family home parcels, which are extremely inefficient and simply have no place in an expensive city these days.
It keeps the good incentives (incentive to build more, incentive to move to smaller apartments) in place while moving money from land owners to renters in need of support.

Challenges

The approach outlined here faces a few challenges:

It is not obvious how to calculate the value of a parcel of land.
Land owners, who have a powerful lobby, will not be pleased by the changes.

I consider neither of these a deal breaker, but they will have to be dealt with. The land value estimation problem is real, but it does not need to be solved in full generality, since some inaccuracy can be tolerated as long as the profit-making potential is still positive (i.e. it forces the state to err on the side of a too low tax rather than a too high tax). The lobbying point must be dealt with through politics as usual. A first step could be an education campaign, which I am hoping to contribute to with this article.

August 5, 2022

Matthias #

Netty, gRPC, and accounting for direct memory allocation

When the Java Virtual Machine determines whether it may grow the heap or has to force the garbage collector to run in order to free space, it relies on the information it has about the various memory spaces that it is managing. In general, however, this is not all the memory that the application uses. For one, there is the C heap, which native libraries may allocate memory from. Second, the Java Virtual Machine itself allocates some memory outside the Java heap to perform its basic operations. And then there is off-heap memory allocated by Java code, a mechanism that is called direct allocation.

Native Memory Tracking, turned on by the -XX:NativeMemoryTracking=summary command-line flag, enables the Java Virtual Machine to collect statistics on how much native memory of which kind it itself is using and expose them to jcmd. Direct allocations are listed in the “Other” category.

If you use ByteBuffer#allocateDirect to allocate a direct ByteBuffer, the Java Virtual Machine will keep track of the allocation. If you have set a memory limit with the help of the -XX:MaxRAMPercentage= command-line flag, it will also reduce the maximum heap size accordingly.
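One way to watch this bookkeeping from inside the process is the BufferPoolMXBean in java.lang.management. The following is a minimal sketch, assuming a HotSpot-based JVM where the relevant pool is named "direct"; the class and method names are made up for illustration.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

class DirectBufferAccounting {

  // Returns the number of bytes the JVM currently attributes to the "direct" buffer pool.
  // Assumption: the pool is called "direct", as it is on HotSpot.
  static long directMemoryUsed() {
    for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
      if (pool.getName().equals("direct")) {
        return pool.getMemoryUsed();
      }
    }
    throw new IllegalStateException("no buffer pool named \"direct\" found");
  }

  public static void main(String[] args) {
    long before = directMemoryUsed();
    ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);  // 1 MiB outside the Java heap
    long after = directMemoryUsed();

    System.out.println("buffer is direct: " + buffer.isDirect());
    System.out.println("tracked direct memory grew by " + (after - before) + " bytes");
  }
}
```

I would expect off-heap memory obtained by other means to be missing from this pool, which is a quick way to check whether a library is going around ByteBuffer#allocateDirect.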

Now, for better or worse, there are ways to circumvent ByteBuffer#allocateDirect, and some Java libraries do just that in the name of performance either by calling methods of sun.misc.Unsafe or calling into native libraries. One particularly popular offender is Netty.

There are several different knobs you can turn. For example, you can run your program with -Dio.netty.noUnsafe=true to disable the use of sun.misc.Unsafe; you can try -Dio.netty.allocator.type=unpooled -Dio.netty.noPreferDirect=true to avoid direct allocations as far as possible; or (and this is what I recommend) you can run your program with -Dio.netty.maxDirectMemory=0, which makes direct allocations go through ByteBuffer#allocateDirect without inhibiting other unsafe performance shenanigans.

Now with that out of the way, you would think that Netty behaves. But perhaps you are using gRPC in your code base. If you expected the above system properties to have an effect, you would have been wrong because gRPC ships with its own shaded Netty, which has an effect on the names of the system properties that it reads.

So what you actually have to do is set -Dio.netty.maxDirectMemory=0 -Dio.grpc.netty.shaded.io.netty.maxDirectMemory=0.

Now the Java Virtual Machine knows what is going on and limits allocations and the heap size according to -XX:MaxRAMPercentage= to the best of its ability.

July 24, 2022

Matthias #

Boehm–Berarducci encoding in Java

Last time we looked at how to apply Scott encoding to algebraic data types in Java. This time we are going to look at Boehm–Berarducci encoding, which is related.

Again we take the well-known persistent binary tree data type with labeled leaves:

Bᵨ := ρ + (Bᵨ × Bᵨ)

Or, in Haskell notation:

data BinaryTree t = Leaf t | Fork (BinaryTree t) (BinaryTree t)

Recall the Scott encoding of this data structure:

interface BinaryTreeScott<T> {
  <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTreeScott<T>, BinaryTreeScott<T>, R> onFork);
}

From a type-theoretic perspective, there is a problem with this: BinaryTreeScott<T> is defined in terms of itself. Perhaps you want to work in a type system that does not support recursive types.

Encoding

The Boehm–Berarducci encoding comes to the rescue. For our binary tree example it is:

interface BinaryTreeBoehmBerarducci<T> {
  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);
}

Note how the only technical difference is in how the self-type is encoded. In other words, the two encodings coincide on non-recursive types.

The intuitive difference is that while the Scott encoding encodes a data type using its pattern match, Boehm–Berarducci encodes it using its fold.

Implementation

The implementation is straight-forward:

interface BinaryTreeBoehmBerarducci<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);

  static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) {
    //return (foldLeaf, foldFork) -> foldLeaf.apply(value);

    return new BinaryTreeBoehmBerarducci<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork) {
        return foldLeaf.apply(value);
      }
    };
  }

  static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) {
    //return (foldLeaf, foldFork) -> foldFork.apply(left.fold(foldLeaf, foldFork), right.fold(foldLeaf, foldFork));

    return new BinaryTreeBoehmBerarducci<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork) {
        return foldFork.apply(left.fold(foldLeaf, foldFork), right.fold(foldLeaf, foldFork));
      }
    };
  }
}

Summing up the leaves is now even easier than it was before, without even requiring explicit recursion:

class BinaryTreeBoehmBerarducciOps {

  static Integer sum(BinaryTreeBoehmBerarducci<Integer> b) {
    return b.fold(
        n -> n,
        (l, r) -> l + r);
  }
}
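Other folds follow the same pattern. Here is a small self-contained sketch; it repeats the interface under the shorter, made-up name Tree so that it compiles on its own, and the helper names (sample, leafCount, render) are likewise only for illustration.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Same shape as BinaryTreeBoehmBerarducci<T>, repeated under a shorter (hypothetical) name.
interface Tree<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);

  static <T> Tree<T> ofLeaf(T value) {
    return new Tree<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork) {
        return foldLeaf.apply(value);
      }
    };
  }

  static <T> Tree<T> ofFork(Tree<T> left, Tree<T> right) {
    return new Tree<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork) {
        return foldFork.apply(left.fold(foldLeaf, foldFork), right.fold(foldLeaf, foldFork));
      }
    };
  }
}

class TreeFoldExamples {

  // Builds the tree ((1 2) 3).
  static Tree<Integer> sample() {
    return Tree.ofFork(Tree.ofFork(Tree.ofLeaf(1), Tree.ofLeaf(2)), Tree.ofLeaf(3));
  }

  // Adds up all leaves.
  static Integer sum(Tree<Integer> t) {
    return t.<Integer>fold(n -> n, (l, r) -> l + r);
  }

  // Counts the leaves: every leaf contributes 1, every fork adds up its children.
  static <T> int leafCount(Tree<T> t) {
    return t.<Integer>fold(v -> 1, (l, r) -> l + r);
  }

  // Renders the tree as a parenthesized string.
  static <T> String render(Tree<T> t) {
    return t.<String>fold(v -> String.valueOf(v), (l, r) -> "(" + l + " " + r + ")");
  }

  public static void main(String[] args) {
    Tree<Integer> t = sample();
    System.out.println(render(t));     // ((1 2) 3)
    System.out.println(sum(t));        // 6
    System.out.println(leafCount(t));  // 3
  }
}
```

None of the three operations calls itself; the recursion lives entirely inside fold.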

Pattern matching

But how do you perform a mere pattern match without recurring? This has now become quite tricky to do.

Your first idea might be to convert the Boehm–Berarducci-encoded data type into its Scott encoding:

interface BinaryTreeBoehmBerarducci<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);

  static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) { ... }
  static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) { ... }

  default BinaryTreeScott<T> toScott() {
    return fold(
        BinaryTreeScott::ofLeaf,
        BinaryTreeScott::ofFork);
  }
}

This works, but now we depend on the recursively defined type of BinaryTreeScott<T> again, which is what we wanted to avoid. To get around this limitation, we define a non-recursive helper type that fulfills the same role as BinaryTreeScott<T>, but based on BinaryTreeBoehmBerarducci<T>:

interface Deconstructor<T> {
  <W> W visit(Function<T, W> onLeaf, BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork);
}

Then we fold our BinaryTreeBoehmBerarducci<T> into a Deconstructor<T>, on which we can call visit to perform our pattern match:

interface BinaryTreeBoehmBerarducci<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);

  static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) { ... }
  static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) { ... }

  default <R> R visit(
      Function<T, R> onLeaf,
      BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, R> onFork) {

    interface Deconstructor<T> {
      <W> W visit(Function<T, W> onLeaf, BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork);
    }

    return
        this.<Deconstructor<T>>fold(
                v ->
                    new Deconstructor<>() {
                      @Override
                      public <W> W visit(
                          Function<T, W> onLeaf1,
                          BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork1) {
                        return onLeaf1.apply(v);
                      }
                    },

                (left, right) ->
                    new Deconstructor<>() {
                      @Override
                      public <W> W visit(
                          Function<T, W> onLeaf1,
                          BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork1) {
                        return onFork1.apply(
                            left.visit(BinaryTreeBoehmBerarducci::ofLeaf, BinaryTreeBoehmBerarducci::ofFork),
                            right.visit(BinaryTreeBoehmBerarducci::ofLeaf, BinaryTreeBoehmBerarducci::ofFork));
                      }
                    })

            .visit(onLeaf, onFork);
  }
}

BinaryTreeBoehmBerarducci#visit now works as BinaryTreeScott#visit did before.

The only problem is that it is horrendously inefficient as it traverses the whole tree and constructs a complete mirror tree of Deconstructor<T> objects for just a single pattern match.

Optimization: lazy fold

We can remedy the pathological inefficiency by making the fold operation lazy in the recursive argument:

interface BinaryTreeBoehmBerarducciLazy<T> {
  <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork);
}

The complete code is thus:

@FunctionalInterface
public interface BinaryTreeBoehmBerarducciLazy<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork);

  static <T> BinaryTreeBoehmBerarducciLazy<T> ofLeaf(T value) {
    return new BinaryTreeBoehmBerarducciLazy<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork) {
        return foldLeaf.apply(value);
      }
    };
  }

  static <T> BinaryTreeBoehmBerarducciLazy<T> ofFork(BinaryTreeBoehmBerarducciLazy<T> left, BinaryTreeBoehmBerarducciLazy<T> right) {
    return new BinaryTreeBoehmBerarducciLazy<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork) {
        return foldFork.apply(() -> left.fold(foldLeaf, foldFork), () -> right.fold(foldLeaf, foldFork));
      }
    };
  }

  default <R> R visit(
      Function<T, R> onLeaf,
      BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, R> onFork) {

    interface Deconstructor<T> {
      <W> W visit(Function<T, W> onLeaf, BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, W> onFork);
    }

    return
        this.<Deconstructor<T>>fold(
                value ->
                    new Deconstructor<>() {
                      @Override
                      public <W> W visit(
                          Function<T, W> onLeaf1,
                          BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, W> onFork1) {
                        return onLeaf1.apply(value);
                      }
                    },

                (left, right) ->
                    new Deconstructor<>() {
                      @Override
                      public <W> W visit(
                          Function<T, W> onLeaf1,
                          BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, W> onFork1) {
                        return onFork1.apply(
                            left.get().visit(BinaryTreeBoehmBerarducciLazy::ofLeaf, BinaryTreeBoehmBerarducciLazy::ofFork),
                            right.get().visit(BinaryTreeBoehmBerarducciLazy::ofLeaf, BinaryTreeBoehmBerarducciLazy::ofFork));
                      }
                    })

            .visit(onLeaf, onFork);
  }
}
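To see what the Supplier arguments buy you, here is a minimal sketch of a fold that never forces the right-hand side. It repeats the lazy interface (without visit) under the made-up name LazyTree so that it compiles on its own; leftmost and the leafVisits counter are also only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Supplier;

// Same shape as BinaryTreeBoehmBerarducciLazy<T>, repeated under a shorter (hypothetical) name.
interface LazyTree<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork);

  static <T> LazyTree<T> ofLeaf(T value) {
    return new LazyTree<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork) {
        return foldLeaf.apply(value);
      }
    };
  }

  static <T> LazyTree<T> ofFork(LazyTree<T> left, LazyTree<T> right) {
    return new LazyTree<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork) {
        // The subtrees are folded only if and when the caller asks for them.
        return foldFork.apply(() -> left.fold(foldLeaf, foldFork), () -> right.fold(foldLeaf, foldFork));
      }
    };
  }
}

class LazyFoldDemo {

  // Returns the leftmost leaf and counts in leafVisits how many leaves the fold touched.
  static <T> T leftmost(LazyTree<T> tree, AtomicInteger leafVisits) {
    return tree.<T>fold(
        value -> {
          leafVisits.incrementAndGet();
          return value;
        },
        (left, right) -> left.get());  // right.get() is never called
  }

  public static void main(String[] args) {
    LazyTree<String> tree =
        LazyTree.ofFork(
            LazyTree.ofFork(LazyTree.ofLeaf("a"), LazyTree.ofLeaf("b")),
            LazyTree.ofFork(LazyTree.ofLeaf("c"), LazyTree.ofLeaf("d")));

    AtomicInteger leafVisits = new AtomicInteger();
    System.out.println(leftmost(tree, leafVisits));  // a
    System.out.println(leafVisits.get());            // 1
  }
}
```

With the strict fold from before, the same operation would have visited all four leaves.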

Matthias #

Scott encoding in Java

Assume you have your typical persistent binary tree data type with labeled leaves:

Bᵨ := ρ + (Bᵨ × Bᵨ)

Or, in Haskell notation:

data BinaryTree t = Leaf t | Fork (BinaryTree t) (BinaryTree t)

We can represent it in modern Java in a straight-forward way:

sealed interface BinaryTree<T> {
  record Leaf<T>(T value) implements BinaryTree<T> {}
  record Fork<T>(BinaryTree<T> left, BinaryTree<T> right) implements BinaryTree<T> {}
}

How do you pattern-match on it?

You could wait for pattern matching for switch to exit preview. Then you will be able to write a function that sums up all the leaves in a tree of integers in a straight-forward way:

class BinaryTreeOps {

  static Integer sum(BinaryTree<Integer> b) {
    return switch (b) {
      case BinaryTree.Leaf<Integer> leaf -> leaf.value();
      case BinaryTree.Fork<Integer> fork -> sum(fork.left()) + sum(fork.right());
    };
  }
}

If you do not want to wait, here is an alternative. Add a visit method that takes a deconstructor function for each case and implement it by calling the corresponding deconstructor in each case class:

sealed interface BinaryTree<T> {

    <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork);

    record Leaf<T>(T value) implements BinaryTree<T> {

      @Override public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onLeaf.apply(value);
      }        
    }

    record Fork<T>(BinaryTree<T> left, BinaryTree<T> right) implements BinaryTree<T> {

      @Override public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onFork.apply(left, right);
      }        
    }
}

Now you can sum up a tree of integers by folding it using the visit method:

class BinaryTreeOps {

  static Integer sum(BinaryTree<Integer> b) {
    return b.visit(
        n -> n, 
        (l, r) -> sum(l) + sum(r));
  }
}

Once you realize that pattern matching is a universal operation and therefore all you ever need out of a data structure (if we ignore performance concerns, that is), it becomes evident that you can get rid of the record middlemen and take visit as the data structure itself:

@FunctionalInterface
interface BinaryTree<T> {

  <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork);

  static <T> BinaryTree<T> ofLeaf(T value) {
    return (onLeaf, onFork) -> onLeaf.apply(value);   // does not compile; see below
  }

  static <T> BinaryTree<T> ofFork(BinaryTree<T> left, BinaryTree<T> right) {
    return (onLeaf, onFork) -> onFork.apply(left, right);   // does not compile; see below
  }
}

No records involved. In fact, notice how there are no data structures in a classical sense whatsoever. Instead, this implementation uses pure closures as data containers.

This would be a valid representation of a binary tree; alas, Java does not permit lambda expressions to implement interface methods with type parameters, so you have to forgo the syntactic sugar and write it out in the old-fashioned way:

interface BinaryTree<T> {

  <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork);

  static <T> BinaryTree<T> ofLeaf(T value) {
    return new BinaryTree<>() {
      @Override
      public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onLeaf.apply(value);
      }
    };
  }

  static <T> BinaryTree<T> ofFork(BinaryTree<T> left, BinaryTree<T> right) {
    return new BinaryTree<>() {
      @Override
      public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onFork.apply(left, right);
      }
    };
  }
}

More verbose, but the same thing, and this time it compiles and works just fine.
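Here is what working with the closure-based tree looks like in practice. This is a self-contained sketch; it repeats the interface under the made-up name ScottTree so that it compiles on its own, and the operations (sum, mirror, render) are only examples.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Same shape as the closure-based BinaryTree<T>, repeated under a (hypothetical) distinct name.
interface ScottTree<T> {

  <R> R visit(Function<T, R> onLeaf, BiFunction<ScottTree<T>, ScottTree<T>, R> onFork);

  static <T> ScottTree<T> ofLeaf(T value) {
    return new ScottTree<>() {
      @Override
      public <R> R visit(Function<T, R> onLeaf, BiFunction<ScottTree<T>, ScottTree<T>, R> onFork) {
        return onLeaf.apply(value);
      }
    };
  }

  static <T> ScottTree<T> ofFork(ScottTree<T> left, ScottTree<T> right) {
    return new ScottTree<>() {
      @Override
      public <R> R visit(Function<T, R> onLeaf, BiFunction<ScottTree<T>, ScottTree<T>, R> onFork) {
        return onFork.apply(left, right);
      }
    };
  }
}

class ScottTreeOps {

  // Adds up all leaves; the recursion is explicit because visit only looks one level deep.
  static Integer sum(ScottTree<Integer> t) {
    return t.<Integer>visit(n -> n, (l, r) -> sum(l) + sum(r));
  }

  // Swaps the left and right child of every fork.
  static <T> ScottTree<T> mirror(ScottTree<T> t) {
    return t.<ScottTree<T>>visit(
        ScottTree::ofLeaf,
        (l, r) -> ScottTree.ofFork(mirror(r), mirror(l)));
  }

  // Renders the tree as a parenthesized string.
  static <T> String render(ScottTree<T> t) {
    return t.<String>visit(
        v -> String.valueOf(v),
        (l, r) -> "(" + render(l) + " " + render(r) + ")");
  }

  public static void main(String[] args) {
    ScottTree<Integer> t =
        ScottTree.ofFork(ScottTree.ofFork(ScottTree.ofLeaf(1), ScottTree.ofLeaf(2)), ScottTree.ofLeaf(3));

    System.out.println(render(t));          // ((1 2) 3)
    System.out.println(render(mirror(t)));  // (3 (2 1))
    System.out.println(sum(t));             // 6
  }
}
```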

This encoding of an algebraic data structure is called the Scott encoding of the data structure.

Clearly it is not very Enterprise-ready, but it does demonstrate how you can summon data structures out of nothing in a language that supports lexical closures.

June 29, 2022

Matthias #

Simulating bad drive blocks with Device Mapper

Say you have a 0.5 MiB (= 1,024 sectors of 512 bytes each) drive at /dev/loop0 and would like to boot it with QEMU while simulating a broken sector at position 256.

You can use dm-error for this.

Write the following into a file and call it broken-drive.dm:

0 256 linear /dev/loop0 0
256 1 error
257 767 linear /dev/loop0 257

Alternatively, you can make use of dm-flakey to simulate a sector that is only sometimes broken, or that does something even worse such as drop any writes made to it. For example:

0 256 linear /dev/loop0 0
256 1 flakey /dev/loop0 256 5 5
257 767 linear /dev/loop0 257

Refer to the documentation of dm-flakey on how exactly it works and what the parameters are that it expects.

Create a virtual device at /dev/mapper/broken-drive using dmsetup create:

dmsetup create broken-drive <broken-drive.dm

You can now use it with QEMU just like any other drive or drive image.
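If you need tables like the dm-error one for other drive sizes or sector positions, the offsets and lengths are easy to derive mechanically. The following is a small sketch of the arithmetic; the class and method names are made up, and it only generates the table text, which you would still feed to dmsetup yourself.

```java
import java.util.ArrayList;
import java.util.List;

class DmTable {

  // Builds a device-mapper table that maps every sector of the device linearly,
  // except for one sector, which is routed to the error target.
  // Each line has the form: <start sector> <length in sectors> <target> [<target arguments>]
  static List<String> withBadSector(String device, long totalSectors, long badSector) {
    if (badSector < 0 || badSector >= totalSectors) {
      throw new IllegalArgumentException("bad sector must lie within the device");
    }

    List<String> lines = new ArrayList<>();

    // Sectors before the bad one, if there are any.
    if (badSector > 0) {
      lines.add("0 " + badSector + " linear " + device + " 0");
    }

    // The bad sector itself.
    lines.add(badSector + " 1 error");

    // Sectors after the bad one, if there are any.
    long rest = totalSectors - badSector - 1;
    if (rest > 0) {
      lines.add((badSector + 1) + " " + rest + " linear " + device + " " + (badSector + 1));
    }

    return lines;
  }

  public static void main(String[] args) {
    // Reproduces the dm-error table from this post: 1,024 sectors, sector 256 broken.
    withBadSector("/dev/loop0", 1024, 256).forEach(System.out::println);
  }
}
```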

June 19, 2022

Matthias #

Ergonomic mechanical split keyboards

All of these are split down the middle and column-staggered.

June 4, 2022

Matthias #

Comments are now supported

This web site now supports comments.

Comments are simple and anonymous by default. They support a restricted number of HTML tags and their Markdown equivalents:

blocks: p, blockquote, pre
inline style: em, strong, code, sub, sup, s, ins, del
lists: ul, ol, li, dl, dt, dd
accessibility hints: abbr, acronym

To comment, click the permalink hash mark (#) at the top of the post in question. Be sure to enable JavaScript in your web browser.

May 24, 2022

Matthias #

YAML style recommendations

While I’m not a particularly big fan of YAML overall, it does have some clear benefits over both JSON and XML for human-editable configuration files. And while there are some pretty compelling alternatives, YAML is currently ubiquitous, so it makes sense to make the best of it.

Here are my favorite simple style rules. I apply them aggressively whenever I see some YAML that isn’t bolted to the wall and fixed with superglue.

Rule 1. Indent enumerations.

Rule: Indent enumerated items relative to the key that defines them.

Reason: Adhering to the Rectangle Rule makes it easier to find where blocks begin and end.

Bad:

list1:
- item1
- item2
list2:
- item3
- item4

Good:

list1:
  - item1
  - item2
list2:
  - item3
  - item4

This is by far my favorite rule. Even if you do nothing else, it improves the readability of any YAML file by an order of magnitude.

Rule 2. Separate nested blocks by white space.

Rule: Use empty lines to separate blocks of nested content.

Hint: A good rule of thumb is that any block that has children should be surrounded by white space on both sides.

Reason: Empty lines make it easier to find where blocks begin and end.

Bad:

metadata:
  name: mulkcms2
  namespace: mulk
  labels:
    name: mulkcms2-web
    app: mulkcms2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mulkcms2
      group: mulk
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%

Good:

metadata:
  name: mulkcms2
  namespace: mulk

  labels:
    name: mulkcms2-web
    app: mulkcms2

spec:
  replicas: 1

  selector:
    matchLabels:
      app: mulkcms2
      group: mulk

  strategy:
    type: RollingUpdate

    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%

This is a rule that I apply not just to YAML, but to any type of code whatsoever. It helps not just with readability, but also with editability (by enabling me to navigate by block).

May 2, 2022

Matthias #

Well-maintained (or not) OpenJDK Docker images

Here is a list of major OpenJDK vendors and the container images they offer.

Vendor	Image name	Tag	Release cycle	Base OS	Remarks
Azul	`docker.io/azul/zulu-openjdk`	`17`	LTS	Ubuntu
Azul	`docker.io/azul/zulu-openjdk-alpine`	`17`	LTS	Alpine Linux
Azul	`docker.io/azul/zulu-openjdk-centos`	`17`	LTS	CentOS
Azul	`docker.io/azul/zulu-openjdk-debian`	`17`	LTS	Debian
BellSoft	`docker.io/bellsoft/liberica-openjdk-alpine`	`17`	LTS	Alpine (glibc)
BellSoft	`docker.io/bellsoft/liberica-openjdk-alpine`	`latest`	non-LTS	Alpine (glibc)
BellSoft	`docker.io/bellsoft/liberica-openjdk-alpine-musl`	`17`	LTS	Alpine (musl)
BellSoft	`docker.io/bellsoft/liberica-openjdk-alpine-musl`	`latest`	non-LTS	Alpine (musl)
BellSoft	`docker.io/bellsoft/liberica-openjdk-centos`	`17`	LTS	CentOS
BellSoft	`docker.io/bellsoft/liberica-openjdk-centos`	`latest`	non-LTS	CentOS
BellSoft	`docker.io/bellsoft/liberica-openjdk-debian`	`17`	LTS	Debian
BellSoft	`docker.io/bellsoft/liberica-openjdk-debian`	`latest`	non-LTS	Debian
Eclipse	`docker.io/library/eclipse-temurin`	`latest`	non-LTS	Ubuntu	recommended non-LTS¹
Eclipse	`docker.io/library/eclipse-temurin`	`17`	LTS	Ubuntu	recommended LTS¹
Eclipse	`docker.io/library/eclipse-temurin`	`17-alpine`	LTS	Alpine
Google	`gcr.io/distroless/java17-debian11`	`latest`	LTS	Debian
Microsoft	`mcr.microsoft.com/openjdk/jdk`	`17-ubuntu`	LTS	Ubuntu
Microsoft	`mcr.microsoft.com/openjdk/jdk`	`17-mariner`	LTS	CentOS (derivative)
Microsoft	`mcr.microsoft.com/openjdk/jdk`	`17-cbld`	LTS	Debian (derivative)
Oracle	`container-registry.oracle.com/java/openjdk`	`latest`	non-LTS	Oracle Linux	recommended non-LTS²
Red Hat	`registry.access.redhat.com/ubi8/openjdk-17`	`latest`	LTS	RHEL (UBI)⁴	recommended LTS³
Red Hat	`registry.access.redhat.com/ubi8/openjdk-17-runtime`	`latest`	LTS	RHEL (UBI)⁴

General remarks:

As is apparent from the list, most vendors do not offer a rolling non-LTS image. Be careful when using a non-LTS image pinned to a specific version as its time under support will be quite limited. Rolling non-LTS images that always update to the latest OpenJDK version are fine (and may in fact be more secure and reliable than any LTS image considering that OpenJDK Updates primarily consists of backports from later versions).

Generally speaking, Docker images, particularly OpenJDK images, tend to drift from the latest update state of the base OS underlying them. It is probably a good idea to build your own runtime image (perhaps based on something like UBI Micro (manual)) and keep it up to date through a nightly CI job.

I cannot recommend any Alpine-based images at present because there are too many dependencies on glibc specifics (see also) in the ecosystem and using glibc on Alpine is not a supported configuration.

Footnotes:

¹ Being a widely deployed image with lots of attention given to it, the Temurin image is probably the one you want if you prefer Ubuntu over RHEL.
² Oracle is the main sponsor of OpenJDK. New OpenJDK releases tend to find their way into their image promptly. Oracle Linux is also a generally well-maintained and secure base; do note, however, that the OpenJDK image is typically only updated when a new OpenJDK is released, so you have to install system package updates yourself.
³ Red Hat is the second largest contributor to OpenJDK (after Oracle) and one of the sponsors of the OpenJDK 17 Updates project and is typically quick to release security patches. UBI8 is also a well-maintained and secure image base.
⁴ UBI is a trimmed-down version of RHEL that Red Hat distribute free of charge as part of their container image offerings.

April 24, 2022

Matthias #

How do I create smaller initramfs images on Ubuntu?

If you are running Ubuntu, your initramfs images may be quite large. On my system, for instance, each initramfs took up about 100 MiB of space. Because I did not pay enough attention when setting up the computer, I ended up with a very small boot partition, which prompted me to look for a way to make the initramfs images generated by update-initramfs smaller.

Caution: Any of the below may make your system unbootable. Please only copy the steps if you understand what they do.

Step 1. Fewer kernel modules.

Create a file called, say, /etc/initramfs-tools/conf.d/zzz-custom (the exact name of the file does not matter much) and fill it with the following:

MODULES=dep

This causes mkinitramfs to guess the set of kernel modules required to boot your system based on what is currently loaded and what hardware is present instead of indiscriminately including whatever could be useful to make a computer boot. The change takes effect the next time the initramfs is regenerated, for example by running update-initramfs -u.

This saved me about 50 MiB, reducing the size of the initramfs from 100 MiB to 50 MiB.

Step 2. No GPU.

Assuming you do not need to interact with the initramfs (to debug boot problems or to type in a disk encryption password, say), you can disable the scripts that deal with setting up a graphics frame buffer. Doing so gets rid of GPU firmware, which at least for the amdgpu driver is a pretty sizable amount of data.

Adding the following to /etc/initramfs-tools/conf.d/zzz-custom may or may not be good enough:

FRAMEBUFFER=n

In my case it was not good enough. Since I boot from ZFS, the /usr/share/initramfs-tools/conf-hooks.d/zfs hook was active, forcing FRAMEBUFFER to y regardless of what /etc/initramfs-tools/conf.d says.

But of course you can add your own configuration hook to /usr/share/initramfs-tools/conf-hooks.d. You just have to ensure it runs after the zfs one by giving it a lexicographically higher name. So create a file called /usr/share/initramfs-tools/conf-hooks.d/zzz-custom and fill it with the same content as above:

FRAMEBUFFER=n

As long as you do not need a disk encryption passphrase prompt in the initramfs, this should not break anything. If you do, it is probably a bad idea.

This saved me another 30 MiB, reducing the size of the initramfs from 50 MiB to 20 MiB.

April 9, 2022

Matthias #

How do I access binary data using Vert.x-redis or Quarkus Redis Client?

Vert.x-redis provides the RedisAPI class to access most Redis functionality. The methods provided by RedisAPI only take Strings as arguments, which it encodes in UTF-8. With Quarkus you get to choose between RedisClient and ReactiveRedisClient, both of which mirror RedisAPI, again only containing String-based methods.
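To see why a String-only interface is a problem for binary data, here is a small self-contained sketch in plain Java. It does not use Vert.x or Quarkus at all; the class and method names (Utf8RoundTrip, throughString) are made up for illustration, and the round trip merely imitates what happens if you try to squeeze arbitrary bytes into a String that is later encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of Vert.x or Quarkus.
final class Utf8RoundTrip {

    // Imitates forcing binary data through a String-based API: decode the
    // bytes as UTF-8, then encode the resulting String as UTF-8 again.
    static byte[] throughString(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x01};

        // Valid UTF-8 survives the round trip unchanged.
        System.out.println(Arrays.equals(text, throughString(text)));     // true

        // 0xFF and 0xFE are not valid UTF-8, so each is replaced by U+FFFD
        // during decoding and the bytes that come back out are different.
        System.out.println(Arrays.equals(binary, throughString(binary))); // false
    }
}
```

Anything that is not already valid UTF-8 (compressed data, serialized objects, images) is silently altered this way, which is why the value has to be handed over as bytes.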

So how do you store binary strings?

With Vert.x-redis, use the send method on the Redis class, which is the lower-level interface underlying RedisAPI. Similarly, with Quarkus, instead of injecting a RedisClient or ReactiveRedisClient, inject the lower-level MutinyRedis:

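@Inject
MutinyRedis mutinyRedis;

// SET key data, passing the value as raw bytes instead of a String.
Uni<Void> set(String key, byte[] data) {
    return mutinyRedis
        .send(
            Request.cmd(Command.SET)
                .arg(key)
                .arg(data))
        .replaceWithVoid();
}

// GET key, returning the reply as raw bytes.
// ifNotNull() skips the conversion when the item is null.
Uni<byte[]> get(String key) {
    return mutinyRedis
        .send(
            Request.cmd(Command.GET)
                .arg(key))
        .ifNotNull()
        .transform(Response::toBytes);
}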

October 16, 2021

Matthias #

I tried out GNOME 40 today.

I had already read about it online, with quite a few articles suggesting that it was going in the wrong direction UI-wise. One common complaint was that it seemed more touch-oriented and less efficient to use with a mouse.

After trying it out, I am pleasantly surprised. I find I am quite happy with the gesture-based input paradigm that GNOME 40 is built around. In particular, navigating between the activity overview and the main desktop has become significantly quicker for me.

Perhaps my fondness for the MacBook’s way of treating its trackpad as a first-class input device has rubbed off on me and I am now attuned to it. What is surprising, however, is that notwithstanding the substantial amount of time I have spent in macOS, I seem to prefer GNOME’s implementation over the Mac’s. It feels, to me, even more efficient and to the point.

September 19, 2021

Matthias #

Even with popular support for the CSU dwindling, it still seems unlikely for the left to capture a significant number of direct mandates (first-vote seats assigned by first-past-the-post voting) in Bavaria due to vote splitting between the three left-leaning parties (the Left Party, the Social Democrats, and the Greens).

If I were responsible for any of the three parties’ Bavarian election strategies, I would propose to the other two to hold a joint primary election in each voting district. This would likely improve the odds for both the Social Democrats and the Greens while not really changing them for the Left Party (while still giving them the benefit of hurting the other end of the political spectrum plus the opportunity to build alliances within their own).

April 6, 2021

Matthias #

The big issue that I have with how the German government does COVID politics is that they can’t seem to handle the trolley problem.

The default choice appears to be inaction. A decision to act is made only when there is (seemingly) irrefutable scientific consensus that it causes no harm. That is arguably the correct approach for a doctor to take on a patient who cannot make an informed decision for themselves. But a pandemic is more akin to war than to a doctor’s visit. There will be casualties, some of them innocent bystanders; the only question is how many. In that kind of situation, inaction is not the safe choice. There is no safe choice.

January 2, 2021

Matthias #

I just finished migrating my Mailcow installation. The old setup was a native Kubernetes port that I had made by hand and that was becoming impossible to update. The new one is a more streamlined (albeit wasteful with resources) deployment in which I wrap docker-compose and a dedicated Docker instance in a Kata container that I run as a Kubernetes pod.

The container is built as a Nix expression and is available in my public Kubeia repository, which contains part of my Kubernetes deployment configuration and image build scripts. A README is available too, in case you would like to try running it yourself.

Do note that the Kubernetes deployment file is just provided as an example. It contains some pretty specific references to my particular deployment – view it as something like a template that you will have to copy and fill with your own data. Another idiosyncrasy is that I really, really dislike running multiple database servers on a single piece of hardware and so I kluged something in that makes Mailcow use my already provisioned MariaDB instance rather than its own. In other words, your mileage may very much vary.

December 6, 2020

Matthias #

On unsafe Rust

A common misunderstanding is that unsafe Rust is more liberal than safe Rust. It is not. The same invariants apply, but instead of the compiler, it’s your job to uphold them.

Unsafe Rust is what you write when in any other language you would have written a C extension. It is a very rare thing to do.

November 20, 2020

Matthias #

It was recently pointed out to me that paired forms such as “Studentinnen und Studenten” do not count as inclusive, since they include only women and men but not people who see themselves as neither. Only the gender star is said to be inclusive: “Student*innen”.

I propose an alternative. How about we go back to adopting more of Anglo-Saxon culture and simply say “Studenten”? Revolutionary, I know.

Matthias #

Remember: The closer we get to a vaccine, the easier it is to justify a stricter lockdown.

If the logic behind that statement doesn’t seem obvious to you, think about the extreme cases: If we were in a pandemic with no chance of ever getting rid of it or finding any sort of treatment, a lockdown would make little sense. Since everyone would contract the virus eventually, very few people would be saved by the measures, but more people would suffer or even die due to the economic consequences of a lockdown whose duration would have to be indefinite. If, on the other hand, we were just two weeks away from eradicating the pandemic at the snap of a finger, then it would clearly be the correct thing to do to impose a strict lockdown for those two weeks in order to maximize the number of lives saved, since each person who manages to avoid the virus for just another two weeks would be saved from it for good.