Benki → Lazy Chat

Matthias #

Apple’s virtualization framework and APFS compression do not like each other

The setting: You are running a virtualized GNU/Linux instance on an Apple computer. You are in an APFS folder shared with the guest by way of the virtiofs support built into Apple’s virtualization framework.

Compress a file (any compressible file, but it helps if it has at least a few non-zero bytes in it or the effect will not be particularly impressive) on the host:

host$  afsctool -vc -T LZFSE example.txt

Make a copy of the file using GNU cp inside the guest:

guest$  cp example.txt copy.txt

Observe that the copy contains only zeros:

guest$  od copy.txt
0000000 000000 000000 000000 000000 000000 000000 000000 000000
*
0016020 000000 000000 000000 000000
0016027

Digging deeper using strace, observe that in the cp run, lseek(3, 0, SEEK_DATA) fails with ENXIO (which indicates that the file is sparse and consists of a single big hole) when the correct return value would be 0:

openat(AT_FDCWD, "example.txt", O_RDONLY) = 3
...
lseek(3, 0, SEEK_DATA) = -1 ENXIO (No such device or address)

The assumptions that GNU cp makes seem to be in line with the specifications of the system calls involved. I infer that there must be a bug somewhere between Linux’s virtiofs driver, Apple’s virtualization framework, and APFS.
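If you want to detect affected copies from within the guest, a small check along the following lines may help. This is only a sketch: the class and method names are made up for illustration, and it assumes that plain sequential reads of the original file return the real data (only the SEEK_DATA answer being wrong), which is what the observations above suggest but which I have not verified beyond that.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any tool mentioned above.
class CopyCheck {

  /** Returns true if the file is non-empty and every byte in it is zero. */
  static boolean isAllZeros(Path file) throws IOException {
    boolean sawAnyByte = false;
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buffer = new byte[8192];
      int n;
      while ((n = in.read(buffer)) != -1) {
        for (int i = 0; i < n; i++) {
          if (buffer[i] != 0) {
            return false;
          }
        }
        sawAnyByte = sawAnyByte || n > 0;
      }
    }
    return sawAnyByte;
  }

  /** Returns true if both files have byte-for-byte identical content. */
  static boolean sameContent(Path original, Path copy) throws IOException {
    // Files.mismatch (Java 12+) returns -1 when there is no differing position.
    return Files.mismatch(original, copy) == -1L;
  }
}
```

A copy for which isAllZeros returns true while sameContent(original, copy) returns false is a likely victim of the problem described here.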

Matthias #

Dealing with the population crisis

Here is a set of political positions that, even though they go together naturally, in combination will offend everyone, both left and right:

  1. Most of the developed world aligned with the West, including Western Europe and the United States, suffers from a fertility crisis. This is bad. We need more people if we want to (1) stay at the top of the world order rather than drop to the bottom and be dominated by bigger players and (2) keep our national economies going and continue to produce more prosperity.
  2. Therefore, we need more immigration. This is emphatically not limited to highly skilled workers, but includes low-skill, less productive immigration, too. After all, we need both more production capacity and a bigger market, and each additional person helps with both (unless they are very old or sick, but those tend not to be the people who like to migrate anyway).
  3. At the same time we have to reward people for producing more offspring. This very likely means that significantly more resources need to be diverted from singles and single-child families to families with 2 or more children, so much so that not starting a family early on is taxed substantially. This will be a regression in both personal freedom and gender equality, but lead to a much-needed recovery of fertility.

People as different as Robin Hanson and Matthew Yglesias have written about the topic – not necessarily in the way I summarized it above – but I have yet to see a party platform that incorporates all of it in combination.

Matthias #

Always fun dealing with Spring Boot

I just released version 6.1.0 of Quarkus Google Cloud JSON Logging, the somewhat inaccurately named library for logging in Google Cloud JSON format for JBoss Log Manager, a drop-in replacement for the LogManager from java.util.logging.

The original motivation for creating this release was that Quarkus in version 3.4 switched away from its custom JBoss Log Manager Embedded fork of JBoss Log Manager, rendering my library’s DefaultEmbeddedConfigurator class useless. But as it turns out, the Quarkus extension never used it, so nothing was actually broken by the change. Yay.

Instead I noticed that, somewhat unrelatedly, Spring Boot 3, which the library also supports (I mentioned it is somewhat inaccurately named, yes?), broke the way JBoss Logging (not to be confused with JBoss Log Manager) interacts with everything, especially JBoss Log Manager Embedded, leading logs from its logging facade to be swallowed and consequently never printed.

The good news is that migrating away from JBoss Log Manager Embedded fixes the problem. The real JBoss Log Manager even has another nice feature: You can configure it using a logging.properties file, so a custom configurator factory is not, in fact, needed. You can just create your logging.properties file and set it to use my DefaultConsoleHandler as the handler for your log messages. Or if you prefer, you can inject your own LogContextConfigurator using the standard ServiceLoader mechanism and leave the logging.properties file empty. Great!

Except! Spring Boot likes to make things extra fun. If you use the JavaLoggingSystem, which is what you want to do when you use JBoss Log Manager as your log sink (remember, it is a drop-in replacement for the standard LogManager from java.util.logging), Spring Boot first initializes the root logger, then sets its log level to SEVERE, and finally it asks it to load the configuration file that you supplied. But JBoss Log Manager loads its configuration (or your custom LogContextConfigurator if you supplied one) when it is initialized, so loading it another time means it loads it twice. And that puts you in a pickle because:

  1. If you leave the configuration file empty and configure the message handler some other way, the log level stays at SEVERE and you lose all logs.
  2. If you configure a logger in the configuration file and set its level and message handler, the handler is added twice, so you get double the logs.

So in the end I still had to make the change that I had originally set out to do, which was to add a new DefaultConfiguratorFactory class to replace the old DefaultEmbeddedConfigurator and enable everyone to migrate away from JBoss Log Manager Embedded. With my DefaultConfiguratorFactory, the configuration file still gets loaded twice, but the custom configurator takes care of fixing the list of handlers to a single Google-Cloud-JSON-enabled ConsoleHandler.
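To illustrate the idea of pinning the handler list, here is a sketch that uses only java.util.logging. It is not the code of DefaultConfiguratorFactory and does not touch the JBoss Log Manager API; the class names and the extra level parameter are my own assumptions for the sake of a self-contained example.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Illustration only: plain java.util.logging, hypothetical names.
class SingleHandlerFix {

  /** Counts published records so that duplicate handlers become visible. */
  static final class CountingHandler extends Handler {
    int published = 0;

    @Override public void publish(LogRecord record) { published++; }
    @Override public void flush() {}
    @Override public void close() {}
  }

  /**
   * Removes whatever handlers earlier configuration passes have piled up,
   * installs exactly one handler, and sets the level explicitly so that a
   * previously forced SEVERE does not stick.
   */
  static void resetToSingleHandler(Logger logger, Handler only, Level level) {
    for (Handler h : logger.getHandlers()) {
      logger.removeHandler(h);
    }
    logger.addHandler(only);
    logger.setLevel(level);
  }
}
```

No matter how many times the configuration has been loaded before, the logger ends up with one handler and a known level once resetToSingleHandler has run.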

Check out the readme in case you want to get in on the fun.

Matthias #

I find it striking that AI-generated art characteristically draws human hands with too few or deformed fingers, which is similar to how counting your own fingers is a common (and presumably effective) trick to determine whether you are dreaming. What parallels between human cognition and simulator machines does this hint at?

Matthias #

NTRU in OpenSSH

OpenSSH has supported post-quantum cryptography for several years now. sntrup761x25519-sha512@openssh.com (hybrid Streamlined NTRU Prime and X25519) was introduced in OpenSSH 8.5. It was made the default key exchange mechanism in OpenSSH 9.0.

The reason it is a hybrid mechanism is that the security of NTRU is not as well-established as that of X25519 and other classical methods.

Note that this is for key exchange, not authentication. But considering that post-quantum cryptanalysis is a future concern rather than one situated in the present, that is what is worth focusing on right now. After all, someone who can break your key exchange in the future can record your encrypted exchanges now and decode them later. But they cannot travel back into the past to impersonate you in the present.

Matthias #

Rent controls

The problem

Rent controls sound nice at first. High housing costs hit poor people especially hard. Funding the building of cheap public housing, which would alleviate exploding rents, doesn’t work long-term because in order to stay funded it needs to make a profit and so can’t stay cheap for long. So instead you just go and make a law that caps housing rents. Problem solved.

Except it isn’t. High rents are the result of scarcity. By capping rents:

  • You disincentivize the building of new housing by restricting profits, which further restricts supply in the long run.
  • You subsidize high earners who live in apartments that are much too large for them, restricting supply even in the short term.

This latter part is a particularly painful realization when (as I do) you identify as relatively pro redistribution. It would be much preferable to subsidize low earners in a more targeted way.

Alternative: direct subsidies

Instead of restricting supply and subsidizing everyone indiscriminately, the solution, then, is to deregulate supply and pay direct subsidies to low earners to enable them to pay the higher rent.

Benefits:

  • This increases profits for people who build new apartments, which is what you want.
  • It nudges people living in apartments that are too large for them to move to smaller apartments.
  • By freeing larger apartments and enabling a more targeted subsidy, it benefits poor families, who can now move to apartments that are big enough for them.

Raising the money needed through Georgism

The obvious question the direct subsidy approach raises is where to get the money from.

The answer is a land value tax: tax the land (not the buildings built on it!) a fair amount to match its profit-making potential.

Here is why it is a good idea:

  • It incentivizes people to build more apartments onto a given parcel of land.
  • It strongly disincentivizes single-family home parcels, which are extremely inefficient and simply have no place in an expensive city these days.
  • It keeps the good incentives (incentive to build more, incentive to move to smaller apartments) in place while moving money from land owners to renters in need of support.

Challenges

The approach outlined here faces a few challenges:

  • It is not obvious how to calculate the value of a parcel of land.
  • Land owners, who have a powerful lobby, will not be pleased by the changes.

I consider neither of these a deal breaker, but they will have to be dealt with. The land value estimation problem is real, but it does not need to be solved in full generality, since some inaccuracy can be tolerated as long as the profit-making potential is still positive (i.e. it forces the state to err on the side of a too low tax rather than a too high tax). The lobbying point must be dealt with through politics as usual. A first step could be an education campaign, which I am hoping to contribute to with this article.

Matthias #

Netty, gRPC, and accounting for direct memory allocation

When the Java Virtual Machine determines whether it may grow the heap or has to force the garbage collector to run in order to free space, it relies on the information it has about the various memory spaces that it is managing. In general, however, this is not all the memory that the application uses. For one, there is the C heap, which native libraries may allocate memory from. Second, the Java Virtual Machine itself allocates some memory outside the Java heap to perform its basic operations. And then there is off-heap memory allocated by Java code, a mechanism that is called direct allocation.

Native Memory Tracking, turned on by the -XX:NativeMemoryTracking=summary command-line flag, enables the Java Virtual Machine to collect statistics on how much native memory of which kind it itself is using and expose them to jcmd. Direct allocations are listed in the “Other” category.

If you use ByteBuffer#allocateDirect to allocate a direct ByteBuffer, the Java Virtual Machine will keep track of the allocation. If you have set a memory limit with the help of the -XX:MaxRAMPercentage= command-line flag, it will also reduce the maximum heap size accordingly.
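One way to watch this bookkeeping from inside the program is the standard BufferPoolMXBean for the "direct" pool. Note that this is a JMX view and not the same thing as the Native Memory Tracking output of jcmd; the class name below is made up, and the sketch merely shows that allocations made through ByteBuffer#allocateDirect are visible to the Java Virtual Machine.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical probe class for illustration.
class DirectMemoryProbe {

  /** Returns the bytes used by the JVM's "direct" buffer pool, or -1 if no such pool is reported. */
  static long directBytesUsed() {
    for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
      if ("direct".equals(pool.getName())) {
        return pool.getMemoryUsed();
      }
    }
    return -1L;
  }

  public static void main(String[] args) {
    long before = directBytesUsed();
    ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20); // 1 MiB, tracked by the JVM
    long after = directBytesUsed();
    System.out.println("direct pool: " + before + " -> " + after
        + " bytes (buffer capacity " + buffer.capacity() + ")");
  }
}
```

Memory that a library obtains behind the back of ByteBuffer#allocateDirect would presumably not show up in this pool, which is a quick way to tell whether the accounting is complete.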

Now, for better or worse, there are ways to circumvent ByteBuffer#allocateDirect, and some Java libraries do just that in the name of performance either by calling methods of sun.misc.Unsafe or calling into native libraries. One particularly popular offender is Netty.

There are several different knobs you can turn. For example, you can run your program with -Dio.netty.noUnsafe=true to disable the use of sun.misc.Unsafe; you can try -Dio.netty.allocator.type=unpooled -Dio.netty.noPreferDirect=true to avoid direct allocations as far as possible; or (and this is what I recommend) you can run your program with -Dio.netty.maxDirectMemory=0, which makes direct allocations go through ByteBuffer#allocateDirect without inhibiting other unsafe performance shenanigans.

Now with that out of the way, you would think that Netty behaves. But perhaps you are using gRPC in your code base. If you expected the above system properties to have an effect, you would have been wrong because gRPC ships with its own shaded Netty, which has an effect on the names of the system properties that it reads.

So what you actually have to do is set -Dio.netty.maxDirectMemory=0 -Dio.grpc.netty.shaded.io.netty.maxDirectMemory=0.

Now the Java Virtual Machine knows what is going on and limits allocations and the heap size according to -XX:MaxRAMPercentage= to the best of its ability.
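If you cannot control the command line, the same two properties can be set programmatically. The following is a sketch under the assumption that Netty reads them once during class initialization, so the call has to happen before any Netty or gRPC class is touched (for example, as the first statement in main). The class name is hypothetical; the property names are the ones given above. Passing the flags on the command line remains the more robust option.

```java
// Hypothetical helper; call apply() before any Netty or gRPC class is loaded.
class NettyDirectMemoryProperties {

  /** Property read by an unshaded Netty. */
  static final String NETTY = "io.netty.maxDirectMemory";

  /** The same property as seen by the Netty copy shaded into gRPC. */
  static final String GRPC_SHADED_NETTY = "io.grpc.netty.shaded.io.netty.maxDirectMemory";

  /** Sets both properties to 0 unless they were already given on the command line. */
  static void apply() {
    setIfAbsent(NETTY, "0");
    setIfAbsent(GRPC_SHADED_NETTY, "0");
  }

  private static void setIfAbsent(String key, String value) {
    if (System.getProperty(key) == null) {
      System.setProperty(key, value);
    }
  }
}
```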

Matthias #

Boehm–Berarducci encoding in Java

Last time we looked at how to apply Scott encoding to algebraic data types in Java. This time we are going to look at Boehm–Berarducci encoding, which is related.

Again we take the well-known persistent binary tree data type with labeled leaves:

Bᵨ := ρ + (Bᵨ × Bᵨ)

Or, in Haskell notation:

data BinaryTree t = Leaf t | Fork (BinaryTree t) (BinaryTree t)

Recall the Scott encoding of this data structure:

interface BinaryTreeScott<T> {
  <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTreeScott<T>, BinaryTreeScott<T>, R> onFork);
}

From a type-theoretic perspective, there is a problem with this: BinaryTreeScott<T> is defined in terms of itself. Perhaps you want to work in a type system that does not support recursive types.

Encoding

The Boehm–Berarducci encoding comes to the rescue. For our binary tree example it is:

interface BinaryTreeBoehmBerarducci<T> {
  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);
}

Note how the only technical difference is in how the self-type is encoded. In other words, the two encodings coincide on non-recursive types.

The intuitive difference is that while the Scott encoding encodes a data type using its pattern match, Boehm–Berarducci encodes it using its fold.
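To see the two encodings coincide, consider a non-recursive type such as an optional value. The Maybe type below is my own example, not something from the previous article: since none of the handler arguments mentions Maybe<T> itself, there is no self-type to encode differently, and the single method is both the pattern match and the fold.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Illustrative non-recursive type: Maybe t = Nothing | Just t
interface Maybe<T> {

  // Pattern match and fold at once: no argument refers to Maybe<T>.
  <R> R match(Supplier<R> onNothing, Function<T, R> onJust);

  static <T> Maybe<T> nothing() {
    return new Maybe<>() {
      @Override
      public <R> R match(Supplier<R> onNothing, Function<T, R> onJust) {
        return onNothing.get();
      }
    };
  }

  static <T> Maybe<T> just(T value) {
    return new Maybe<>() {
      @Override
      public <R> R match(Supplier<R> onNothing, Function<T, R> onJust) {
        return onJust.apply(value);
      }
    };
  }
}
```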

Implementation

The implementation is straight-forward:

interface BinaryTreeBoehmBerarducci<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);

  static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) {
    //return (foldLeaf, foldFork) -> foldLeaf.apply(value);

    return new BinaryTreeBoehmBerarducci<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork) {
        return foldLeaf.apply(value);
      }
    };
  }

  static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) {
    //return (foldLeaf, foldFork) -> foldFork.apply(left.fold(foldLeaf, foldFork), right.fold(foldLeaf, foldFork));

    return new BinaryTreeBoehmBerarducci<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork) {
        return foldFork.apply(left.fold(foldLeaf, foldFork), right.fold(foldLeaf, foldFork));
      }
    };
  }
}

Summing up the leaves is now even easier than it was before, without even requiring explicit recursion:

class BinaryTreeBoehmBerarducciOps {

  static Integer sum(BinaryTreeBoehmBerarducci<Integer> b) {
    return b.fold(
        n -> n,
        (l, r) -> l + r);
  }
}
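Other structural computations fall out the same way. The following sketch repeats the interface so that it compiles on its own and adds a few more folds; the MoreFolds class and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Repeated from above so that the example compiles on its own.
interface BinaryTreeBoehmBerarducci<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);

  static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) {
    return new BinaryTreeBoehmBerarducci<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork) {
        return foldLeaf.apply(value);
      }
    };
  }

  static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) {
    return new BinaryTreeBoehmBerarducci<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork) {
        return foldFork.apply(left.fold(foldLeaf, foldFork), right.fold(foldLeaf, foldFork));
      }
    };
  }
}

// Hypothetical helper class with a few more folds.
class MoreFolds {

  /** Number of leaves: every leaf counts 1, every fork adds up its children. */
  static <T> int leafCount(BinaryTreeBoehmBerarducci<T> b) {
    return b.<Integer>fold(v -> 1, (l, r) -> l + r);
  }

  /** Depth of the tree, counting a single leaf as depth 0. */
  static <T> int depth(BinaryTreeBoehmBerarducci<T> b) {
    return b.<Integer>fold(v -> 0, (l, r) -> 1 + Math.max(l, r));
  }

  /** All leaf values from left to right. */
  static <T> List<T> leaves(BinaryTreeBoehmBerarducci<T> b) {
    return b.<List<T>>fold(
        v -> List.of(v),
        (l, r) -> {
          List<T> all = new ArrayList<>(l);
          all.addAll(r);
          return all;
        });
  }
}
```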

Pattern matching

But how do you perform a mere pattern match without recurring? This has now become quite tricky to do.

Your first idea might be to convert the Boehm–Berarducci-encoded data type into its Scott encoding:

interface BinaryTreeBoehmBerarducci<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);

  static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) { ... }
  static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) { ... }

  default BinaryTreeScott<T> toScott() {
    return fold(
        BinaryTreeScott::ofLeaf,
        BinaryTreeScott::ofFork);
  }
}

This works, but now we depend on the recursively defined type of BinaryTreeScott<T> again, which is what we wanted to avoid. To get around this limitation, we define a non-recursive helper type that fulfills the same role as BinaryTreeScott<T>, but based on BinaryTreeBoehmBerarducci<T>:

interface Deconstructor<T> {
  <W> W visit(Function<T, W> onLeaf, BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork);
}

Then we fold our BinaryTreeBoehmBerarducci<T> into a Deconstructor<T>, on which we can call visit to perform our pattern match:

interface BinaryTreeBoehmBerarducci<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);

  static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) { ... }
  static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) { ... }

  default <R> R visit(
      Function<T, R> onLeaf,
      BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, R> onFork) {

    interface Deconstructor<T> {
      <W> W visit(Function<T, W> onLeaf, BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork);
    }

    return
        this.<Deconstructor<T>>fold(
                v ->
                    new Deconstructor<>() {
                      @Override
                      public <W> W visit(
                          Function<T, W> onLeaf1,
                          BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork1) {
                        return onLeaf1.apply(v);
                      }
                    },

                (left, right) ->
                    new Deconstructor<>() {
                      @Override
                      public <W> W visit(
                          Function<T, W> onLeaf1,
                          BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork1) {
                        return onFork1.apply(
                            left.visit(BinaryTreeBoehmBerarducci::ofLeaf, BinaryTreeBoehmBerarducci::ofFork),
                            right.visit(BinaryTreeBoehmBerarducci::ofLeaf, BinaryTreeBoehmBerarducci::ofFork));
                      }
                    })

            .visit(onLeaf, onFork);
  }
}

BinaryTreeBoehmBerarducci#visit now works as BinaryTreeScott#visit did before.

The only problem is that it is horrendously inefficient as it traverses the whole tree and constructs a complete mirror tree of Deconstructor<T> objects for just a single pattern match.

Optimization: lazy fold

We can remedy the pathological inefficiency by making the fold operation lazy in the recursive argument:

interface BinaryTreeBoehmBerarducciLazy<T> {
  <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork);
}

The complete code is thus:

@FunctionalInterface
public interface BinaryTreeBoehmBerarducciLazy<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork);

  static <T> BinaryTreeBoehmBerarducciLazy<T> ofLeaf(T value) {
    return new BinaryTreeBoehmBerarducciLazy<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork) {
        return foldLeaf.apply(value);
      }
    };
  }

  static <T> BinaryTreeBoehmBerarducciLazy<T> ofFork(BinaryTreeBoehmBerarducciLazy<T> left, BinaryTreeBoehmBerarducciLazy<T> right) {
    return new BinaryTreeBoehmBerarducciLazy<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork) {
        return foldFork.apply(() -> left.fold(foldLeaf, foldFork), () -> right.fold(foldLeaf, foldFork));
      }
    };
  }

  default <R> R visit(
      Function<T, R> onLeaf,
      BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, R> onFork) {

    interface Deconstructor<T> {
      <W> W visit(Function<T, W> onLeaf, BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, W> onFork);
    }

    return
        this.<Deconstructor<T>>fold(
                value ->
                    new Deconstructor<>() {
                      @Override
                      public <W> W visit(
                          Function<T, W> onLeaf1,
                          BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, W> onFork1) {
                        return onLeaf1.apply(value);
                      }
                    },

                (left, right) ->
                    new Deconstructor<>() {
                      @Override
                      public <W> W visit(
                          Function<T, W> onLeaf1,
                          BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, W> onFork1) {
                        return onFork1.apply(
                            left.get().visit(BinaryTreeBoehmBerarducciLazy::ofLeaf, BinaryTreeBoehmBerarducciLazy::ofFork),
                            right.get().visit(BinaryTreeBoehmBerarducciLazy::ofLeaf, BinaryTreeBoehmBerarducciLazy::ofFork));
                      }
                    })

            .visit(onLeaf, onFork);
  }
}
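
The effect of the Supplier arguments is easiest to observe with a fold that does not need both subtrees. The sketch below uses a trimmed copy of the interface (fold and constructors only) so that it compiles on its own; LazyFoldDemo and its methods are hypothetical names. Looking up the leftmost leaf never invokes the right-hand suppliers, so only the left spine of the tree is folded.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Supplier;

// Trimmed copy of the interface above so that the example compiles on its own.
interface BinaryTreeBoehmBerarducciLazy<T> {

  <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork);

  static <T> BinaryTreeBoehmBerarducciLazy<T> ofLeaf(T value) {
    return new BinaryTreeBoehmBerarducciLazy<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork) {
        return foldLeaf.apply(value);
      }
    };
  }

  static <T> BinaryTreeBoehmBerarducciLazy<T> ofFork(BinaryTreeBoehmBerarducciLazy<T> left, BinaryTreeBoehmBerarducciLazy<T> right) {
    return new BinaryTreeBoehmBerarducciLazy<>() {
      @Override
      public <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork) {
        return foldFork.apply(() -> left.fold(foldLeaf, foldFork), () -> right.fold(foldLeaf, foldFork));
      }
    };
  }
}

// Hypothetical demo class.
class LazyFoldDemo {

  /** Returns the leftmost leaf; the suppliers for right subtrees are never invoked. */
  static <T> T leftmost(BinaryTreeBoehmBerarducciLazy<T> b) {
    return b.<T>fold(v -> v, (l, r) -> l.get());
  }

  /** Counts how many leaves the fold touches while looking up the leftmost leaf. */
  static <T> int leavesTouchedByLeftmost(BinaryTreeBoehmBerarducciLazy<T> b) {
    AtomicInteger touched = new AtomicInteger();
    b.<T>fold(
        v -> {
          touched.incrementAndGet();
          return v;
        },
        (l, r) -> l.get());
    return touched.get();
  }
}
```

With the strict fold from the previous sections, the same lookup would have to visit every leaf of the tree before it could discard the results from the right-hand sides.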

Matthias #

Scott encoding in Java

Assume you have your typical persistent binary tree data type with labeled leaves:

Bᵨ := ρ + (Bᵨ × Bᵨ)

Or, in Haskell notation:

data BinaryTree t = Leaf t | Fork (BinaryTree t) (BinaryTree t)

We can represent it in modern Java in a straight-forward way:

sealed interface BinaryTree<T> {
  record Leaf<T>(T value) implements BinaryTree<T> {}
  record Fork<T>(BinaryTree<T> left, BinaryTree<T> right) implements BinaryTree<T> {}
}

How do you pattern-match on it?

You could wait for pattern matching for switch to exit preview. Then you will be able to write a function that sums up all the leaves in a tree of integers in a straight-forward way:

class BinaryTreeOps {

  static Integer sum(BinaryTree<Integer> b) {
    return switch(b) {
      case BinaryTree.Leaf<Integer> leaf -> leaf.value();
      case BinaryTree.Fork<Integer> fork -> sum(fork.left()) + sum(fork.right());
    };
  }
}

If you do not want to wait, here is an alternative. Add a visit method that takes a deconstructor function for each case and implement it by calling the corresponding deconstructor in each case class:

sealed interface BinaryTree<T> {

    <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork);

    record Leaf<T>(T value) implements BinaryTree<T> {

      @Override public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onLeaf.apply(value);
      }        
    }

    record Fork<T>(BinaryTree<T> left, BinaryTree<T> right) implements BinaryTree<T> {

      @Override public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onFork.apply(left, right);
      }        
    }
}

Now you can sum up a tree of integers by folding it using the visit method:

class BinaryTreeOps {

  static Integer sum(BinaryTree<Integer> b) {
    return b.visit(
        n -> n, 
        (l, r) -> sum(l) + sum(r));
  }
}

Once you realize that pattern matching is a universal operation and therefore all you ever need out of a data structure (if we ignore performance concerns, that is), it becomes evident that you can get rid of the record middlemen and take visit as the data structure itself:

@FunctionalInterface
interface BinaryTree<T> {

  <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork);

  static <T> BinaryTree<T> ofLeaf(T value) {
    return (onLeaf, onFork) -> onLeaf.apply(value);   // does not compile; see below
  }

  static <T> BinaryTree<T> ofFork(BinaryTree<T> left, BinaryTree<T> right) {
    return (onLeaf, onFork) -> onFork.apply(left, right);   // does not compile; see below
  }
}

No records involved. In fact, notice how there are no data structures in a classical sense whatsoever. Instead, this implementation uses pure closures as data containers.

This would be a valid representation of a binary tree; alas, Java does not permit lambda expressions to implement interface methods with type parameters, so you have to forgo the syntactic sugar and write it out in the old-fashioned way:

interface BinaryTree<T> {

  <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork);

  static <T> BinaryTree<T> ofLeaf(T value) {
    return new BinaryTree<>() {
      @Override
      public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onLeaf.apply(value);
      }
    };
  }

  static <T> BinaryTree<T> ofFork(BinaryTree<T> left, BinaryTree<T> right) {
    return new BinaryTree<>() {
      @Override
      public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onFork.apply(left, right);
      }
    };
  }
}

More verbose, but the same thing, and this time it compiles and works just fine.
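Here is a short usage sketch. It repeats the interface so that it compiles on its own; the BinaryTreeDemo class and its methods are made up for illustration. Note that, unlike a fold, visit only peels off one layer, so the recursion is written out explicitly.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Repeated from above so that the example compiles on its own.
interface BinaryTree<T> {

  <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork);

  static <T> BinaryTree<T> ofLeaf(T value) {
    return new BinaryTree<>() {
      @Override
      public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onLeaf.apply(value);
      }
    };
  }

  static <T> BinaryTree<T> ofFork(BinaryTree<T> left, BinaryTree<T> right) {
    return new BinaryTree<>() {
      @Override
      public <R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTree<T>, BinaryTree<T>, R> onFork) {
        return onFork.apply(left, right);
      }
    };
  }
}

// Hypothetical demo class.
class BinaryTreeDemo {

  /** Sums up all leaves by explicit recursion over the pattern match. */
  static int sum(BinaryTree<Integer> b) {
    return b.<Integer>visit(n -> n, (l, r) -> sum(l) + sum(r));
  }

  /** Builds a new tree with left and right swapped at every fork. */
  static <T> BinaryTree<T> mirror(BinaryTree<T> b) {
    return b.<BinaryTree<T>>visit(
        BinaryTree::ofLeaf,
        (l, r) -> BinaryTree.ofFork(mirror(r), mirror(l)));
  }

  /** Renders the tree in a parenthesized form, e.g. "(1 (2 3))". */
  static <T> String show(BinaryTree<T> b) {
    return b.<String>visit(
        v -> String.valueOf(v),
        (l, r) -> "(" + show(l) + " " + show(r) + ")");
  }
}
```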

This encoding of an algebraic data structure is called the Scott encoding of the data structure.

Clearly it is not very Enterprise-ready, but it does demonstrate how you can summon data structures out of nothing in a language that supports lexical closures.

Matthias #

Simulating bad drive blocks with Device Mapper

Say you have a 0.5 MiB (= 1,024 sectors of 512 bytes each) drive at /dev/loop0 and would like to boot it with QEMU while simulating a broken sector at position 256.

You can use dm-error for this.

Write the following into a file and call it broken-drive.dm:

0 256 linear /dev/loop0 0
256 1 error
257 767 linear /dev/loop0 257
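
The arithmetic behind the table (start sector, length, and offset of each segment) is easy to get wrong by one. As a sketch, the following hypothetical helper generates the dm-error table for an arbitrary drive size and bad sector; it only produces text and does not call dmsetup.

```java
// Hypothetical generator for the table format shown above.
class DmTable {

  /**
   * Builds a device-mapper table that maps every sector of the device linearly
   * except for badSector, which is mapped to the error target.
   * Each line has the form: start length target [device offset].
   */
  static String withBadSector(String device, long totalSectors, long badSector) {
    if (badSector < 0 || badSector >= totalSectors) {
      throw new IllegalArgumentException("bad sector must lie within the drive");
    }
    StringBuilder table = new StringBuilder();
    if (badSector > 0) {
      // Everything before the bad sector.
      table.append("0 ").append(badSector).append(" linear ").append(device).append(" 0\n");
    }
    // The bad sector itself.
    table.append(badSector).append(" 1 error\n");
    long next = badSector + 1;
    if (next < totalSectors) {
      // Everything after the bad sector, at the same offset on the backing device.
      table.append(next).append(' ').append(totalSectors - next)
          .append(" linear ").append(device).append(' ').append(next).append('\n');
    }
    return table.toString();
  }

  public static void main(String[] args) {
    System.out.print(withBadSector("/dev/loop0", 1024, 256));
  }
}
```

For the 1,024-sector drive and sector 256, this reproduces the three lines above; the last segment has length 1,024 − 257 = 767.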

Alternatively, you can make use of dm-flakey to simulate a sector that is only sometimes broken, or that does something even worse such as drop any writes made to it. For example:

0 256 linear /dev/loop0 0
256 1 flakey /dev/loop0 256 5 5
257 767 linear /dev/loop0 257

Refer to the documentation of dm-flakey on how exactly it works and what the parameters are that it expects.

Create a virtual device at /dev/mapper/broken-drive using dmsetup create:

dmsetup create broken-drive <broken-drive.dm

You can now use it with QEMU just like any other drive or drive image.

Matthias # (3)

Comments are now supported

This web site now supports comments.

Comments are simple and anonymous by default. They support a restricted number of HTML tags and their Markdown equivalents:

  • blocks: p, blockquote, pre
  • inline style: em, strong, code, sub, sup, s, ins, del
  • lists: ul, ol, li, dl, dt, dd
  • accessibility hints: abbr, acronym

To comment, click the permalink hash mark (#) at the top of the post in question. Be sure to enable JavaScript in your web browser.

Matthias #

YAML style recommendations

While I’m not a particularly big fan of YAML overall, it does have some clear benefits over both JSON and XML for human-editable configuration files. And while there are some pretty compelling alternatives, YAML is currently ubiquitous, so it makes sense to make the best of it.

Here are my favorite simple style rules. I apply them aggressively whenever I see some YAML that isn’t bolted to the wall and fixed with superglue.

Rule 1. Indent enumerations.

Rule: Indent enumerated items relative to the key that defines them.

Reason: Adhering to the Rectangle Rule makes it easier to find where blocks begin and end.

Bad:

list1:
- item1
- item2
list2:
- item3
- item4

Good:

list1:
  - item1
  - item2
list2:
  - item3
  - item4

This is by far my favorite rule. Even if you do nothing else, it improves the readability of any YAML file by an order of magnitude.

Rule 2. Separate nested blocks by white space.

Rule: Use empty lines to separate blocks of nested content.

Hint: A good rule of thumb is that any block that has children should be surrounded by white space on both sides.

Reason: Empty lines make it easier to find where blocks begin and end.

Bad:

metadata:
  name: mulkcms2
  namespace: mulk
  labels:
    name: mulkcms2-web
    app: mulkcms2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mulkcms2
      group: mulk
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%

Good:

metadata:
  name: mulkcms2
  namespace: mulk

  labels:
    name: mulkcms2-web
    app: mulkcms2

spec:
  replicas: 1

  selector:
    matchLabels:
      app: mulkcms2
      group: mulk

  strategy:
    type: RollingUpdate

    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%

This is a rule that I apply not just to YAML, but to any type of code whatsoever. It helps not just with readability, but also with editability (by enabling me to navigate by block).

Matthias #

Well-maintained (or not) OpenJDK Docker images

Here is a list of major OpenJDK vendors and the container images they offer.

| Vendor | Image name | Tag | Release cycle | Base OS | Remarks |
|---|---|---|---|---|---|
| Azul | docker.io/azul/zulu-openjdk | 17 | LTS | Ubuntu | |
| Azul | docker.io/azul/zulu-openjdk-alpine | 17 | LTS | Alpine Linux | |
| Azul | docker.io/azul/zulu-openjdk-centos | 17 | LTS | CentOS | |
| Azul | docker.io/azul/zulu-openjdk-debian | 17 | LTS | Debian | |
| BellSoft | docker.io/bellsoft/liberica-openjdk-alpine | 17 | LTS | Alpine (glibc) | |
| BellSoft | docker.io/bellsoft/liberica-openjdk-alpine | latest | non-LTS | Alpine (glibc) | |
| BellSoft | docker.io/bellsoft/liberica-openjdk-alpine-musl | 17 | LTS | Alpine (musl) | |
| BellSoft | docker.io/bellsoft/liberica-openjdk-alpine-musl | latest | non-LTS | Alpine (musl) | |
| BellSoft | docker.io/bellsoft/liberica-openjdk-centos | 17 | LTS | CentOS | |
| BellSoft | docker.io/bellsoft/liberica-openjdk-centos | latest | non-LTS | CentOS | |
| BellSoft | docker.io/bellsoft/liberica-openjdk-debian | 17 | LTS | Debian | |
| BellSoft | docker.io/bellsoft/liberica-openjdk-debian | latest | non-LTS | Debian | |
| Eclipse | docker.io/library/eclipse-temurin | latest | non-LTS | Ubuntu | recommended non-LTS [1] |
| Eclipse | docker.io/library/eclipse-temurin | 17 | LTS | Ubuntu | recommended LTS [1] |
| Eclipse | docker.io/library/eclipse-temurin | 17-alpine | LTS | Alpine | |
| Google | gcr.io/distroless/java17-debian11 | latest | LTS | Debian | |
| Microsoft | mcr.microsoft.com/openjdk/jdk | 17-ubuntu | LTS | Ubuntu | |
| Microsoft | mcr.microsoft.com/openjdk/jdk | 17-mariner | LTS | CentOS (derivative) | |
| Microsoft | mcr.microsoft.com/openjdk/jdk | 17-cbld | LTS | Debian (derivative) | |
| Oracle | container-registry.oracle.com/java/openjdk | latest | non-LTS | Oracle Linux | recommended non-LTS [2] |
| Red Hat | registry.access.redhat.com/ubi8/openjdk-17 | latest | LTS | RHEL (UBI) [4] | recommended LTS [3] |
| Red Hat | registry.access.redhat.com/ubi8/openjdk-17-runtime | latest | LTS | RHEL (UBI) [4] | |

General remarks:

As is apparent from the list, most vendors do not offer a rolling non-LTS image. Be careful when using a non-LTS image pinned to a specific version as its time under support will be quite limited. Rolling non-LTS images that always update to the latest OpenJDK version are fine (and may in fact be more secure and reliable than any LTS image considering that OpenJDK Updates primarily consists of backports from later versions).

Generally speaking, Docker images, particularly OpenJDK images, tend to drift from the latest update state of the base OS underlying them. It is probably a good idea to build your own runtime image (perhaps based on something like UBI Micro (manual)) and keep it up to date through a nightly CI job.

I cannot recommend any Alpine-based images at present because there are too many dependencies on glibc specifics (see also) in the ecosystem and using glibc on Alpine is not a supported configuration.

Footnotes:

  1. Being a widely deployed image with lots of attention given to it, the Temurin image is probably the one you want if you prefer Ubuntu over RHEL.

  2. Oracle is the main sponsor of OpenJDK. New OpenJDK releases tend to find their way into their image promptly. Oracle Linux is also a generally well-maintained and secure base; do note, however, that the OpenJDK image is typically only updated when a new OpenJDK is released, so you have to install system package updates yourself.

  3. Red Hat is the second largest contributor to OpenJDK (after Oracle) and one of the sponsors of the OpenJDK 17 Updates project and is typically quick to release security patches. UBI8 is also a well-maintained and secure image base.

  4. UBI is a trimmed-down version of RHEL that Red Hat distribute free of charge as part of their container image offerings.

Matthias #

How do I create smaller initramfs images on Ubuntu?

If you are running Ubuntu, your initramfs images may be quite large. On my system, for instance, each initramfs took up about 100 MiB of space. Because I did not pay enough attention when setting up the computer, I ended up with a very small boot partition, which prompted me to look for a way to make the initramfs images generated by update-initramfs smaller.

Caution: Any of the below may make your system unbootable. Please only copy the steps if you understand what they do.

Step 1. Fewer kernel modules.

Create a file called, say, /etc/initramfs-tools/conf.d/zzz-custom (the exact name of the file does not matter much) and fill it with the following:

MODULES=dep

This causes mkinitramfs to guess the set of kernel modules required to boot your system based on what is currently loaded and what hardware is present instead of indiscriminately including whatever could be useful to make a computer boot.

This saved me about 50 MiB, reducing the size of the initramfs from 100 MiB to 50 MiB.

Step 2. No GPU.

Assuming you do not need to interact with the initramfs (to debug boot problems or to type in a disk encryption password, say), you can disable the scripts that deal with setting up a graphics frame buffer. Doing so gets rid of GPU firmware, which at least for the amdgpu driver is a pretty sizable amount of data.

Adding the following to /etc/initramfs-tools/conf.d/zzz-custom may or may not be good enough:

FRAMEBUFFER=n

In my case it was not good enough. Since I boot from ZFS, the /usr/share/initramfs-tools/conf-hooks.d/zfs hook was active, forcing FRAMEBUFFER to y regardless of what /etc/initramfs-tools/conf.d says.

But of course you can add your own configuration hook to /usr/share/initramfs-tools/conf-hooks.d. You just have to ensure it runs after the zfs one by giving it a lexicographically higher name. So create a file called /usr/share/initramfs-tools/conf-hooks.d/zzz-custom and fill it with the same content as above:

FRAMEBUFFER=n

As long as you do not need a disk encryption passphrase prompt in the initramfs, this should not break anything. If you do, it is probably a bad idea.

This saved me another 30 MiB, reducing the size of the initramfs from 50 MiB to 20 MiB.
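
To see how much each step saves on your own machine, it helps to compare image sizes before and after regenerating. The following is a minimal sketch and not part of initramfs-tools: initramfs_sizes is a hypothetical helper name, and it assumes GNU stat and images named initrd.img-<version> in /boot, as is usual on Ubuntu.

```shell
# Hypothetical helper: print the size of every initramfs image in a directory
# (default: /boot), rounded down to whole MiB.
initramfs_sizes() {
    dir=${1:-/boot}
    for img in "$dir"/initrd.img-*; do
        # Skip the unexpanded glob pattern if no image matches.
        [ -e "$img" ] || continue
        size=$(stat -c %s "$img")
        echo "$(basename "$img") $((size / 1048576)) MiB"
    done
}
```

Run initramfs_sizes once, regenerate the images with sudo update-initramfs -u -k all, then run it again. lsinitramfs /boot/initrd.img-$(uname -r) | grep amdgpu should tell you whether GPU firmware is still being included.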

Matthias #

How do I access binary data using Vert.x-redis or Quarkus Redis Client?

Vert.x-redis provides the RedisAPI class to access most Redis functionality. The methods provided by RedisAPI only take Strings as arguments, which it encodes in UTF-8. With Quarkus you get to choose between RedisClient and ReactiveRedisClient, both of which mirror RedisAPI, again only containing String-based methods.

So how do you store binary strings?

With Vert.x-redis, use the send method on the Redis class, which is the lower-level interface underlying RedisAPI. Similarly, with Quarkus, instead of injecting a RedisClient or ReactiveRedisClient, inject the lower-level MutinyRedis:

@Inject
MutinyRedis mutinyRedis;

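// Store raw bytes under the given key without going through a String.
Uni<Void> set(String key, byte[] data) {
    return mutinyRedis
        .send(
            Request.cmd(Command.SET)
                .arg(key)
                .arg(data))
        .replaceWithVoid();
}

// Fetch the raw bytes stored under the given key; a missing key yields null.
Uni<byte[]> get(String key) {
    return mutinyRedis
        .send(
            Request.cmd(Command.GET)
                .arg(key))
        .onItem().ifNotNull()
        .transform(Response::toBytes);
}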
Matthias #

I tried out GNOME 40 today.

I had already read about it online, with quite a few articles suggesting that it was going in the wrong direction UI-wise. One common complaint was that it seemed more touch-oriented and less efficient to use with a mouse.

After trying it out, I am pleasantly surprised. I find I am quite happy with the gesture-based input paradigm that GNOME 40 is built around. In particular, navigating between the activity overview and the main desktop has become significantly quicker for me.

Perhaps my fondness for the MacBook’s way of treating its trackpad as a first-class input device has rubbed off on me and I am now attuned to it. What is surprising, however, is that notwithstanding the substantial amount of time I have spent in macOS, I seem to prefer GNOME’s implementation over the Mac’s. It feels, to me, even more efficient and to the point.

Matthias #

Even with popular support for the CSU dwindling, it still seems unlikely for the left to capture a significant number of direct mandates (first-vote seats assigned by first-past-the-post voting) in Bavaria due to vote splitting between the three left-leaning parties (the Left Party, the Social Democrats, and the Greens).

If I were responsible for any of the three parties’ Bavarian election strategies, I would propose to the other two that we hold a joint primary election in each voting district. This would likely improve the odds for both the Social Democrats and the Greens. It would not really change them for the Left Party, but it would still give them the benefit of hurting the other end of the political spectrum plus the opportunity to build alliances within their own.

Matthias #

The big issue that I have with how the German government does COVID politics is that they can’t seem to handle the trolley problem.

The default choice appears to be inaction. A decision to act is made only when there is (seemingly) irrefutable scientific consensus that it causes no harm. That is arguably the correct approach for a doctor to take with a patient who cannot make an informed decision for themselves. But a pandemic is more akin to war than to a doctor’s visit. There will be casualties, some of them innocent bystanders; the only question is how many. In that kind of situation, inaction is not the safe choice. There is no safe choice.

Matthias #

I just finished migrating my Mailcow installation. The old setup was a native Kubernetes port that I had made by hand and that was becoming impossible to update. The new one is a more streamlined (albeit resource-hungry) deployment in which I wrap docker-compose and a dedicated Docker instance in a Kata container that I run as a Kubernetes pod.

The container is built as a Nix expression and is available in my public Kubeia repository, which contains part of my Kubernetes deployment configuration and image build scripts. A README is available too, in case you would like to try running it yourself.

Do note that the Kubernetes deployment file is just provided as an example. It contains some pretty specific references to my particular deployment – view it as something like a template that you will have to copy and fill with your own data. Another idiosyncrasy is that I really, really dislike running multiple database servers on a single piece of hardware, so I kluged something in that makes Mailcow use my already provisioned MariaDB instance rather than its own. In other words, your mileage may very much vary.

Matthias #

On unsafe Rust

A common misunderstanding is that unsafe Rust is more liberal than safe Rust. It is not. The same invariants apply, but instead of the compiler, it’s your job to uphold them.

Unsafe Rust is what you write when in any other language you would have written a C extension. It is a very rare thing to do.

Matthias #

I was recently told that paired forms such as „Studentinnen und Studenten“ (“female students and male students”) do not count as inclusive because they cover only women and men but not people who see themselves as neither. Only the gender asterisk, I was told, is inclusive: „Student*innen“.

I propose an alternative. How about we go back to adopting more of Anglo-Saxon culture and simply say „Studenten“ (“students”)? Revolutionary, I know.

Matthias #

Remember: The closer we get to a vaccine, the easier it is to justify a stricter lockdown.

If the logic behind that statement doesn’t seem obvious to you, think about the extreme cases: If we were in a pandemic with no chance of ever getting rid of it or finding any sort of treatment, a lockdown would make little sense. Since everyone would contract the virus eventually, very few people would be saved by the measures, but more people would suffer or even die due to the economic consequences of a lockdown whose duration would have to be indefinite. If, on the other hand, we were just two weeks away from eradicating the pandemic at the snap of a finger, then it would clearly be the correct thing to do to impose a strict lockdown for those two weeks in order to maximize the number of lives saved, since each person who manages to avoid the virus for just another two weeks would be saved from it for good.

Matthias #

Guess what one of the top disk latency inducers is on my (functionally mostly idle) server.

# zfsslower
Tracing ZFS operations slower than 10 ms
TIME     COMM           PID     T BYTES   OFF_KB   LAT(ms) FILENAME
09:14:10 async_49       2675004 S 0       0          18.29 journal.jif

The mysteriously named async_49 represents Mnesia as used by RabbitMQ as part of… Zulip.

Have I mentioned that a Zulip instance hosting 3 users is a waste of resources? Oh, I have, haven’t I?

Matthias #

How do I fix CGit’s display of a repository’s time of last update?

If you copied Git repositories into CGit at one point, you may have done so without keeping their mtimes intact. In this case, CGit will display an incorrect time of last update for the affected repositories, as it does not determine it based on the most recent Git commit but rather the time the default branch was last touched on the local file system.

By default, CGit uses the mtime of refs/heads/master (assuming that master is your default branch) to determine the time of last update, so this is how you can fix the time to be the same as the commit date of the last commit:

touch -c refs/heads/master -t $(date +"%Y%m%d%H%M.%S" --date=@$(git show -s --format=%ct HEAD))
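
If several repositories are affected, the same fix can be wrapped in a small function and applied in a loop. This is only a sketch under assumptions: fix_cgit_mtime is a made-up name, it expects the path of a Git directory (a bare repository or a .git directory), and it relies on GNU touch accepting -d @<epoch>. It also only has an effect if the branch exists as a loose ref file, since touch -c does nothing when the ref lives only in packed-refs.

```shell
# Hypothetical helper: set the mtime of refs/heads/<branch> in a Git directory
# to the committer date of the latest commit on that branch.
fix_cgit_mtime() {
    git_dir=$1
    branch=${2:-master}
    # %ct prints the committer date as seconds since the epoch.
    ts=$(git --git-dir="$git_dir" show -s --format=%ct "refs/heads/$branch") || return 1
    # -c: never create the file; -d @N: GNU touch syntax for an epoch timestamp.
    touch -c -d "@$ts" "$git_dir/refs/heads/$branch"
}
```

For example: for r in /srv/git/*.git; do fix_cgit_mtime "$r"; done, where /srv/git is a placeholder for wherever your repositories live.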