## Boehm–Berarducci encoding in Java

Last time we looked at how to apply Scott encoding to algebraic data types in Java. This time we are going to look at Boehm–Berarducci encoding, which is related.

Again we take the well-known persistent binary tree data type with labeled leaves:

```
Bᵨ := ρ + (Bᵨ × Bᵨ)
```

Or, in Haskell notation:

```
data BinaryTree t = Leaf t | Fork (BinaryTree t) (BinaryTree t)
```

Recall the **Scott encoding** of this data structure:

```
interface BinaryTreeScott<T> {
<R> R visit(Function<T, R> onLeaf, BiFunction<BinaryTreeScott<T>, BinaryTreeScott<T>, R> onFork);
}
```

From a type-theoretic perspective, there is a problem with this: `BinaryTreeScott<T>`

is defined in terms of itself. Perhaps you want to work in a type system that does not support recursive types.

### Encoding

The **Boehm–Berarducci encoding** comes to the rescue. For our binary tree example it is:

```
interface BinaryTreeBoehmBerarducci<T> {
<R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);
}
```

Note how the only technical difference is in how the self-type is encoded. In other words, the two encodings coincide on non-recursive types.

The intuitive difference is that while the Scott encoding encodes a data type using its pattern match, Boehm–Berarducci encodes it using its fold.

### Implementation

The implementation is straight-forward:

```
interface BinaryTreeBoehmBerarducci<T> {
<R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);
static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) {
//return (foldLeaf, foldFork) -> foldLeaf.apply(value);
return new BinaryTreeBoehmBerarducci<>() {
@Override
public <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork) {
return foldLeaf.apply(value);
}
};
}
static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) {
//return (foldLeaf, foldFork) -> foldFork.apply(left.fold(foldLeaf, foldFork), right.fold(foldLeaf, foldFork));
return new BinaryTreeBoehmBerarducci<>() {
@Override
public <R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork) {
return foldFork.apply(left.fold(foldLeaf, foldFork), right.fold(foldLeaf, foldFork));
}
};
}
}
```

Summing up the leaves is now even easier than it was before, without even requiring explicit recursion:

```
class BinaryTreeBoehmBerarducciOps {
static Integer sum(BinaryTreeBoehmBerarducci<Integer> b) {
return b.fold(
n -> n,
(l, r) -> l + r);
}
}
```

### Pattern matching

But how do you perform a mere pattern match without recurring? This has now become quite tricky to do.

Your first idea might be to convert the Boehm–Berarducci-encoded data type into its Scott encoding:

```
interface BinaryTreeBoehmBerarducci<T> {
<R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);
static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) { ... }
static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) { ... }
default BinaryTreeScott<T> toScott() {
return fold(
BinaryTreeScott::ofLeaf,
BinaryTreeScott::ofFork);
}
}
```

This works, but now we depend on the recursively defined type of `BinaryTreeScott<T>`

again, which is what we wanted to avoid. To get around this limitation, we define a non-recursive helper type that fulfills the same role as `BinaryTreeScott<T>`

, but based on `BinaryTreeBoehmBerarducci<T>`

:

```
interface Deconstrutor<T> {
<W> W visit(Function<T, W> onLeaf, BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork);
}
```

Then we `fold`

our `BinaryTreeBoehmBerarducci<T>`

into a `Deconstructor<T>`

, on which we can call `visit`

to perform our pattern match:

```
interface BinaryTreeBoehmBerarducci<T> {
<R> R fold(Function<T, R> foldLeaf, BiFunction<R, R, R> foldFork);
static <T> BinaryTreeBoehmBerarducci<T> ofLeaf(T value) { ... }
static <T> BinaryTreeBoehmBerarducci<T> ofFork(BinaryTreeBoehmBerarducci<T> left, BinaryTreeBoehmBerarducci<T> right) { ... }
default <R> R visit(
Function<T, R> onLeaf,
BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, R> onFork) {
interface Deconstructor<T> {
<W> W visit(Function<T, W> onLeaf, BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork);
}
return
this.<Deconstructor<T>>fold(
v ->
new Deconstructor<>() {
@Override
public <W> W visit(
Function<T, W> onLeaf1,
BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork1) {
return onLeaf1.apply(v);
}
},
(left, right) ->
new Deconstructor<>() {
@Override
public <W> W visit(
Function<T, W> onLeaf1,
BiFunction<BinaryTreeBoehmBerarducci<T>, BinaryTreeBoehmBerarducci<T>, W> onFork1) {
return onFork1.apply(
left.visit(BinaryTreeBoehmBerarducci::ofLeaf, BinaryTreeBoehmBerarducci::ofFork),
right.visit(BinaryTreeBoehmBerarducci::ofLeaf, BinaryTreeBoehmBerarducci::ofFork));
}
})
.visit(onLeaf, onFork);
}
}
```

`BinaryTreeBoehmBerarducci#visit`

now works as `BinaryTreeScott#visit`

did before.

The only problem is that it is horrendously inefficient as it traverses the whole tree and constructs a complete mirror tree of `Deconstructor<T>`

objects for just a single pattern match.

### Optimization: lazy fold

We can remedy the pathological inefficiency by making the fold operation lazy in the recursive argument:

```
interface BinaryTreeBoehmBerarducciLazy<T> {
<R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork);
}
```

The complete code is thus:

```
@FunctionalInterface
public interface BinaryTreeBoehmBerarducciLazy<T> {
<R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork);
static <T> BinaryTreeBoehmBerarducciLazy<T> ofLeaf(T value) {
return new BinaryTreeBoehmBerarducciLazy<>() {
@Override
public <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork) {
return foldLeaf.apply(value);
}
};
}
static <T> BinaryTreeBoehmBerarducciLazy<T> ofFork(BinaryTreeBoehmBerarducciLazy<T> left, BinaryTreeBoehmBerarducciLazy<T> right) {
return new BinaryTreeBoehmBerarducciLazy<>() {
@Override
public <R> R fold(Function<T, R> foldLeaf, BiFunction<Supplier<R>, Supplier<R>, R> foldFork) {
return foldFork.apply(() -> left.fold(foldLeaf, foldFork), () -> right.fold(foldLeaf, foldFork));
}
};
}
default <R> R visit(
Function<T, R> onLeaf,
BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, R> onFork) {
interface Deconstrutor<T> {
<W> W visit(Function<T, W> onLeaf, BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, W> onFork);
}
return
this.<Deconstrutor<T>>fold(
value ->
new Deconstrutor<>() {
@Override
public <W> W visit(
Function<T, W> onLeaf1,
BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, W> onFork1) {
return onLeaf1.apply(value);
}
},
(left, right) ->
new Deconstrutor<>() {
@Override
public <W> W visit(
Function<T, W> onLeaf1,
BiFunction<BinaryTreeBoehmBerarducciLazy<T>, BinaryTreeBoehmBerarducciLazy<T>, W> onFork1) {
return onFork1.apply(
left.get().visit(BinaryTreeBoehmBerarducciLazy::ofLeaf, BinaryTreeBoehmBerarducciLazy::ofFork),
right.get().visit(BinaryTreeBoehmBerarducciLazy::ofLeaf, BinaryTreeBoehmBerarducciLazy::ofFork));
}
})
.visit(onLeaf, onFork);
}
}
```