What is Dataflow Programming?

You may have heard of lazy evaluation. It works like this: You specify the meaning of an expression not by putting a value into a box, but by declaratively writing down a way of computing the expression. Whenever someone asks for the value of an expression, it is computed on-demand and cached. Thus, we have a dependency graph between expressions (that is hopefully acyclic). Since we compute only what someone actually asks for, control flows from the consumer to the producer, up the dependency chain. The data is passive, waiting for the consumer to ask for things to be computed; the consumer actively initiates all computation. This is the pull-model of declarative computation.

Rather less widespread, there is a technique that is dual to lazy evaluation. As in lazy evaluation, you specify the meaning of something by writing down how to compute it from other things. But now, you do not wait for someone to ask for the value. Instead, you compute the value right away; and whenever one of the inputs changes, you immediately recompute the value. Thus, control flows from the producer to the consumer, down the dependency chain. The data is active, reacting to events from the outside world as soon as they happen; the consumer sits waiting passively and gets notified whenever something changes. This is the push-model of declarative computation. It is sometimes called the spreadsheet model, since it operates like a spreadsheet: You have a bunch of cells, and whenever you modify a cell, the change immediately propagates to all dependent cells, prompting visual adjustments as well.

But reactive patterns naturally arise not just in spreadsheets, but also in multimedia programming. Take a game object, a person, say, consisting of various parts, such as legs, arms, and a head. You may want to draw each part independently from the others, but you always know the relative locations of the parts with regard to the object. Whenever you move the object, you want the parts to move accordingly. Hence, while specifying something like arm.x := person.x - 10 is very natural, you expect arm.x to update itself whenever person.x changes. In fact, the multimedia-oriented scripting language JavaFX provides just such a kind of limited dataflow mechanism.

There have been various implementations of the general idea. Most academic research has been done under the moniker of Functional Reactive Programming. Notable implementations are Racket's FrTime and the Elm programming language, among others. But there have also been implementations focussing on the dataflow paradigm itself. The Oz programming language is one of the best-known. Another interesting and very mature implementation is Kenny Tilton's Cells library, which extends the Common Lisp Object System and has sparked a number of ports to other languages.

Implementation in C++

For my own immediate use, however, I needed something C++-based. Now, I have not put the time into this endeavor necessary to optimize and debug it; but I have got an apparently working implementation in C++11 that seems to be passable for my rather low requirements at this time. Get it with Git:

$ git clone https://matthias.benkard.de/code/cells++

If you prefer, you can also pull it from GitHub. It is a header-only library. The distribution contains a small example program that I reproduce below.

Update 2012-12-13: I have updated the example program to illustrate automatic detection and deregistering of destroyed cells.

Hint: You can use smart pointers or a garbage collector to ensure that cell dependencies are never destroyed as long as something depends on them. You need not, however, take into account the reverse relationships (i.e., the push-graph). The library deals with all the necessary deregistering.

// Copyright 2012, Matthias Andreas Benkard.

#include "cells.hpp"

#include <cstdlib>
#include <iostream>

struct unit { };

using namespace cells;
using std::cout;
using std::endl;

typedef formula_cell<double> fcell;
typedef formula_cell<unit>   ucell;

int main(int argc, char** argv) {
  fcell x0, x1, x2, y;
  ucell b;

    fzell z;
    ucell a;

      x1.reset([&](){ return *x0 + 5; });
      x2.reset([&](){ return *x0 * 2; });
      y.reset([&](){ return *x1 * *x2; });
      z.reset([&](){ return *x0 * *y; });

      // a is an observer on z.  b is an observer on x2.
      a.reset([&]() -> unit { cout << "z is now " << *z << "." << endl; return unit(); });
      b.reset([&]() -> unit { cout << "x2 is now " << *x2 << "." << endl; return unit(); });
    // Output:
    // z is now 3000.
    // x2 is now 20.

    // Output:
    // x2 is now 30.
    // z is now 9000.

    // Output:
    // z is now -12000.
    // x2 is now -40.

    // Output:
    // z is now 60.

    x1.reset([&]() { return *x0; });
    // No output, y (and hence z) does not depend on x1 right now.

    y.reset([&]() { return *x1 + *x2; });
    // Output:
    // z is now 1200.

  // z and a aren't there anymore, so only b will produce output now:
  // x2 is now 20.

  return EXIT_SUCCESS;

 * z
 * |\
 * | \
 * |  \
 * |   \
 * |    \
 * |     \
 * |      \
 * |       y
 * |      /|
 * |     / |
 * |    /  |
 * |   x1  x2
 * |  /    |
 * | /     |
 * x0------+