Programming Toolbox: Eventual values

(This is part of a series of posts on tools for thinking about programming — abstractions, patterns and smells, among other things — which I think make all kinds of programming tasks easier. See here for other posts like this one.)

The eventual value is an interesting abstraction for dealing with both synchronous and asynchronous computations. The point of this tool is to abstract away the time at which a value is evaluated.

Most of the concepts in this post are covered in the Wikipedia article on Futures and Promises, which is definitely worth reading in full. You’ll find a lot of interesting references there.

According to Wikipedia, the name eventual was first used by Peter Hibbard. Although I don’t know if the ideas here match those in his article, I chose to use it because other common names for this pattern are used in several different languages with slightly different meanings, bound to specific implementations and strategies. Eventual is less well known, and therefore less overloaded.

The eventual value is most obviously useful in async scenarios, but allowing the programmer to forget about the timing of things is a great way of simplifying code in general.

First, let’s look at how this abstraction shows up in several forms in Clojure and JavaScript. Then, we’ll talk about how we could implement a simple version of this to refactor some Java code.

While we go through these hoops, I will point out important differences between these programming artifacts. Understanding the differences is key to grokking the unifying theme.

Clojure

Clojure has several interesting options to deal with concurrency (in fact, that’s one of its selling points). Three of them are pretty closely related: delay, future and promise. All three eventually hold a single immutable value, accessible by calling the deref function or using the @ symbol. They differ in how that value is evaluated.

Delays receive a code block to run, but do nothing until deref is called. At that moment, the code block will be evaluated, and the result will be the value of the delay. If a thread calls deref while another thread is still evaluating the delay, the calling thread will block until the evaluation is done, then receive the evaluated value. Notice that this means the code block will only run once.

(def d (delay
        (Thread/sleep 1000)
        (* 6 7)))
(deref d) ; sleeps, then computes 6 * 7, then returns 42
(deref d) ; returns 42 immediately

Futures spawn a new thread, which will evaluate a code block. The result of the block will be the value of the future. If a thread calls deref on a future before its code block has finished running, it will block until the value is available.

(def f (future
        (Thread/sleep 1000)
        (* 6 7)))
(deref f) ; blocks until the future thread is finished, then returns 42
(deref f) ; returns 42 immediately

Promises are like a write-once variable. Calling the deliver function on a promise sets its value, and any subsequent call to deliver has no effect. If a thread calls deref on a promise before it is delivered, it will block until the value is available.

(def p (promise))
(.start (Thread. #(do
                   (Thread/sleep 1000)
                   (deliver p (* 6 7)))))
(deref p) ; blocks until the other thread delivers, then returns 42
(deref p) ; returns 42 immediately

The interesting thing here is that, even though these are three pretty different evaluation strategies, the language gives us a single interface for all of them: deref. This means you can make an API that returns something derefable, while retaining the freedom to change whether the value is pre-computed at startup, or by a background thread started at a convenient moment, or only when it is first needed.
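
To see the same idea outside Clojure, here is a minimal Java sketch. It assumes java.util.function.Supplier plays the role of deref; the class DerefSketch and its helper methods are names I made up for illustration, not part of any library.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

class DerefSketch {
  // Lazy strategy, like delay: runs the computation at most once, on the first get().
  static <T> Supplier<T> lazy(Supplier<T> computation) {
    return new Supplier<T>() {
      private T value;
      private boolean done;

      @Override
      public synchronized T get() {
        if (!done) {
          value = computation.get();
          done = true;
        }
        return value;
      }
    };
  }

  // Background strategy, like future: starts computing right away on another thread.
  static <T> Supplier<T> background(Supplier<T> computation) {
    CompletableFuture<T> future = CompletableFuture.supplyAsync(computation);
    return future::join; // get() blocks until the value is ready
  }

  // Pre-computed strategy: the value already exists when the Supplier is handed out.
  static <T> Supplier<T> precomputed(T value) {
    return () -> value;
  }

  // Client code only sees a Supplier and cannot tell which strategy is behind it.
  static int answerPlusOne(Supplier<Integer> answer) {
    return answer.get() + 1;
  }

  public static void main(String[] args) {
    System.out.println(answerPlusOne(lazy(() -> 6 * 7)));
    System.out.println(answerPlusOne(background(() -> 6 * 7)));
    System.out.println(answerPlusOne(precomputed(42)));
  }
}
```

The client method answerPlusOne never changes, whichever of the three strategies the API owner picks.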

One important reminder here is that the deref function in Clojure also works on other entities besides the three mentioned above, for example atoms and agents. These entities, however, can have their values changed after they are set, so they don’t quite match the abstraction we are looking for (they look more like references than values). Unfortunately, Clojure lacks a specific interface for eventual values.

JavaScript

JavaScript code runs in a single thread. This forced the language to turn to an unusual approach for tasks that take too long (like IO): dispatch a task to some other process or thread on the runtime environment (which is not controlled by the JS virtual machine), and run a callback when that task completes. This approach leads to code like this:

navigator.geolocation.getCurrentPosition(location => {
  alert(`lat: ${location.coords.latitude}, lon: ${location.coords.longitude}`)
})

The call to getCurrentPosition returns undefined and doesn’t block the JS thread. The browser will asynchronously find out the location and, when that’s done, notify the JavaScript VM to call the callback function with the location as an argument.

Another way to look at what’s going on here is to think that the getCurrentPosition call doesn’t return anything immediately, but eventually returns the location via its callback. So another way this API could have been written is to return (immediately) an object that will eventually hold the position. In modern JS, these objects are called Promises.

Warning: do not confuse it with the Clojure promise. They are closely related, but the JS concept is fit to a different execution model (single-threaded), so it works a bit differently.

From the MDN docs:

The Promise object represents the eventual completion (or failure) of an asynchronous operation, and its resulting value.

The JS Promise exposes a .then method, which expects a callback as its parameter. Whenever the Promise resolves (i.e. its value is delivered), any callbacks registered by calling .then will be called. After the Promise is resolved, any new call to .then will end up with the callback being called with that same value.
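
The move from "pass me a callback" to "here is an object that will eventually hold the value" can also be sketched in Java, using CompletableFuture as a rough stand-in for the JS Promise. The getCurrentPosition method below is a hypothetical callback-style API that I wrote to mimic the browser one; it is not a real library call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

class CallbackToFuture {
  // Hypothetical callback-style API: returns nothing, reports the result later.
  static void getCurrentPosition(Consumer<String> callback) {
    new Thread(() -> callback.accept("lat: 0.0, lon: 0.0")).start();
  }

  // Same operation, but it returns immediately with an object
  // that will eventually hold the position.
  static CompletableFuture<String> currentPosition() {
    CompletableFuture<String> position = new CompletableFuture<>();
    getCurrentPosition(position::complete); // the callback just delivers the value
    return position;
  }

  public static void main(String[] args) {
    // thenAccept registers a callback, much like .then does in JS.
    currentPosition().thenAccept(System.out::println).join();
  }
}
```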

One interesting feature of JS Promises is that each .then call returns a new Promise. This derived Promise will eventually resolve to the result of the callback passed to the original .then call. This allows for chaining Promises, like in the snippet below:

doSomething()
  .then(result => doSomethingElse(result))
  .then(newResult => doThirdThing(newResult))
  .then(finalResult => {
    console.log(`Got the final result: ${finalResult}`);
  })

Which looks very much like what we would do in Clojure with deref:

(let [result       (do-something)
      new-result   (do-something-else (deref result))
      final-result (do-third-thing (deref new-result))]
  (println (str "Got the final result: " (deref final-result))))

The actual execution is different, because each step runs asynchronously in the JavaScript example: the Promise implementation schedules each .then callback to be run at a later time by the event loop of the only thread available. In the Clojure example, the thread running the deref calls will be blocked until some thread delivers the promises, completes the futures or computes the delays (we need not know which one).
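
Java’s CompletableFuture happens to support both styles, so the two can be put side by side in one sketch. The three do... methods are hypothetical steps I invented for the example. One difference from JS worth noting: .then flattens a Promise returned by the callback, while in Java that job belongs to thenCompose (thenApply would nest the futures instead).

```java
import java.util.concurrent.CompletableFuture;

class ChainingSketch {
  // Hypothetical steps; each one produces its result asynchronously.
  static CompletableFuture<Integer> doSomething() {
    return CompletableFuture.supplyAsync(() -> 6);
  }

  static CompletableFuture<Integer> doSomethingElse(int result) {
    return CompletableFuture.supplyAsync(() -> result * 7);
  }

  static CompletableFuture<String> doThirdThing(int newResult) {
    return CompletableFuture.supplyAsync(() -> "Got the final result: " + newResult);
  }

  // Chaining style, like the JS snippet: no thread waits between the steps.
  static CompletableFuture<String> chained() {
    return doSomething()
        .thenCompose(ChainingSketch::doSomethingElse)
        .thenCompose(ChainingSketch::doThirdThing);
  }

  // Blocking style, like the Clojure snippet: join() plays the role of deref.
  static String blocking() {
    int result = doSomething().join();
    int newResult = doSomethingElse(result).join();
    return doThirdThing(newResult).join();
  }

  public static void main(String[] args) {
    System.out.println(chained().join());
    System.out.println(blocking());
  }
}
```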

Promises in JS fit very nicely to the use cases which would require a future or promise in Clojure. The delay use case, on the other hand, is not supported by the standard JS implementation: one cannot use the Promise constructor to define a computation that runs if and only if the .then method is called (see the MDN docs for details on the Promise constructor). It’s not too difficult to build a simple implementation of that (check this one, for example), but there’s no current standard in the language.

Extra: ReactiveX Observables

ReactiveX’s Observables are an interesting abstraction to model streams of values. The linked page mentions a distinction between hot and cold observables, which almost matches the distinction between Clojure’s future/promise and delay.

A “hot” Observable may begin emitting items as soon as it is created, and so any observer who later subscribes to that Observable may start observing the sequence somewhere in the middle. A “cold” Observable, on the other hand, waits until an observer subscribes to it before it begins to emit items, and so such an observer is guaranteed to see the whole sequence from the beginning.

The important difference is that once a hot Observable has emitted a value, it is lost to anyone that wasn’t subscribed at that moment. Once a Clojure future/promise or a JS Promise is resolved to a value, that value is accessible forever (or until GC, of course).

A definition of sorts

So after looking at all these near-misses, here’s the abstraction we are looking for:

It may have no value at first, but eventually resolves to some value
Once resolved, the value never changes
There’s some way to wait for or be notified of the value’s resolution
It makes no guarantees about time of computation and resolution. In fact, it may even resolve synchronously at the time of access, or never resolve.

This is what I call an eventual value.
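
As a quick sanity check of the first three properties, here is a small Java sketch that uses CompletableFuture as one more near-miss. I say near-miss because CompletableFuture also has an obtrudeValue method that can overwrite a resolved value, which the sketch simply avoids. The class name EventualProperties is made up for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

class EventualProperties {
  // Walks through the listed properties and returns what was observed at each step.
  static String observe() {
    CompletableFuture<Integer> eventual = new CompletableFuture<>();
    boolean resolvedAtFirst = eventual.isDone();  // it may have no value at first

    AtomicInteger notified = new AtomicInteger();
    eventual.thenAccept(notified::set);           // ask to be notified of the resolution

    boolean firstWon = eventual.complete(42);     // resolves the value
    boolean secondWon = eventual.complete(43);    // ignored: already resolved

    int waited = eventual.join();                 // waiting yields the resolved value
    return resolvedAtFirst + " " + firstWon + " " + secondWon + " " + waited + " " + notified.get();
  }

  public static void main(String[] args) {
    System.out.println(observe()); // prints "false true false 42 42"
  }
}
```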

Clojure’s deref comes close to this, because it is a single interface that applies to delays, futures and promises alike. On the other hand, it misses the target by also applying to mutable entities like atoms and agents.

The JS Promise also isn’t an exact match, lacking support for the laziness we find in Clojure’s delay.

ReactiveX cold Observables are a close match to Clojure’s delay (if we take a single value from them, instead of several values on a stream), but hot Observables do not match the behavior we want, because the value is lost to a late subscriber.

Again, the important part is not language support or the specific implementation. The important part is what we are abstracting away: time of execution. This means you can reach for this concept whenever you feel that some module shouldn’t have to decide, or even care about, the time when something is executed.

(Although of course language support would make everything easier.)

Aside: The power of not giving guarantees

Notice one of the characteristics described above is a negative: an eventual value makes no guarantees about time of computation and resolution.

When an abstraction gives a guarantee, it gives power to the code that uses it. The user code can rely on it. On the other hand, the guarantee reduces the space of possible implementations one could choose from.

For example, the JS Promise specification guarantees that .then callbacks will never be run synchronously, even if the result value is already available when .then is called. This means you cannot choose an implementation that calls them right away.
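
For contrast, Java’s CompletableFuture does not make that promise. The sketch below relies on behavior I have observed in OpenJDK rather than on a documented guarantee: a callback attached with a non-async method to an already completed future runs right away, on the calling thread. The class and method names are made up for the example.

```java
import java.util.concurrent.CompletableFuture;

class SyncCallbackSketch {
  // Reports whether the callback had already run, on the caller's own thread,
  // by the time thenAccept returned.
  static boolean ranOnCallerThread() {
    Thread caller = Thread.currentThread();
    Thread[] ranOn = new Thread[1]; // stays null if the callback has not run yet

    CompletableFuture.completedFuture(42)
        .thenAccept(value -> ranOn[0] = Thread.currentThread());

    return ranOn[0] == caller;
  }

  public static void main(String[] args) {
    System.out.println(ranOnCallerThread());
  }
}
```

A JS Promise could never behave like this, which is exactly the kind of implementation freedom the spec gave up in exchange for predictability.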

So, in theory, the less you assert about the behavior of your abstraction in your contract (interface and docs), the more you keep your ability to change the implementation later on. It’s a way of delaying decisions: only add something to your contract when you know that your user code needs that guarantee.

On the other hand, Hyrum’s Law:

With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody.

Even though you may claim that “I never promised I’d resolve this asynchronously!“, a user of your API may still observe that you always resolve asynchronously, rely on that fact, and be mad when you break their software by changing that implementation “detail”.

The way I think about this is that the no-guarantees approach works best for internal APIs, where you have some control over how people are using the abstraction (so you can call them out on code review if they rely on implementation details).

It also works better when the aspects that you are leaving “undecided” are not extremely important (in the context they are being used). For example, on a webservice the question of whether a computation is running on the same thread or in a different one is typically not very important. On the other hand, that’s a crucial matter when dealing with UI applications, because if you hang the main thread for too long, the UI will become unresponsive (and that’s why the JS Promise spec guarantees asynchronicity).

This means that you can only take the easy way out of “not giving guarantees” on aspects that don’t matter too much for the code that uses your API. As I said, it’s a way of delaying the decision of committing to certain aspects of an implementation, but there are things you cannot delay.

Why write all of this, then? Well, even while agreeing that Hyrum’s Law applies to many cases, I still think it is useful to start from a no-or-very-few guarantees approach when thinking about APIs and abstractions. I then add more items to the contract as I investigate the intended and possible use cases.

Exercise

Now, on to a small exercise. Take a look at this Java class:

class MoviesLibrary {
  private Map<UUID, Movie> map;

  private static Map<UUID, Movie> loadMapFromFile() { /* ... */ }

  public void load() {
    this.map = loadMapFromFile();
  }

  public boolean isLoaded() {
    return this.map != null;
  }

  public Movie getMovie(UUID id) {
    return this.map.get(id);
  }
}

There are a couple of problems here:

the getMovie method will crash with a NullPointerException if called before the load method; and
if the load method is called several times, the MoviesLibrary content may change unexpectedly (because of changes to the underlying file). So really the client code has to call isLoaded() to decide whether it should call load().

Can you use the ideas above to get rid of these problems? Check my answer below:

Answer

Both problems go away if we make the load method idempotent and make the getMovie method wait until load has run at least once. This is pretty easy to implement:

class MoviesLibrary {
  // volatile, so the unsynchronized null-check in getMovie safely sees the loaded map
  private volatile Map<UUID, Movie> map;

  private static Map<UUID, Movie> loadMapFromFile() { /* ... */ }

  public synchronized void load() {
    if (this.map == null) { // synchronized null-check
      this.map = loadMapFromFile();
    }
  }

  public Movie getMovie(UUID id) {
    if (this.map == null) {
      this.load();
    }
    return this.map.get(id);
  }
}

The synchronized keyword on load guarantees that the null-check inside that method will only pass on the first call; all other calls will be blocked until loadMapFromFile() has finished and this.map is assigned.

Now getMovie can simply call load() every time, and if necessary it will block until the loading is done. The null-check inside getMovie avoids the performance costs of calling a synchronized method if we know it’s not needed, but that’s an optimization, not strictly necessary to get us the simpler API.

We can drop the isLoaded() method because client code can just call load() without worries.

Notice how this approach gives us the same behavior as Clojure’s delay. We can pre-load the data if we need to by calling load(), or we can be totally lazy and just wait until the first getMovie. In fact, the current implementation for delay looks much like the above example (except for the idiosyncratic indentation).
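
To show that the client-facing API really doesn’t care about the strategy, here is a sketch of the same library with a future-like strategy instead: loading starts in the background at construction time, and getMovie blocks only if it isn’t finished yet. This is my own variation, not part of the original exercise; the loader parameter is a hypothetical stand-in for loadMapFromFile(), and plain String titles stand in for the Movie class.

```java
import java.util.Collections;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

class BackgroundMoviesLibrary {
  private final CompletableFuture<Map<UUID, String>> map;

  // The loader is a hypothetical stand-in for loadMapFromFile().
  BackgroundMoviesLibrary(Supplier<Map<UUID, String>> loader) {
    this.map = CompletableFuture.supplyAsync(loader); // starts loading immediately
  }

  public String getMovie(UUID id) {
    return map.join().get(id); // blocks only while loading is still in progress
  }

  public static void main(String[] args) {
    UUID id = UUID.randomUUID();
    BackgroundMoviesLibrary library =
        new BackgroundMoviesLibrary(() -> Collections.singletonMap(id, "Alien"));
    System.out.println(library.getMovie(id));
  }
}
```

Client code calls getMovie exactly as before; only the time of evaluation has moved.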

Summary

Eventual values abstract the time of evaluation of a value from the code that uses that value. Turn to this tool whenever you feel that client code shouldn’t worry about what your evaluation strategy is.

If you have language support, use it, even if the match isn’t perfect: in Clojure, for example, an API can just return a derefable entity, documenting that it’s guaranteed to be immutable, but never say whether it’s a future, promise or delay (or a custom derefable that you wrote).

If you don’t have language support, or what you have doesn’t match your needs, just think about how you can make your API more eventual-like, the way we did in the exercise above.