Continuations Without Memory Allocation


One of the main advantages of futures (or promises, depending on the language) is the ability to compose them via asynchronous continuations. Consider an example:

// pseudocode
auto f = when_all([]{ return http_get("cat.com/nicecat.png"); },
                  []{ return http_get("dog.com/doggo.png"); })
             .then([](auto p0, auto p1)
             {
                 send_email("mail@grandma.com", combine(p0, p1));
             });

f.execute(some_scheduler);

In this example, the HTTP requests can execute in parallel (depending on the scheduler, e.g. one backed by a thread pool). Once both requests complete, the lambda is automatically invoked with their results.

This model is convenient because it lets us define directed acyclic graphs of asynchronous computations with clean, clear syntax. That is why std::future is evolving to support .then and when_all: these features are part of the N4538 proposal, "Extensions for concurrency", and were excellently presented by Anthony Williams in his talk "Concurrency, Parallelism and Coroutines" at ACCU 2016.
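The pseudocode above maps almost directly onto Boost.Thread, which already implements the TS-style .then and when_all. A minimal sketch of the same graph, where http_get, combine and send_email are hypothetical helpers standing in for real implementations (note that the real when_all takes futures, not lambdas, and the continuation receives a tuple of ready futures):

#define BOOST_THREAD_VERSION 4   // turns on .then and when_all
#include <boost/thread/future.hpp>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical helpers, standing in for real implementations:
std::string http_get(const std::string& url);
std::string combine(const std::string& a, const std::string& b);
void send_email(const std::string& to, const std::string& body);

int main()
{
    auto cat = boost::async([]{ return http_get("cat.com/nicecat.png"); });
    auto dog = boost::async([]{ return http_get("dog.com/doggo.png"); });

    // when_all yields a future over a tuple of the (ready) input futures.
    boost::when_all(std::move(cat), std::move(dog))
        .then([](auto ready)
        {
            auto both = ready.get();
            send_email("mail@grandma.com",
                       combine(std::get<0>(both).get(),
                               std::get<1>(both).get()));
        })
        .get();
}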

Hidden overhead

Consider the std::experimental::future::then signature:

template <typename F>
auto std::experimental::future<T>::then(F&& func)
    -> std::experimental::future<
           std::result_of_t<F(std::experimental::future<T>)>
       >;

The return type is again std::experimental::future, which means that every chained computation is itself type-erased behind a future. A possible implementation of then might look like this:

template <typename T, typename F>
auto then(std::experimental::future<T>& parent, std::launch policy, F&& f)
    -> std::experimental::future<
           std::result_of_t<F(std::experimental::future<T>&)>
       >
{
    return std::async(policy, [
        parent = std::move(parent),
        f = std::forward<F>(f)
    ]() mutable
    {
        parent.wait();
        return std::forward<F>(f)(parent);
    });
}

The use of std::async and the type erasure both hint at dynamic allocations. Let's measure their cost.

I wrote five .cpp files using boost::async and boost::future, each building a chain of .then continuations, from zero continuations up to four. For example:

// zero continuations
#define BOOST_THREAD_VERSION 4   // enables boost::future::then
#include <boost/thread/future.hpp>

int main()
{
    return boost::async([]{ return 123; }).get();
}

// one continuation (same define/include as above)
int main()
{
    return boost::async([]{ return 123; })
        .then([](auto x) { return x.get() + 1; })
        .get();
}
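One simple way to count those allocations (a minimal sketch, not necessarily the exact harness used for these measurements) is to replace the global operator new with a counting version:

#define BOOST_THREAD_VERSION 4
#include <boost/thread/future.hpp>
#include <atomic>
#include <cstdio>
#include <cstdlib>
#include <new>

std::atomic<std::size_t> allocations{0};

// Count every dynamic allocation made by the program.
void* operator new(std::size_t n)
{
    ++allocations;
    if (void* p = std::malloc(n)) { return p; }
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

int main()
{
    const int r = boost::async([]{ return 123; })
                      .then([](auto x) { return x.get() + 1; })
                      .get();

    std::printf("result: %d, allocations: %zu\n", r, allocations.load());
}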

Alternative design

Instead of erasing types, we can preserve them, avoiding allocations while retaining the ability to compose asynchronous computations. Consider the following variant:

int main()
{
    auto f = initiate([]{ return 1; })
                 .then([](int x){ return x + 1; })
                 .then([](int x){ return x + 1; });

    f.execute(/* some scheduler */);
}

With a fully synchronous scheduler:

struct synchronous_scheduler
{
    template <typename F>
    decltype(auto) operator()(F&& f) const
    {
        return f();
    }
};
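A hypothetical driver for this measurement: return the chain's result from main and inspect the -O2 output, e.g. on Compiler Explorer:

int main()
{
    // The whole chain lives on the stack; no future, no type erasure.
    auto f = initiate([]{ return 1; })
                 .then([](int x){ return x + 1; })
                 .then([](int x){ return x + 1; });

    return f.execute(synchronous_scheduler{});
}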

The compiler generates only two lines of assembly! Now with an std::async-based scheduler:

struct asynchronous_scheduler
{
    template <typename F>
    decltype(auto) operator()(F&& f) const
    {
        auto fut = std::async(f);
        return fut.get();
    }
};

891 lines of assembly are generated, but that number doesn't grow as continuations are added, since the compiler can inline the calls!

Implementation

The initiate function:

template <typename F>
auto initiate(F&& f)
{
    // CTAD via the deduction guide declared with node below.
    return node{std::forward<F>(f)};
}

The node class template:

template <typename F>
struct node : F
{
    template <typename FFwd>
    node(FFwd&& f) : F{std::forward<FFwd>(f)} {}

    template <typename FThen>
    auto then(FThen&& f_then);

    template <typename Scheduler>
    decltype(auto) execute(Scheduler&& s) &
    {
        return s(*this);
    }
};

// Deduction guide: node{f} deduces node<std::decay_t<decltype(f)>>.
template <typename FFwd>
node(FFwd&&) -> node<std::decay_t<FFwd>>;

The implementation of .then:

template <typename F>
template <typename FThen>
auto node<F>::then(FThen&& f_then)
{
    // `::node` rather than the injected-class-name, so that class template
    // argument deduction wraps the new lambda in a fresh node:
    return ::node{[
        parent = std::move(*this),
        f_then = std::forward<FThen>(f_then)
    ]() mutable -> decltype(auto)
    {
        return f_then(static_cast<F&>(parent)());
    }};
}
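Putting the pieces together, here is a minimal end-to-end sketch (C++17, needed for the class template argument deduction; the scheduler is the synchronous one from above):

#include <type_traits>
#include <utility>

template <typename F>
struct node : F
{
    template <typename FFwd>
    node(FFwd&& f) : F{std::forward<FFwd>(f)} {}

    template <typename FThen>
    auto then(FThen&& f_then)
    {
        // Compose: the new node's callable runs the parent, then f_then.
        return ::node{[
            parent = std::move(*this),
            f_then = std::forward<FThen>(f_then)
        ]() mutable -> decltype(auto)
        {
            return f_then(static_cast<F&>(parent)());
        }};
    }

    template <typename Scheduler>
    decltype(auto) execute(Scheduler&& s) &
    {
        return s(*this);
    }
};

template <typename FFwd>
node(FFwd&&) -> node<std::decay_t<FFwd>>;

template <typename F>
auto initiate(F&& f)
{
    return node{std::forward<F>(f)};
}

struct synchronous_scheduler
{
    template <typename F>
    decltype(auto) operator()(F&& f) const
    {
        return f();
    }
};

int main()
{
    auto f = initiate([]{ return 1; })
                 .then([](int x){ return x + 1; })
                 .then([](int x){ return x + 1; });

    return f.execute(synchronous_scheduler{});   // returns 3
}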

This approach eliminates allocations and makes the code easy for the compiler to optimize. In the next article we will look at when_all and thread pools.

Thank you for your attention!