zero-overhead C++17 currying & partial application
As I mentioned in my previous article many features introduced in the latest C++ standards allow functional patterns to thrive in your codebase. Two ideas from that programming paradigm that I really like are currying and partial application.
In this article we’re going to:
Introduce and briefly explain the two aforementioned concepts.
Write a generic
constexpr
zero-overheadcurry
function in C++17.Analyze the generated assembly of the implementation to prove the lack of overhead.
Currying
Let’s begin by explaining currying.
In mathematics and computer science, currying is the technique of translating the evaluation of a function that takes multiple arguments (or a tuple of arguments) into evaluating a sequence of functions, each with a single argument.
The following add3
function is an example of a
non-curried function, as its arity is \(3\).
auto add3(int a, int b, int c)
{
return a + b + c;
}
(1, 2, 3); // Returns `6`. add3
We can curry it by returning nested lambda expressions.
auto curried_add3(int a)
{
return [a](int b)
{
return [a, b](int c)
{
return a + b + c;
};
};
}
(1)(2)(3); // Returns `6`. curried_add3
As you can see, the arity of every lambda is \(1\). This pattern is useful because it allows developers to intuitively bind arguments incrementally until the last one. If you’re finding yourself constantly using the same arguments except one in a series of function calls, currying avoids repetition and increases readability.
auto add2_one = curried_add3(1);
auto add1_three = add2_one(2);
(3); // Returns `6`.
add1_three(4); // Returns `7`.
add1_three(5); // Returns `8`. add1_three
Basically, add1_three(5)
is equivalent to
add2_one(2)(5)
, which is equivalent to
curried_add3(1)(2)(5)
.
A slightly more realistic example could involve std::find
:
std::vector<std::string> names{/* ... */}
auto find_in_names =
(std::begin(names))(std::end(names));
curried_find
auto jack = find_in_names("Jack");
auto rose = find_in_names("Rose");
In the above code snippet some repetition between
std::find
invocations is cleanly avoided thanks to
currying.
(This short article by Ivan Čukić has some additional interesting examples of currying in C++.)
Partial application
In computer science, partial application (or partial function application) refers to the process of fixing a number of arguments to a function, producing another function of smaller arity.
Despite them being two separate concepts, partial application is very similar to currying. Even though I couldn’t find a formal confirmation anywhere, I believe that thinking about partial application as a “generalized form of currying” can be helpful: instead of binding one argument and getting \((arity - 1)\) unary functions back, we can bind \(n\) arguments at once and get another partially-applicable function with \((arity - n)\) arity.
Imagine we had add a partial_add3
function which allowed
partial application to sum three numbers:
(1, 2, 3); // Returns `6`.
partial_add3(1)(2, 3); // Returns `6`.
partial_add3(1, 2)(3); // Returns `6`.
partial_add3(1)(2)(3); // Returns `6`. (Currying!) partial_add3
As you can see, we can decide how many arguments to bind
(including zero). We could easily implement this in C++17 using
recursion, generic
lambdas, if constexpr(...)
,
and variadic
templates. (We’ll also use a fold
expression to compute the sum.)
template <typename... Ts>
auto partial_add3(Ts... xs)
{
static_assert(sizeof...(xs) <= 3);
if constexpr (sizeof...(xs) == 3)
{
// Base case: evaluate and return the sum.
return (0 + ... + xs);
}
else
{
// Recursive case: bind `xs...` and return another
return [xs...](auto... ys)
{
return partial_add3(xs..., ys...);
};
}
}
Writing code that enables currying and partial
application for every function is cumbersome. Let’s write a generic
curry
function that, given a function object
f
, returns a curried/partially-applicable
version of f
!
C++17 curry
As mentioned in the beginning of the article, these are the goals for
our curry
function:
Given a generic function object
f
, invokingcurry(f)
returns a curried/partially-applicable version off
.If
f
isconstexpr
-friendly, the returned one will be as well.curry
should not introduce any overhead compared to hand-written currying/partial application.
Credit where it’s due
Please note that the design and implementation of curry
that I am going to cover is a heavily-modified version of this snippet
that was tweeted by Julian Becker - in fact, it was
that tweet that inspired me to write this article.
Thanks!
(Julian also wrote an excellent answer on the StackOverflow question “How can currying be done in C++?” - make sure to check it out.)
Example usage
Before we analyze the declaration and definition of
curry
, let’s take a look at some usage examples.
Nullary functions:
auto greet = []{ std::puts("hi!\n"); }; (); // Prints "hi!". greet(greet); // Prints "hi!". curry // Compile-time error: /* curry(greet)(); */
As you can see, in the case of a nullary function object
f
, invokingcurry(f)
calls the original object immediately.Unary functions:
auto plus_one = [](auto x){ return x + 1; }; (0); // Returns `1`. plus_one(plus_one)(0); // Returns `1`. curry // Returns a wrapper around `plus_one` that enables // currying/partial application. // `plus_one` is "perfectly-captured" in the wrapper. auto curried_plus_one = curry(plus_one); (1); // Returns `2`. curried_plus_one
What does perfectly-captured mean?
It means that if the captured object is an lvalue, it will be captured by reference. If the captured object is an rvalue, it will be captured by move*. I’ve written a comprehensive article on this topic: “capturing perfectly-forwarded objects in lambdas”.
Binary functions:
auto add2 = [](auto a, auto b){ return a + b; }; // All of the invocations below return `3`. (1, 2); add2(add2)(1, 2); // Partial application. curry(add2)(1)(2); // Currying. curry // Example of "binding" an argument: auto add_one = curry(add2)(1); (2); // Returns `3`. add_one(3); // Returns `4`. add_one
You should be starting to see the pattern now…
\(N\)-ary functions:
auto add3 = [](auto a, auto b, auto c) { return a + b + c; }; // All of the invocations below return `6`. (1, 2, 3); add3(add3)(1, 2, 3); curry(add3)(1, 2)(3); curry(add3)(1)(2, 3); curry(add3)(1)(2)(3); curry
The example above shows that currying and partial application can be freely combined. Let’s see another example of that with a
constexpr
lambda of arity \(5\).auto add5 = [](auto a, auto b, auto c, auto d, auto e) constexpr { return a + b + c + d + e; }; constexpr auto csum5 = curry(sum5); constexpr auto a = csum5(0, 1, 2, 3, 4, 5); constexpr auto b = csum5(0)(1, 2, 3, 4, 5); constexpr auto c = csum5(0, 1)(2, 3, 4, 5); constexpr auto d = csum5(0, 1, 2)(3, 4, 5); constexpr auto e = csum5(0, 1, 2, 3)(4, 5); constexpr auto f = csum5(0, 1, 2, 3, 4)(5);
Note that the usages of
curry(sum5)
above are in no way exhaustive - more combinations such ascurry(sum5)(0, 1)(2, 3)(4, 5)
can be written, and every intermediate step can be given a name.
Now that you have an idea on how curry
can be used,
let’s dive into its declaration and definition.
Declaration
Given the constraints listed earlier, we can easily write down the
declaration of curry
.
template <typename TF>
constexpr decltype(auto) curry(TF&& f);
Why
decltype(auto)
instead ofauto
?
Because the final step of curry
needs to return
exactly what the original function object does. Example:
auto f = [](auto, auto) -> auto&
{
return some_global_variable;
};
// OK - can return an additional "curry wrapper" by value.
auto step0 = curry(f);
// Same as above.
auto step1 = step0('a');
// Now `step1` has to return a reference!
auto& that_same_global = step1('b');
Additionally, the f
parameter is taken by forwarding-reference.
I will assume you’re familiar with move
semantics, std::forward
,
and “forward
captures” for the rest of the article.
Definition
I’ll show the complete definition of curry
first, and
then analyze all the parts one-by-one more closely.
template <typename TF>
constexpr decltype(auto) curry(TF&& f)
{
if constexpr (std::is_callable<TF&&()>{})
{
return FWD(f)();
}
else
{
return [xf = FWD_CAPTURE(f)](auto&&... partials) mutable constexpr
{
return curry
(
[
= FWD_CAPTURE_PACK_AS_TUPLE(partials),
partial_pack = std::move(xf)
yf ]
(auto&&... xs) constexpr
-> decltype(forward_like<TF>(xf.get())(FWD(partials)...,
(xs)...))
FWD{
return apply_fwd_capture([&yf](auto&&... ys) constexpr
-> decltype(forward_like<TF>(yf.get())(FWD(ys)...))
{
return forward_like<TF>(yf.get())(FWD(ys)...);
}, partial_pack, FWD_CAPTURE_PACK_AS_TUPLE(xs));
}
);
};
}
}
The first thing to notice is the recursive structure
of curry
.
template <typename TF>
constexpr decltype(auto) curry(TF&& f)
{
if constexpr (std::is_callable<TF&&()>{})
{
return FWD(f)();
}
else
{
// ...
}
}
The base case branch is taken when
std::is_callable<TF&&()>{}
evaluates to
true
. std::is_callable
is a new C++17 type trait that checks whether or not a
particular object types can be called with a specific set of argument
types.
If
std::is_callable<TF&&()>{}
evaluates tofalse
, then it means thatTF
needs some arguments in order to be called - those arguments can be curried/partially-applied.If it evaluates to
true
, it means that there are no more arguments to curry/partially-apply inf
. Therefore,f
can be invoked to get the final result:return FWD(f)();
FWD
is a macro that expands tostd::forward<decltype(f)>(f)
. It’s being used asTF
may have a ref-qualifiedoperator()
that behaves differently depending onf
’s value category.
We will now focus on the recursive case of
curry
. The first step is allowing partial
application of arguments - since we don’t know how many arguments
will be bound in advance, a generic variadic lambda will be
returned:
return [xf = FWD_CAPTURE(f)](auto&&... partials) mutable constexpr
{
return curry(/* ... */);
}
The returned lambda will:
Capture
f
by forward capture intoxf
.Accept any amount of forwarding references in the
partials...
pack. These arguments will be bound for subsequent calls.Be marked as
mutable
: this is important asxf
will be moved in the inner lambda’s capture list.Be marked as
constexpr
: this allowscurry
to be used as a constant expression where possible.- Note that, since C++17, lambdas
are implicity
constexpr
unless they fail to satisfy anyconstexpr
function requirement.
- Note that, since C++17, lambdas
are implicity
Recursively call
curry
in its body, returning a new curried/partially-applicable function.
Let’s now focus on the return curry(/*...*/)
statement.
We want to return a curried version of a new intermediate
function object where:
The
partials...
pack values are bound for its invocation - these values will be captured by forward capture aspartial_pack
.The forward-captured
xf
from the “parent” lambda is captured by move intoyf
.xf
doesn’t need to be forwarded asFWD_CAPTURE(f)
returns a movable wrapper that either stores an lvalue reference or a value.
return curry
(
[
= FWD_CAPTURE_PACK_AS_TUPLE(partials),
partial_pack = std::move(xf)
yf ]
(auto&&... xs) constexpr
-> decltype(forward_like<TF>(xf.get())(FWD(partials)...,
(xs)...))
FWD{
// ...
}
);
The lambda passed to curry
will accept any number of
forwarding references in the xs...
pack that will
be used alongside the captured partials...
to call
f
. The expected function call can be easily understood by
the lambda’s trailing return type:
// Unwrap `f` from the `xf` `FWD_CAPTURE` wrapper and propagate
// the original function object's value category.
// vvvvvvvvvvvvvvvvvvvvvvvvv
-> decltype(forward_like<TF>(xf.get())(FWD(partials)..., FWD(xs)...))
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// Unpack both the `partials` and `xs` argument packs in a
// single function call to `f`.
forward_like
is an utility function in my vrm_core
library that
forwards the passed argument with the same value
category of the potentially-unrelated specified type. It basically
copies the “lvalue/rvalue-ness” of the user-provided template
parameter and applies it to its argument.
The expression inside the above return type essentially means:
“invoke the original function object by unpacking
partials...
and xs...
one after
another”.
Lastly, let’s analyze the body of the lambda.
return apply_fwd_capture([&yf](auto&&... ys) constexpr
-> decltype(forward_like<TF>(yf.get())(FWD(ys)...))
{
return forward_like<TF>(yf.get())(FWD(ys)...);
}, partial_pack, FWD_CAPTURE_PACK_AS_TUPLE(xs));
Remember: we’re trying to call f
by unpacking both
partials...
and xs...
at the same
time. The partials...
pack is stored in a special
wrapper returned by FWD_CAPTURE_PACK_AS_TUPLE
. The
xs...
pack contains the arguments passed to the lambda.
The apply_fwd_capture
takes any number of wrapped
forward-capture pack wrappers and uses them to invoke an
user-provided function object. The wrappers are unpacked at the same
time, preserving the original value category. Since xs...
is not wrapped, we’re going to explicitly do so by using the
FWD_CAPTURE_PACK_AS_TUPLE
macro.
In short, apply_fwd_capture
will invoke the
constexpr
variadic lambda by expanding
partials...
and xs...
correctly - those values
will be then forwarded to the wrapped callable object
yf
.
That’s it! Eventually the recursion will end as one
of the steps will produce an intermediate function objects that
satisfies std::is_callable<TF&&()>{}
, giving
back a “concrete” result to the caller.
Generated assembly benchmarks
As I did in my previous “passing
functions to functions” article, I will compare the number
of generated assembly lines for different code snippets where
curry
is used. The point of these “benchmarks” is giving
the readers an idea on how easy it is for the compiler to optimize
curry
out - they are in no way exhaustive or representative
of a real-world situation. (The benchmarks were generated with
this Python script, which also prints out the assembly.)
The compiler used for these measurements is g++ 7.0.0 20170113, compiled from the SVN repository.
constexpr
variables
When curry
is used in a constexpr
context
it is trivial to prove that it gets completely optimized out by the
compiler. Regardless, here’s the snippet that’s going to be
measured:
int main()
{
const auto sum = [](auto a, auto b, auto c, auto d, auto e, auto f, auto g,
auto h) constexpr
{
return a + b + c + d + e + f + g + h;
};
constexpr auto expected = sum(0, 1, 2, 3, 4, 5, 6, 7);
#if defined(VR_BASELINE)
constexpr auto s0 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s1 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s2 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s3 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s4 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s5 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s6 = sum(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s7 = sum(0, 1, 2, 3, 4, 5, 6, 7);
#elif defined(VR_CURRY)
constexpr auto s0 = curry(sum)(0, 1, 2, 3, 4, 5, 6, 7);
constexpr auto s1 = curry(sum)(0)(1, 2, 3, 4, 5, 6, 7);
constexpr auto s2 = curry(sum)(0, 1)(2, 3, 4, 5, 6, 7);
constexpr auto s3 = curry(sum)(0, 1, 2)(3, 4, 5, 6, 7);
constexpr auto s4 = curry(sum)(0, 1, 2, 3)(4, 5, 6, 7);
constexpr auto s5 = curry(sum)(0, 1, 2, 3, 4)(5, 6, 7);
constexpr auto s6 = curry(sum)(0, 1, 2, 3, 4, 5)(6, 7);
constexpr auto s7 = curry(sum)(0, 1, 2, 3, 4, 5, 6)(7);
#endif
static_assert(s0 == expected);
static_assert(s1 == expected);
static_assert(s2 == expected);
static_assert(s3 == expected);
static_assert(s4 == expected);
static_assert(s5 == expected);
static_assert(s6 == expected);
static_assert(s7 == expected);
return s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7;
}
sum
is aconstexpr
generic lambda with an arity of \(8\).When measuring the baseline, the
s0
…s7
constexpr
variables are initialized by simply callingsum
.When using
curry
,s0
…s7
are initialized by using various invocations ofcurry(sum)
.In the end, the expected sum result is statically asserted and returned from
main
.
Baseline
O0 | O1 | O2 | O3 | Ofast |
---|---|---|---|---|
14 | 2 | 2 | 2 | 2 |
Curry
O0 | O1 | O2 | O3 | Ofast |
---|---|---|---|---|
14 (+0.0%) | 2 (+0.0%) | 2 (+0.0%) | 2 (+0.0%) | 2 (+0.0%) |
As shown by the tables above, using curry
introduces no
additional overhead when used in the initialization of
constexpr
variables.
You can find the complete snippet on GitHub.
volatile
variables
Let’s now measure the eventual overhead of curry
when
initializing volatile
variables. The snippet is almost
identical to the previous one, except for a few differences:
The
s0
…s7
variables are now marked asvolatile
instead ofconstexpr
.The
static_assert(x)
checks have been replaced withif(!x){ return -1; }
.
Baseline
O0 | O1 | O2 | O3 | Ofast |
---|---|---|---|---|
68 | 56 | 42 | 42 | 42 |
Curry
O0 | O1 | O2 | O3 | Ofast |
---|---|---|---|---|
68 (+0.0%) | 56 (+0.0%) | 42 (+0.0%) | 42 (+0.0%) | 42 (+0.0%) |
Even with volatile
, there isn’t any additional overhead
introduced by curry
!
You can find the complete snippet on GitHub.
Intermediate curry
steps
The above benchmarks never stored any intermediate curry
return value - the entire expression was part of the
s0
…s7
initializer expression. Let’s see what
happens when those intermediate steps are stored as follows:
auto i0 = curry(sum);
auto i1 = curry(sum)(0);
auto i2 = curry(sum)(0, 1);
auto i3 = curry(sum)(0, 1, 2);
auto i4 = curry(sum)(0, 1, 2, 3);
auto i5 = curry(sum)(0, 1, 2, 3, 4);
auto i6 = curry(sum)(0, 1, 2, 3, 4, 5);
auto i7 = curry(sum)(0, 1, 2, 3, 4, 5, 6);
volatile auto s0 = i0(0, 1, 2, 3, 4, 5, 6, 7);
volatile auto s1 = i1(1, 2, 3, 4, 5, 6, 7);
volatile auto s2 = i2(2, 3, 4, 5, 6, 7);
volatile auto s3 = i3(3, 4, 5, 6, 7);
volatile auto s4 = i4(4, 5, 6, 7);
volatile auto s5 = i5(5, 6, 7);
volatile auto s6 = i6(6, 7);
volatile auto s7 = i7(7);
Baseline
O0 | O1 | O2 | O3 | Ofast |
---|---|---|---|---|
68 | 56 | 42 | 42 | 42 |
Curry
O0 | O1 | O2 | O3 | Ofast |
---|---|---|---|---|
19141 (+2804%) | 56 (+0.0%) | 42 (+0.0%) | 42 (+0.0%) | 42 (+0.0%) |
From optimization level -O1
onwards everything is great:
zero overhead! When using -O0
, though,
there is a quite noticeable overhead of \(+2804\%\) extra generated assembly compared
to the baseline.
You can find the complete snippet on GitHub.
(Some additional benchmarks with volatile
lambda
parameters and values are available
on the GitHub repository.)
Compiler bugs
curry
looks great! Zero run-time overhead, partial application and currying all in one… what’s the catch?
Well, the hardest part is… getting curry
to compile. As
seen from these
tweets between me and Julian Becker, it seems that both
g++
and clang++
fail with internal
compiler errors for different reasons.
This snippet on gcc.godbolt.org produces a g++ internal compiler error. Commenting out the trailing return type on line 158 fixes the ICE. I reported a minimal version of this issue as bug #78006.
clang++’s frontend crashes in versions 3.9 and 4.0 with this wandbox snippet. I’ve reported this as bug #31435.
I managed to compile curry
and the snippets used for
this article by cloning the latest version of gcc from SVN and compiling
it on my machine - I assume that some of the crashes were fixed in on
trunk and gcc.godbolt.org is still a little bit
behind.
Acknowledgments
Many thanks to Julian Becker and Jackie Kay for proofreading the article and providing very valuable feedback.