By now it should (hopefully) be widely known that the term “zero-cost abstractions” is a misnomer, or at least an unfortunate name. Something like “abstractions that may not result in overhead after optimization” would be much more honest, but apparently that name didn’t catch on.
Most C++ developers realize that “zero-cost abstractions” add no run-time overhead only when optimizations are enabled, and that they also slow down compilation. Nevertheless, many people believe that the advantages of such abstractions outweigh their disadvantages, even at the cost of slower debug builds and longer compile times.
I used to think this way too.
However, over the last few years I have come to realize how important fast debug builds and high compile speed are in some areas. One such area is game development. Game developers often criticize C++ abstractions for being unusable, and justifiably so: games are real-time interactive simulations, and even in debug mode they must remain playable and responsive. Imagine trying to debug a VR game at 20 FPS – it’s bound to make you queasy.
In this article, we’ll look at how the C++ abstraction model critically depends on compiler optimizations, analyze several examples of unexpected performance degradation, compare the three major compilers (GCC, Clang and MSVC) and discuss possible improvements and workarounds.
Why can moving an int be slow?
At the ACCU 2022 conference I gave a talk with the provocative title “Moving an int Is Slow: Debug Performance Matters!”. How is this possible? Let’s consider the following code:
#include <utility>
int main()
{
    return std::move(0);
}
C++ developers know that std::move(0) is equivalent to static_cast<int&&>(0), and expect the compiler to simply ignore this call. However, with optimizations disabled, GCC 12.2, Clang 14.0, and MSVC v19.x all generate a call instruction!
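To make the equivalence concrete, here is a minimal sketch of what std::move boils down to. The name my_move is hypothetical and used only for illustration; the real implementation in each standard library differs in details. The static_assert lines check at compile time that the cast and the library function agree in type and value.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for std::move: nothing more than a cast
// to an rvalue reference of the argument's underlying type.
template <typename T>
constexpr std::remove_reference_t<T>&& my_move(T&& value) noexcept
{
    return static_cast<std::remove_reference_t<T>&&>(value);
}

// The stand-in and std::move return exactly the same type.
static_assert(std::is_same_v<decltype(my_move(std::declval<int&>())),
                             decltype(std::move(std::declval<int&>()))>);

// The function call and the bare cast yield the same value.
constexpr int moved_with_std() { return std::move(0); }
constexpr int moved_with_cast() { return static_cast<int&&>(0); }
static_assert(moved_with_std() == moved_with_cast());
```

Semantically the two forms are interchangeable. The difference discussed here is only in what an unoptimized build emits for the function form.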
At first glance this doesn’t seem like a serious problem, but if such a call ends up inside a hot loop of a performance-critical algorithm, the problem becomes real. Here is a simplified example from libc++:
template <class InputIterator, class T>
inline constexpr T accumulate(InputIterator first, InputIterator last, T init)
{
    for (; first != last; ++first)
#if _LIBCPP_STD_VER > 17
        init = std::move(init) + *first;
#else
        init = init + *first;
#endif
    return init;
}
In C++20 and later (the _LIBCPP_STD_VER > 17 branch), adding std::move to std::accumulate resulted in a significant degradation of debug performance, because an extra function call is now made at each iteration of the loop.
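One way to observe the effect is to keep both branches side by side and time them in an unoptimized build. The sketch below is an assumption-laden illustration, not the libc++ code: accumulate_plain, accumulate_moving and time_us are hypothetical names, and the measured difference depends on the compiler, its version and the flags used.

```cpp
#include <cassert>
#include <chrono>
#include <utility>
#include <vector>

// Hypothetical copy of the pre-C++20 branch: no std::move in the loop.
template <typename It, typename T>
T accumulate_plain(It first, It last, T init)
{
    for (; first != last; ++first)
        init = init + *first;
    return init;
}

// Hypothetical copy of the C++20 branch: std::move on every iteration,
// which may compile to a real call when optimizations are disabled.
template <typename It, typename T>
T accumulate_moving(It first, It last, T init)
{
    for (; first != last; ++first)
        init = std::move(init) + *first;
    return init;
}

// Runs a callable once and returns the elapsed time in microseconds.
template <typename F>
long long time_us(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling time_us with a lambda that runs each variant over a large std::vector<int>, once with optimizations off and once with them on, gives a rough picture of the gap. Both variants return the same sum.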
The problem is deeper than it seems
std::move is only the tip of the iceberg. Any function that is essentially a cast also compiles to an unnecessary call in debug mode. For example: std::addressof, std::forward, std::move_if_noexcept, std::as_const, std::to_underlying.
Moreover, standard iterators such as std::vector::iterator also add unnecessary overhead. In debug mode, operator* and operator++ can be real function calls, which slows down container traversal considerably.
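The iterator overhead can be illustrated by writing the same loop twice. This is a sketch with hypothetical function names; whether the iterator operators really become calls depends on the standard library and the build flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Iterator-based traversal: in an unoptimized build, the comparison,
// dereference and increment may each be a separate function call.
inline long long sum_with_iterators(const std::vector<int>& values)
{
    long long total = 0;
    for (auto it = values.begin(); it != values.end(); ++it)
        total += *it;
    return total;
}

// Pointer-based traversal over the same storage: the loop body uses
// only built-in pointer operations, so there is nothing to call.
inline long long sum_with_pointers(const std::vector<int>& values)
{
    long long total = 0;
    const int* const first = values.data();
    const int* const last = first + values.size();
    for (const int* p = first; p != last; ++p)
        total += *p;
    return total;
}
```

Both functions compute the same result. The pointer version gives up the iterator abstraction (and any debug checks the library attaches to it) in exchange for predictable code in debug builds.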
Consequences
Because of these problems:
- Game developers avoid “zero-cost abstractions” and replace them with explicit casts or macros.
- std::vector is replaced by T*, and traversal is done via data().
- Standard library algorithms, such as std::accumulate, are ignored.
- Safe alternatives to C types, such as std::byte, are not used.
As a result, game developers find the standard abstractions too expensive, and the rest of the C++ world looks at them as “cavemen”.
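The macro workaround mentioned in the list typically looks something like the following. MY_MOVE and take_ownership are hypothetical names chosen for this sketch; real codebases use their own spelling.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical replacement for std::move. It expands to a bare cast,
// so there is no function to call regardless of the optimization level.
#define MY_MOVE(...) \
    static_cast<std::remove_reference_t<decltype(__VA_ARGS__)>&&>(__VA_ARGS__)

// Example use: the cast selects std::string's move constructor.
inline std::string take_ownership(std::string& source)
{
    std::string result = MY_MOVE(source);
    return result;
}
```

The price is the usual one for macros: no scoping, no overloading, and error messages that point at the expansion rather than at a named function.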
Using optimizations in debug mode
Some might say, “Just turn on -Og!”. But:
- -Og is only properly implemented in GCC.
- In Clang, -Og is currently equivalent to -O1, which is not always convenient for debugging.
- MSVC has no -Og counterpart at all.
- Even -Og can overly aggressively inline code, worsening debugging.
Possible solutions
Changes in the language. It would be useful to introduce more flexible mechanisms such as “hygienic macros” or attributes, for example [[always_inline]] for certain functions.
Compiler improvements.
- GCC 12.x introduced the -ffold-simple-inlines flag.
- Clang 15.x also introduced similar improvements.
- MSVC is still lagging behind, but work is in progress.
Standard library optimization. Some calls to utility templates can be replaced by static_cast, and wrapper functions can be marked [[gnu::always_inline]].
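As a rough sketch of the last idea, a cast-like wrapper can carry the [[gnu::always_inline]] attribute, which GCC and Clang honor even when optimizations are disabled. The macro SKETCH_ALWAYS_INLINE and the function inlined_move are hypothetical names for this example, and the fallback for other compilers is plain inline, which gives no such guarantee.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper macro: force inlining where the GNU attribute is
// understood, fall back to an ordinary inline function elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE [[gnu::always_inline]] inline
#else
#define SKETCH_ALWAYS_INLINE inline
#endif

// A std::move look-alike whose body is inlined at every call site,
// so unoptimized builds do not pay for a call.
template <typename T>
SKETCH_ALWAYS_INLINE constexpr std::remove_reference_t<T>&& inlined_move(T&& value) noexcept
{
    return static_cast<std::remove_reference_t<T>&&>(value);
}
```

A side effect worth keeping in mind is that a debugger can no longer step into such a function, which is usually acceptable for wrappers that are nothing more than a cast.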
Conclusion
Poor debug-build performance in C++ is a serious obstacle, especially for the games industry. Improving the situation will require changes at the level of the language, the compilers and the standard libraries. I hope this article will inspire further research and discussion in this direction.