How to Avoid Thread-Safety Cost for Functions' static Variables

In this blog post, we’ll look at static variables defined in a function scope. We’ll see how they are implemented and how to use them. What’s more, we’ll discuss several cases where we can avoid extra thread-safety cost.

Let’s start.

Introduction

As you may know, C++ offers a way to define static variables in a function/block scope:

void foo() { 
    static int counter = 0;
    ++counter;
}

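Above, the counter variable is initialized the first time foo() is invoked (conceptually, when control first passes through its declaration), and it keeps its value between calls. In other words, a static local variable is initialized lazily. For a constant initializer like the 0 used here, compilers can perform the initialization at compile time, which is a point we'll come back to later. The counter is kept “outside” the function’s stack space, in static storage. This lets you keep state between calls while limiting the visibility of what is effectively a global object.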

Here’s the full example:

#include <iostream>

int foo() { 
    static int counter = 0;
    return ++counter;
}

int main() {
    foo();
    foo();
    foo();
    auto finalCounter = foo();
    std::cout << finalCounter;
}

Run @Compiler Explorer

If you run the program, you’ll get 4 as the output.

Static local variables, since C++11, are guaranteed to be initialized in a thread-safe way. The object will be initialized only once if multiple threads enter a function with such a variable. Have a look below:

#include <iostream>
#include <thread>

struct Value {
    Value(int x) : v(x) { std::cout << "Value(" << v << ")\n"; }
    ~Value() noexcept { std::cout << "~Value(" << v << ")\n"; }

    int v { 0 };
};

void foo() {
    static Value x { 10 };
}

int main() {
    std::jthread worker1 { foo };
    std::jthread worker2 { foo };
    std::jthread worker3 { foo };
}

Run @Compiler Explorer

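The example creates three threads that each call the simple foo() function. With the default settings, the Value constructor runs only once, no matter which thread gets there first, so you should see a single Value(10) line followed by a single ~Value(10) line.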

However, on GCC, you can also try compiling with the following flags:

-std=c++20 -lpthread -fno-threadsafe-statics

And then the output might be as follows:

Value(Value(1010)
)
Value(10)
~Value(10)
~Value(10)
~Value(10)

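The constructor now runs three times, once per thread, and the first lines of the output are interleaved because the threads race with each other!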

With MSVC on Windows, you can disable this behaviour with /Zc:threadSafeInit-. See /Zc:threadSafeInit (Thread-safe Local Static Initialization) | Microsoft Learn

Thread-safe initialization is useful in most cases, for example, when you want to implement a singleton (the Meyers Singleton defines the instance as a static variable inside a function).
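To make the singleton case concrete, here is a minimal sketch of a Meyers Singleton. The Logger class and its members are hypothetical names chosen only for illustration. Note that only the construction of the instance is thread-safe; the log() member would need its own synchronization if several threads called it.

```cpp
#include <string>

// Hypothetical class, used only to illustrate the pattern.
class Logger {
public:
    // The function-local static is constructed on the first call.
    // Since C++11, that construction is guaranteed to be thread-safe.
    static Logger& instance() {
        static Logger logger;
        return logger;
    }

    // A singleton should not be copied.
    Logger(const Logger&) = delete;
    Logger& operator=(const Logger&) = delete;

    // Not synchronized: only the initialization above is thread-safe.
    void log(const std::string& msg) {
        ++count_;
        last_ = msg;
    }

    int count() const { return count_; }
    const std::string& last() const { return last_; }

private:
    Logger() = default;

    int count_ = 0;
    std::string last_;
};
```

Every call to Logger::instance() returns a reference to the same object, and each call goes through the guard check discussed in the next section.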

How much does this cost?

To understand the cost of static local variables, let’s consider this popular SO question:

c++ - Is there a penalty for using static variables in C++11 - Stack Overflow

In short: not much :)

The answer contains the following benchmark setup:

Two versions tested:
1. Local static inside a function
2. Global static at namespace scope
Both return a const std::vector<int>& with {1, 2, 3}.
Measured over 500 million iterations, summing vector elements.
Timed using std::chrono with barriers to avoid reordering.
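The setup described above could look roughly like the sketch below. This is not the code from the Stack Overflow answer: the names get_local, get_global, BenchResult and run_benchmark are assumptions made for illustration, and the sketch leaves out the barriers that the original used to prevent reordering.

```cpp
#include <chrono>
#include <cstdint>
#include <vector>

// Version 2: static at namespace scope, initialized during program startup.
static const std::vector<int> global_values{1, 2, 3};

const std::vector<int>& get_global() {
    return global_values;
}

// Version 1: function-local static, initialized on the first call
// and checked through a guard on every later call.
const std::vector<int>& get_local() {
    static const std::vector<int> local_values{1, 2, 3};
    return local_values;
}

struct BenchResult {
    std::int64_t sum;               // returned so the result stays observable
    std::chrono::nanoseconds time;  // total wall-clock time of the loop
};

// Calls `getter` `iterations` times and sums the elements it returns.
template <typename Getter>
BenchResult run_benchmark(Getter getter, std::int64_t iterations) {
    std::int64_t sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < iterations; ++i) {
        for (int v : getter()) {
            sum += v;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return {sum, std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)};
}
```

You would call run_benchmark(get_local, n) and run_benchmark(get_global, n) with a large n and compare the two times. An optimizer may inline or simplify such loops, so treat any numbers from a sketch like this with caution.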

Results (first benchmark, indirect function calls). Warning: the code is from 2014!

Clang: local = 4618 ms, global = 4392 ms → local slower by ~0.45 ns per call.
GCC: local = 4181 ms, global = 4418 ms → local faster by ~0.47 ns per call.
Conclusion: variance is tiny, compiler-dependent.

Second benchmark (function objects for better inlining):

GCC: local = 3803 ms, global = 2323 ms → global faster by ~2.96 ns per call.
Clang: local = 4183 ms, global = 3253 ms → global faster by ~1.86 ns per call.
This matches the intuition that a global avoids the guard check entirely, while a local static still needs a fast-path check.

Takeaway:

Function-local statics in C++11+ are initialized exactly once, thread-safely.
After initialization, calls incur only a very small, predictable guard check (~1–3 ns in tight loops).
In most real-world code, the difference is negligible; choose the style that best fits your design.
If absolute lowest per-call overhead is needed in a very hot path, a namespace-scope constexpr/constinit global removes even that guard.

How to avoid the cost

We’re equipped with the basic knowledge, and now let’s consider how to limit the cost of thread-safety. Consider the following super-simple example:

#include <vector>
#include <algorithm> // std::find
#include <print>

bool is_blocked_id(int id) {
    static const std::vector<int> blocked_ids{
        101, 202, 303, 404, 505 // possibly a longer list...
    };

    return std::find(blocked_ids.begin(), blocked_ids.end(), id) != blocked_ids.end();
}

int main() {
    for (int id : {101, 150, 202, 999}) {
        std::println("{} is blocked: {}", id, is_blocked_id(id));
    }
}

Run @Compiler Explorer

The main function iterates over a few ids and checks each one with the is_blocked_id function. That function uses a static const vector to store the “suspicious” numbers.

And here’s the generated code:

"is_blocked_id(int)":
        push    rbp
        mov     rbp, rsp
        push    r14
        push    r13
        push    r12
        push    rbx
        sub     rsp, 64
        mov     DWORD PTR [rbp-84], edi
        movzx   eax, BYTE PTR "guard variable for is_blocked_id(int)::blocked_ids"[rip]
        test    al, al
        sete    al
        test    al, al
        je      .L216
        mov     edi, OFFSET FLAT:"guard variable for is_blocked_id(int)::blocked_ids"
        call    "__cxa_guard_acquire"
        ...

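As you can see, the function starts by loading and testing the guard variable. Only when the guard says the object is not yet initialized does it call __cxa_guard_acquire. So after the vector is constructed, each call still pays for the guard load and the branch, which is a small but non-zero overhead.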
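To see what the guard does, it may help to write a similar scheme by hand. The sketch below is only an approximation of what the compiler and runtime do. All names in the sketch namespace are hypothetical, the mutex stands in for the locking inside __cxa_guard_acquire, and the vector is never destroyed, while the real runtime also registers the destructor.

```cpp
#include <algorithm>
#include <atomic>
#include <mutex>
#include <vector>

// Hand-written approximation of a guarded static; names are hypothetical.
namespace sketch {
    std::atomic<bool> guard{false};                 // plays the role of the guard variable
    std::mutex init_mutex;                          // stands in for the runtime's lock
    const std::vector<int>* blocked_ids = nullptr;  // the "static" object

    const std::vector<int>& get_blocked_ids() {
        // Fast path: one load and one branch on every call.
        if (!guard.load(std::memory_order_acquire)) {
            // Slow path: taken only until the first initialization completes.
            std::lock_guard<std::mutex> lock(init_mutex);
            if (!guard.load(std::memory_order_relaxed)) {
                // Intentionally never freed in this sketch.
                blocked_ids = new std::vector<int>{101, 202, 303, 404, 505};
                guard.store(true, std::memory_order_release);
            }
        }
        return *blocked_ids;
    }
}

bool is_blocked_id_manual(int id) {
    const auto& ids = sketch::get_blocked_ids();
    return std::find(ids.begin(), ids.end(), id) != ids.end();
}
```

The important part is the fast path: even when the object already exists, every call performs the guard check before it can touch the data.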

Some issues

There are a few cases to consider.

While in the Meyers Singleton thread safety was essential, here we have a constant object, so maybe we can do something with this knowledge?
Since we use std::vector, there’s an extra memory allocation going on. Do we need it?

What to improve: In short, we need to move our dynamic initialization to constant initialization. Since our data is const, there’s no need to pay an extra price at runtime. The compiler can initialize all data at compile time and write it to the binary.

How to achieve this?

One way is to move the vector outside the function scope. That way we lose the “locality”, but the object is initialized before the main function even starts:

#include <vector>
#include <algorithm> // std::find
#include <print>

static const std::vector<int> blocked_ids{
    101, 202, 303, 404, 505
};

bool is_blocked_id(int id) {
    return std::find(blocked_ids.begin(), blocked_ids.end(), id) != blocked_ids.end();
}

int main() {
    for (int id : {101, 150, 202, 999}) {
        std::println("{} is blocked: {}", id, is_blocked_id(id));
    }
}

Run @Compiler Explorer

Now the generated code is a bit better:

"is_blocked_id(int)":
        push    rbp
        mov     rbp, rsp
        push    rbx
        sub     rsp, 56
        mov     DWORD PTR [rbp-52], edi
        mov     edi, OFFSET FLAT:"blocked_ids"
        call    "std::vector<int, std::allocator<int> >::end() const"
        mov     QWORD PTR [rbp-48], rax
        mov     edi, OFFSET FLAT:"blocked_ids"
        call    "std::vector<int, std::allocator<int> >::end() const"
        mov     rbx, rax
        mov     edi, OFFSET FLAT:"blocked_ids"
        call    "std::vector<int, std::allocator<int> >::begin() const"
        mov     rcx, rax
        lea     rax, [rbp-52]
        ...

As you can see, no guard is needed, just the vector handling. The runtime memory allocation is still there, but it happens before the main function starts.

But we can do better!

Better than a vector

As we discussed, there’s no need for dynamic allocation here. So, how about using just std::array?

#include <algorithm> // std::find
#include <array>

static const std::array blocked_ids{
    101, 202, 303, 404, 505
};

bool is_blocked_id(int id) {
    return std::find(blocked_ids.begin(), blocked_ids.end(), id) != blocked_ids.end();
}

Run @Compiler Explorer

The generated code:

"is_blocked_id(int)":
        push    rbp
        mov     rbp, rsp
        push    rbx
        sub     rsp, 24
        mov     DWORD PTR [rbp-20], edi
        mov     edi, OFFSET FLAT:"blocked_ids"
        call    "std::array<int, 5ul>::end() const"
        mov     rbx, rax
        mov     edi, OFFSET FLAT:"blocked_ids"
        call    "std::array<int, 5ul>::begin() const"
        mov     rcx, rax
        lea     rax, [rbp-20]
        mov     rdx, rax
        mov     rsi, rbx
        mov     rdi, rcx
        call    "int const* std::find<int const*, int>(int const*, int const*, int const&)"
        mov     rbx, rax

Even better approach

If you still want to keep the numbers in function scope, you can. Just ensure the collection is initialized at compile time:

bool is_blocked_id(int id) {
    static const std::array blocked_ids{ // or constexpr, or constinit!
        101, 202, 303, 404, 505
    };

    return std::find(blocked_ids.begin(), blocked_ids.end(), id) != blocked_ids.end();
}

The static const should be good enough for the compiler to use compile-time initialization, assuming the container’s constructor is constexpr. But to be sure, use constinit or constexpr:

bool is_blocked_id(int id) {
    static constinit std::array blocked_ids{
        101, 202, 303, 404, 505
    };

    return std::find(blocked_ids.begin(), blocked_ids.end(), id) != blocked_ids.end();
}
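For comparison, here is a sketch of the constexpr variant. The difference is that constinit only guarantees static (compile-time) initialization and leaves the variable mutable unless you also add const. In contrast, constexpr implies const and lets you use the variable in constant expressions, which the static_assert below relies on.

```cpp
#include <algorithm> // std::find
#include <array>

bool is_blocked_id(int id) {
    // constexpr implies const and guarantees constant initialization,
    // so no guard variable is needed for this local static.
    static constexpr std::array blocked_ids{
        101, 202, 303, 404, 505
    };

    // The array can be used in constant expressions.
    static_assert(blocked_ids.size() == 5);

    return std::find(blocked_ids.begin(), blocked_ids.end(), id) != blocked_ids.end();
}
```

If the data must never change, constexpr (or constinit const) states that intent more directly than plain constinit.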

Here’s the article about those keywords if you want to know more: const vs constexpr vs consteval vs constinit in C++20 - C++ Stories.

More complex types

In a real-life application, you won’t work only with arrays of simple integers. You’ll have maps of strings, maps of custom data types, and much more. In that case, moving things to compile time might not be straightforward. Yet, there are steps you can try.

Here are some examples, just to get started:

struct Point {
    int x {0};
    int y {0};
    bool operator==(const Point& pt) const {
        return pt.x == x && pt.y == y;
    }
};

bool is_blocked_point(int x, int y) {
    static constinit std::array<Point,6> blocked_pts{
        Point{101, 101}, Point{202, 202}, Point{303, 303}, 
        Point{404, 404}, Point{505, 505}, Point{606, 606}
    };

    // Name the type explicitly: passing a bare {x, y} to std::find needs C++26.
    return std::find(blocked_pts.begin(), blocked_pts.end(), Point{x, y}) != blocked_pts.end();
}

Run @Compiler Explorer

Point is a simple aggregate, so the compiler can construct its instances at compile time without any explicit constexpr constructor.

Or with strings:

bool is_blocked_name(std::string_view name) {
    static constinit std::array<std::string_view, 6> blocked_names{
        "alice", "bob", "charlie", "dora", "eve", "mallory"
    };
    return std::find(blocked_names.begin(), blocked_names.end(), name) != blocked_names.end();
}

Run @Compiler Explorer
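For the map-like cases mentioned at the start of this section, one option is a constexpr array of key-value pairs with a linear search. This is only a sketch under assumptions: the blocked_level function and its data are hypothetical, and for larger tables you might prefer a sorted array with a binary search.

```cpp
#include <algorithm>
#include <array>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical lookup: returns the block level for a name, or std::nullopt.
std::optional<int> blocked_level(std::string_view name) {
    using Entry = std::pair<std::string_view, int>;

    // Constant-initialized table: no allocation and no guard variable.
    static constexpr std::array<Entry, 3> levels{
        Entry{"alice", 1}, Entry{"bob", 2}, Entry{"mallory", 3}
    };

    // Linear search over the keys.
    const auto it = std::find_if(levels.begin(), levels.end(),
        [name](const Entry& e) { return e.first == name; });

    if (it == levels.end()) {
        return std::nullopt;
    }
    return it->second;
}
```

Both std::string_view and std::pair have constexpr constructors, so the whole table can be placed in the binary at compile time, unlike a std::map or std::unordered_map, which allocate at runtime.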

Summary

In the article, we went from a simple example that illustrates how a static variable inside a function works, to advanced scenarios where we want to limit the cost of thread-safety.

Always measure, measure, measure. In most cases, the performance hit added by a local static variable will be super small, but in your hot path, it’s best to check. Still, if you can move something to compile time, and especially avoid dynamic runtime memory allocations, then it’s a clear win.

Back to you

Do you use static variables in function scope?
Have you run into any issues or bugs with such variables?
