Understanding std::shared_mutex from C++17
In this article, we’ll start with a basic example using std::mutex, look at its limitations, and then introduce std::shared_mutex, a reader-writer mutex added in C++17. Even in 2026, with many new concurrency features available, std::shared_mutex is still a valuable and practical tool.
Let’s jump in.
A Simple Thread-Safe Counter with std::mutex
We’ll begin with a small example (a standard “hello world” for this kind of mutex): a counter object that multiple threads can access:
#include <mutex>

class Counter {
public:
    int get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

    void increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
    }

private:
    mutable std::mutex mutex_;
    int value_{0};
};
Nothing scary so far, and this implementation is correct, thread-safe, and easy to understand.
However, there’s a significant limitation: All access is exclusive.
Only one thread can call get() or increment() at any given time, even though get() does not modify the state.
When std::mutex Becomes a Bottleneck
In many real-world programs, shared data is read frequently but only rarely updated or overwritten.
For example:
- configuration data
- caches
- lookup tables
- statistics and metrics
With std::mutex, even multiple read-only operations block each other. This limits scalability and wastes available parallelism.
Have a look at the following example:
#include <chrono>
#include <print>
#include <thread>
#include <vector>

using namespace std::chrono_literals;

int main() {
    Counter counter;
    std::vector<std::jthread> threads;

    // Writer: increments the counter ten times.
    threads.emplace_back([&counter] {
        for (int i = 0; i < 10; ++i) {
            counter.increment();
            std::this_thread::sleep_for(15ms);
        }
    });

    // Readers: each samples the counter ten times.
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back([&counter, i] {
            for (int j = 0; j < 10; ++j) {
                std::println("reader {} sees {}", i, counter.get());
                std::this_thread::sleep_for(10ms);
            }
        });
    }
}
On Compiler Explorer (see https://godbolt.org/z/cdjrj8hGd), I’m getting the following output:
reader 0 sees 0
reader 1 sees 1
reader 2 sees 1
reader 3 sees 1
reader 0 sees 1
reader 1 sees 1
reader 2 sees 1
reader 3 sees 1
reader 2 sees 2
reader 3 sees 2
reader 0 sees 2
reader 1 sees 2
reader 0 sees 3
reader 1 sees 3
reader 2 sees 3
reader 3 sees 3
reader 3 sees 3
reader 0 sees 3
reader 1 sees 3
reader 2 sees 3
reader 3 sees 4
reader 1 sees 4
reader 2 sees 4
reader 0 sees 4
reader 3 sees 5
reader 1 sees 5
reader 2 sees 5
reader 0 sees 5
reader 1 sees 5
reader 2 sees 5
reader 3 sees 5
reader 0 sees 5
reader 2 sees 6
reader 3 sees 6
reader 0 sees 6
reader 1 sees 6
reader 3 sees 7
reader 0 sees 7
reader 2 sees 7
reader 1 sees 7
Notice that some lines repeat the same value. That’s expected. The writer increments the counter every ~15 ms, but readers sample it every ~10 ms, so they can observe the same value more than once. Also, thread scheduling is not deterministic, so each reader may see a slightly different sequence.
In the demo code, there are many more reads than writes, so is the mutex the best solution here?
Introducing std::shared_mutex
std::shared_mutex is a reader-writer mutex. It supports two locking modes:
- shared ownership - many threads can hold the lock simultaneously
- exclusive ownership - only one thread can hold the lock
This makes std::shared_mutex a good fit for read-mostly data structures.
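Under the hood, the mutex exposes both modes directly: lock()/unlock() for exclusive ownership and lock_shared()/unlock_shared() for shared ownership. Here’s a minimal sketch using the raw API; in practice you’d use the RAII wrappers shown in the next section:

#include <shared_mutex>

std::shared_mutex mtx;
int shared_value = 0;

void reader() {
    mtx.lock_shared();        // shared ownership: other readers may enter too
    int copy = shared_value;  // read-only access
    mtx.unlock_shared();
    (void)copy;
}

void writer() {
    mtx.lock();               // exclusive ownership: all other threads wait
    ++shared_value;
    mtx.unlock();
}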
Refactoring the Counter with std::shared_mutex
Let’s update the counter example:
#include <shared_mutex>
#include <mutex> // for std::unique_lock

class Counter {
public:
    int get() const {
        std::shared_lock lock(mutex_);
        return value_;
    }

    void increment() {
        std::unique_lock lock(mutex_);
        ++value_;
    }

private:
    mutable std::shared_mutex mutex_;
    int value_{0};
};
What changed?
- get() now uses std::shared_lock
- increment() still uses exclusive access via std::unique_lock
- the mutex type is std::shared_mutex
With this change, many threads can call get() concurrently, writes remain fully protected, and read scalability improves with almost no extra complexity.
On Compiler Explorer (see https://godbolt.org/z/W1es4qM8x), I’m getting:
reader 0 sees 0
reader 1 sees 0
reader 2 sees 0
reader 3 sees 1
reader 1 sees 1
reader 0 sees 1
reader 2 sees 1
reader 3 sees 1
reader 1 sees 2
reader 0 sees 2
reader 2 sees 2
reader 3 sees 2
reader 3 sees 2
reader 1 sees 2
reader 0 sees 2
reader 2 sees 2
reader 1 sees 3
reader 2 sees 3
reader 3 sees 3
reader 0 sees 3
reader 2 sees 4
reader 1 sees 4
reader 3 sees 4
reader 0 sees 4
reader 2 sees 4
reader 1 sees 5
reader 3 sees 5
reader 0 sees 5
reader 3 sees 5
reader 1 sees 5
reader 0 sees 5
reader 2 sees 5
reader 3 sees 6
reader 0 sees 6
reader 2 sees 6
reader 1 sees 6
reader 3 sees 6
reader 0 sees 7
reader 2 sees 7
reader 1 sees 7
The output is still nondeterministic, and it may look similar between std::mutex and std::shared_mutex. The key difference is not the values printed, but the fact that with std::shared_mutex multiple readers can hold the lock at the same time. This matters when the protected read section is non-trivial (parsing, lookups, copying data, etc.) and when the program is under real contention.
Measuring Gains
OK, but let’s try to measure the gains to see the difference. I’ll add some sleeps to simulate a real “workload”, and then we can compare times.
#include <chrono>
#include <mutex>
#include <print>
#include <shared_mutex>
#include <thread>
#include <vector>

using namespace std::chrono_literals;

// Simulated work while holding the lock
constexpr auto ReadWork = 2ms;
constexpr auto WriteWork = 1ms;
constexpr auto GapWork = 1ms;

// --- Version 1: std::mutex (exclusive for both read and write) ---
class CounterMutex {
public:
    int get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::this_thread::sleep_for(ReadWork);
        return value_;
    }

    void increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
        std::this_thread::sleep_for(WriteWork);
    }

private:
    mutable std::mutex mutex_;
    int value_{0};
};

// --- Version 2: std::shared_mutex (shared reads, exclusive writes) ---
class CounterSharedMutex {
public:
    int get() const {
        std::shared_lock lock(mutex_);
        std::this_thread::sleep_for(ReadWork);
        return value_;
    }

    void increment() {
        std::unique_lock lock(mutex_);
        ++value_;
        std::this_thread::sleep_for(WriteWork);
    }

private:
    mutable std::shared_mutex mutex_;
    int value_{0};
};

void run_test(const char* label, auto& counter) {
    constexpr int kReaders = 4;
    constexpr int kReadsPerReader = 30;
    constexpr int kWrites = 20;

    const auto start = std::chrono::steady_clock::now();
    {
        std::vector<std::jthread> threads;
        threads.reserve(1 + kReaders);

        // Writer
        threads.emplace_back([&] {
            for (int i = 0; i < kWrites; ++i) {
                counter.increment();
                std::this_thread::sleep_for(GapWork);
            }
        });

        // Readers
        for (int id = 0; id < kReaders; ++id) {
            threads.emplace_back([&counter] {
                for (int i = 0; i < kReadsPerReader; ++i) {
                    (void)counter.get();
                    std::this_thread::sleep_for(GapWork);
                }
            });
        }
    } // the jthreads join here, before we take the end timestamp

    const auto end = std::chrono::steady_clock::now();
    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    std::println("{}: {} ms", label, ms.count());
}

int main() {
    std::println("hardware_concurrency: {}", std::thread::hardware_concurrency());

    CounterMutex counter_mutex;
    CounterSharedMutex counter_shared;

    run_test("std::mutex", counter_mutex);
    run_test("std::shared_mutex", counter_shared);
}
You can run it at Compiler Explorer. The output:
hardware_concurrency: 2
std::mutex: 285 ms
std::shared_mutex: 102 ms
On this machine, std::thread::hardware_concurrency() reports 2 hardware threads. In this setup, the workload is intentionally read-heavy and each read holds the lock for a short but non-trivial amount of time. With std::mutex, all reads and writes use exclusive locking, so reader threads are effectively serialized and must wait for each other. As a result, the total runtime is about 285 ms.
With std::shared_mutex, read operations acquire a shared lock and can therefore run concurrently. On a system with two hardware threads, this allows two readers to make progress at the same time, significantly reducing contention. Under the same workload and timing parameters, the total runtime drops to about 102 ms. The exact numbers will vary between runs and platforms. Still, the difference clearly illustrates the main advantage of std::shared_mutex for read-mostly workloads: improved throughput when multiple readers can proceed in parallel.
A More Realistic Example: Read-Mostly Cache
The “counter” example is small, but the pattern becomes more useful with real data structures. A typical example is a cache.
#include <shared_mutex>
#include <string>
#include <unordered_map>

class Cache {
public:
    std::string get(const std::string& key) const {
        std::shared_lock lock(mutex_);
        if (auto it = data_.find(key); it != data_.end())
            return it->second;
        return {};
    }

    void put(std::string key, std::string value) {
        std::unique_lock lock(mutex_);
        data_[std::move(key)] = std::move(value);
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, std::string> data_;
};
This pattern appears often in real systems:
- many threads read cached values
- updates are relatively rare
- correctness is more important than extreme micro-optimizations
Using std::shared_mutex allows reads to scale while still keeping writes safe.
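For illustration, here’s a minimal usage sketch of the Cache class above (the “host” key and its values are made up; as before, it relies on std::jthread and C++23’s std::println):

#include <print>
#include <thread>
#include <vector>

int main() {
    Cache cache; // the Cache class defined above
    cache.put("host", "localhost");

    std::vector<std::jthread> threads;

    // One occasional writer takes the exclusive lock...
    threads.emplace_back([&cache] { cache.put("host", "example.com"); });

    // ...while several readers can look the value up concurrently.
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back([&cache, i] {
            std::println("reader {} sees '{}'", i, cache.get("host"));
        });
    }
}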
Common Pitfalls
Recursive locking is undefined
A thread must not lock the same std::shared_mutex recursively:
mutex.lock();
mutex.lock(); // undefined behavior
This applies to both shared and exclusive ownership.
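A common way to avoid accidental recursive locking is to take the lock only in the public API and let private helpers assume it is already held. Here’s a minimal sketch (the Settings class and the _unlocked naming convention are just illustrative):

#include <shared_mutex>
#include <string>

class Settings {
public:
    std::string describe() const {
        std::shared_lock lock(mutex_);
        // Call helpers that assume the lock is held, instead of
        // re-locking (which would be undefined behavior).
        return name_unlocked() + "=" + value_unlocked();
    }

private:
    std::string name_unlocked() const { return name_; }   // no locking here
    std::string value_unlocked() const { return value_; } // no locking here

    mutable std::shared_mutex mutex_;
    std::string name_{"mode"};
    std::string value_{"fast"};
};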
You cannot upgrade a shared lock
This pattern will deadlock:
std::shared_lock read_lock(mutex_);
std::unique_lock write_lock(mutex_); // bad idea
If upgrading is required, the locking strategy usually needs to change.
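The usual workaround is to release the shared lock, acquire an exclusive one, and then re-check the condition, because another writer may have slipped in between the two locks. Here’s a sketch of a get-or-insert operation (the function and variable names are illustrative):

#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

std::shared_mutex map_mutex;
std::unordered_map<std::string, std::string> map_data;

std::string get_or_insert(const std::string& key, const std::string& fallback) {
    {
        std::shared_lock lock(map_mutex); // fast path: shared access
        if (auto it = map_data.find(key); it != map_data.end())
            return it->second;
    } // shared lock released here

    std::unique_lock lock(map_mutex);     // slow path: exclusive access
    // Re-check: another writer may have inserted the key in the meantime.
    if (auto it = map_data.find(key); it != map_data.end())
        return it->second;
    map_data[key] = fallback;
    return fallback;
}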
A reader-writer lock does not always mean more performance
std::shared_mutex has overhead. If:
- writes are frequent
- contention is low
- critical sections are small
then std::mutex may be just as fast or even faster. Measuring is important.
For some detailed measurements, see this recent article: When std::shared_mutex Outperforms std::mutex: A Google Benchmark Study – Tech For Talk.
Newer Concurrency Tools in C++20 and Above
Since C++17, the concurrency library has expanded significantly. We now have:
- std::jthread and cooperative cancellation (C++20)
- semaphores, latches, and barriers (C++20)
- improved atomic operations (C++20)
- safe memory reclamation mechanisms (RCU, hazard pointers in C++26)
These tools focus mostly on thread lifetime, coordination, cancellation or lock-free programming.
std::shared_mutex fills a different role.
It is still a mutual-exclusion primitive, explicitly designed to protect a shared state with many readers and few writers. It does not compete with atomics, condition variables, or RCU.
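For example, if the shared state is just a single integer, as in our toy counter, std::atomic is most likely the simpler and faster choice; std::shared_mutex starts to pay off when the protected read section is non-trivial. A quick sketch:

#include <atomic>

// For a plain int, a lock-free atomic usually beats any mutex.
class AtomicCounter {
public:
    int get() const { return value_.load(); }
    void increment() { value_.fetch_add(1); }

private:
    std::atomic<int> value_{0};
};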
Summary
In this article, we explored std::shared_mutex, a synchronization primitive designed for protecting read-mostly shared data. Unlike std::mutex, it allows multiple threads to read the same resource concurrently, while still ensuring exclusive access for writers.
Starting from a simple counter example, we showed how a plain mutex serializes all access and how switching to std::shared_mutex enables concurrent reads with only small changes to the code. A simple benchmark experiment demonstrated that, under a read-heavy workload, this can significantly reduce contention and improve throughput.
Even with newer concurrency features added in C++20 and later, std::shared_mutex remains a useful and practical tool when reads dominate writes and simplicity is preferred over more complex synchronization techniques.
References
- C++ Concurrency in Action, 2nd Edition, by Anthony Williams
- Programming: Principles and Practice Using C++, 3rd Edition, by Bjarne Stroustrup
- Concurrency with Modern C++: What every professional C++ programmer should know about concurrency, by Rainer Grimm
- <shared_mutex> | Microsoft Learn
- std::shared_mutex - cppreference.com
Back to you
- Have you tried std::shared_mutex? In what situations?
- Do you use concurrency tools from C++20 and above?
Share your comments below.