Understanding std::shared_mutex from C++17
In this article, we’ll start with a basic example using std::mutex, look at its limitations, and then introduce std::shared_mutex, a reader-writer mutex added in C++17. Even in 2026, with many new concurrency features available, std::shared_mutex is still a valuable and practical tool.
Let’s jump in.
A Simple Thread-Safe Counter with std::mutex
We’ll begin with a small example (a standard “hello world” for this kind of mutex): a counter object that multiple threads can access:
#include <mutex>

class Counter {
public:
    int get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

    void increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
    }

private:
    mutable std::mutex mutex_;
    int value_{0};
};
Nothing scary so far, and this implementation is correct, thread-safe, and easy to understand.
However, there’s a significant limitation: All access is exclusive.
Only one thread can call get() or increment() at any given time, even though get() does not modify the state.
When std::mutex Becomes a Bottleneck
In many real-world programs, shared data is read frequently but only rarely updated or overwritten.
For example:
- configuration data
- caches
- lookup tables
- statistics and metrics
With std::mutex, even multiple read-only operations block each other. This limits scalability and wastes available parallelism.
Have a look at the following example:
#include <chrono>
#include <print>
#include <thread>
#include <vector>

using namespace std::chrono_literals;

int main() {
    Counter counter;
    std::vector<std::jthread> threads;

    // Writer: increments the counter ten times.
    threads.emplace_back([&counter] {
        for (int i = 0; i < 10; ++i) {
            counter.increment();
            std::this_thread::sleep_for(15ms);
        }
    });

    // Readers: each samples the counter ten times.
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back([&counter, i] {
            for (int j = 0; j < 10; ++j) {
                std::println("reader {} sees {}", i, counter.get());
                std::this_thread::sleep_for(10ms);
            }
        });
    }
}
On Compiler Explorer (see https://godbolt.org/z/cdjrj8hGd), I’m getting the following output:
reader 0 sees 0
reader 1 sees 1
reader 2 sees 1
reader 3 sees 1
reader 0 sees 1
reader 1 sees 1
reader 2 sees 1
reader 3 sees 1
reader 2 sees 2
reader 3 sees 2
reader 0 sees 2
reader 1 sees 2
reader 0 sees 3
reader 1 sees 3
reader 2 sees 3
reader 3 sees 3
reader 3 sees 3
reader 0 sees 3
reader 1 sees 3
reader 2 sees 3
reader 3 sees 4
reader 1 sees 4
reader 2 sees 4
reader 0 sees 4
reader 3 sees 5
reader 1 sees 5
reader 2 sees 5
reader 0 sees 5
reader 1 sees 5
reader 2 sees 5
reader 3 sees 5
reader 0 sees 5
reader 2 sees 6
reader 3 sees 6
reader 0 sees 6
reader 1 sees 6
reader 3 sees 7
reader 0 sees 7
reader 2 sees 7
reader 1 sees 7
Notice that some lines repeat the same value. That’s expected. The writer increments the counter every ~15 ms, but readers sample it every ~10 ms, so they can observe the same value more than once. Also, thread scheduling is not deterministic, so each reader may see a slightly different sequence.
In the demo code, there are many more reads than writes, so is the mutex the best solution here?
Introducing std::shared_mutex
std::shared_mutex is a reader-writer mutex. It supports two locking modes:
- shared ownership - many threads can hold the lock simultaneously
- exclusive ownership - only one thread can hold the lock
This makes std::shared_mutex a good fit for read-mostly data structures.
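Under the hood, the mutex exposes both modes directly: lock()/unlock() for exclusive ownership and lock_shared()/unlock_shared() for shared ownership. Here’s a minimal sketch using the raw API; in practice you’d use the RAII wrappers shown in the next section:

#include <shared_mutex>

std::shared_mutex mtx;
int shared_value = 0;

void reader() {
    mtx.lock_shared();        // shared ownership: other readers may enter too
    int copy = shared_value;  // read-only access
    mtx.unlock_shared();
    (void)copy;
}

void writer() {
    mtx.lock();               // exclusive ownership: all other threads wait
    ++shared_value;
    mtx.unlock();
}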
Refactoring the Counter with std::shared_mutex
Let’s update the counter example:
#include <shared_mutex>
#include <mutex> // for std::unique_lock

class Counter {
public:
    int get() const {
        std::shared_lock lock(mutex_);
        return value_;
    }

    void increment() {
        std::unique_lock lock(mutex_);
        ++value_;
    }

private:
    mutable std::shared_mutex mutex_;
    int value_{0};
};
What changed?
- get() now uses std::shared_lock
- increment() still uses exclusive access via std::unique_lock
- the mutex type is std::shared_mutex
With this change, many threads can call get() concurrently, writes remain fully protected, and read scalability improves with almost no extra complexity.
On Compiler Explorer (see https://godbolt.org/z/W1es4qM8x), I’m getting:
reader 0 sees 0
reader 1 sees 0
reader 2 sees 0
reader 3 sees 1
reader 1 sees 1
reader 0 sees 1
reader 2 sees 1
reader 3 sees 1
reader 1 sees 2
reader 0 sees 2
reader 2 sees 2
reader 3 sees 2
reader 3 sees 2
reader 1 sees 2
reader 0 sees 2
reader 2 sees 2
reader 1 sees 3
reader 2 sees 3
reader 3 sees 3
reader 0 sees 3
reader 2 sees 4
reader 1 sees 4
reader 3 sees 4
reader 0 sees 4
reader 2 sees 4
reader 1 sees 5
reader 3 sees 5
reader 0 sees 5
reader 3 sees 5
reader 1 sees 5
reader 0 sees 5
reader 2 sees 5
reader 3 sees 6
reader 0 sees 6
reader 2 sees 6
reader 1 sees 6
reader 3 sees 6
reader 0 sees 7
reader 2 sees 7
reader 1 sees 7
The output is still nondeterministic, and it may look similar between std::mutex and std::shared_mutex. The key difference is not the values printed, but the fact that with std::shared_mutex multiple readers can hold the lock at the same time. This matters when the protected read section is non-trivial (parsing, lookups, copying data, etc.) and when the program is under real contention.
Measuring Gains
OK, but let’s try to measure the gains to see the difference. I’ll add some sleeps to simulate a real “workload”, and then we can compare times.
#include <chrono>
#include <mutex>
#include <print>
#include <shared_mutex>
#include <thread>
#include <vector>

using namespace std::chrono_literals;

// Simulated work while holding the lock
constexpr auto ReadWork = 2ms;
constexpr auto WriteWork = 1ms;
constexpr auto GapWork = 1ms;

// --- Version 1: std::mutex (exclusive for both read and write) ---
class CounterMutex {
public:
    int get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::this_thread::sleep_for(ReadWork);
        return value_;
    }

    void increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
        std::this_thread::sleep_for(WriteWork);
    }

private:
    mutable std::mutex mutex_;
    int value_{0};
};

// --- Version 2: std::shared_mutex (shared reads, exclusive writes) ---
class CounterSharedMutex {
public:
    int get() const {
        std::shared_lock lock(mutex_);
        std::this_thread::sleep_for(ReadWork);
        return value_;
    }

    void increment() {
        std::unique_lock lock(mutex_);
        ++value_;
        std::this_thread::sleep_for(WriteWork);
    }

private:
    mutable std::shared_mutex mutex_;
    int value_{0};
};

void run_test(const char* label, auto& counter) {
    constexpr int kReaders = 4;
    constexpr int kReadsPerReader = 30;
    constexpr int kWrites = 20;

    const auto start = std::chrono::steady_clock::now();
    {
        std::vector<std::jthread> threads;
        threads.reserve(1 + kReaders);

        // Writer
        threads.emplace_back([&] {
            for (int i = 0; i < kWrites; ++i) {
                counter.increment();
                std::this_thread::sleep_for(GapWork);
            }
        });

        // Readers
        for (int id = 0; id < kReaders; ++id) {
            threads.emplace_back([&counter] {
                for (int i = 0; i < kReadsPerReader; ++i) {
                    (void)counter.get();
                    std::this_thread::sleep_for(GapWork);
                }
            });
        }
    } // the jthreads join here, before we take the end timestamp

    const auto end = std::chrono::steady_clock::now();
    const auto ms =
        std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    std::println("{}: {} ms", label, ms.count());
}

int main() {
    std::println("hardware_concurrency: {}", std::thread::hardware_concurrency());

    CounterMutex counter_mutex;
    CounterSharedMutex counter_shared;

    run_test("std::mutex", counter_mutex);
    run_test("std::shared_mutex", counter_shared);
}
You can run it at Compiler Explorer. The output:
hardware_concurrency: 2
std::mutex: 285 ms
std::shared_mutex: 102 ms
On this machine, std::thread::hardware_concurrency() reports 2 hardware threads. In this setup, the workload is intentionally read-heavy and each read holds the lock for a short but non-trivial amount of time. With std::mutex, all reads and writes use exclusive locking, so reader threads are effectively serialized and must wait for each other. As a result, the total runtime is about 285 ms.
With std::shared_mutex, read operations acquire a shared lock and can therefore run concurrently. On a system with two hardware threads, this allows two readers to make progress at the same time, significantly reducing contention. Under the same workload and timing parameters, the total runtime drops to about 102 ms. The exact numbers will vary between runs and platforms. Still, the difference clearly illustrates the main advantage of std::shared_mutex for read-mostly workloads: improved throughput when multiple readers can proceed in parallel.
A More Realistic Example: Read-Mostly Cache
The “counter” example is small, but the pattern becomes more useful with real data structures. A typical example is a cache.
#include <shared_mutex>
#include <string>
#include <unordered_map>

class Cache {
public:
    std::string get(const std::string& key) const {
        std::shared_lock lock(mutex_);
        if (auto it = data_.find(key); it != data_.end())
            return it->second;
        return {};
    }

    void put(std::string key, std::string value) {
        std::unique_lock lock(mutex_);
        data_[std::move(key)] = std::move(value);
    }

private:
    mutable std::shared_mutex mutex_;
    std::unordered_map<std::string, std::string> data_;
};
This pattern appears often in real systems:
- many threads read cached values
- updates are relatively rare
- correctness is more important than extreme micro-optimizations
Using std::shared_mutex allows reads to scale while still keeping writes safe.
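For illustration, here’s a minimal usage sketch of the Cache class above (the “host” key and its values are made up; as before, it relies on std::jthread and C++23’s std::println):

#include <print>
#include <thread>
#include <vector>

int main() {
    Cache cache; // the Cache class defined above
    cache.put("host", "localhost");

    std::vector<std::jthread> threads;

    // One occasional writer takes the exclusive lock...
    threads.emplace_back([&cache] { cache.put("host", "example.com"); });

    // ...while several readers can look the value up concurrently.
    for (int i = 0; i < 4; ++i) {
        threads.emplace_back([&cache, i] {
            std::println("reader {} sees '{}'", i, cache.get("host"));
        });
    }
}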
Common Pitfalls
Recursive locking is undefined
A thread must not lock the same std::shared_mutex recursively:
mutex.lock();
mutex.lock(); // undefined behavior
This applies to both shared and exclusive ownership.
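A common way to avoid accidental recursive locking is to take the lock only in the public API and let private helpers assume it is already held. Here’s a minimal sketch (the Settings class and the _unlocked naming convention are just illustrative):

#include <shared_mutex>
#include <string>

class Settings {
public:
    std::string describe() const {
        std::shared_lock lock(mutex_);
        // Call helpers that assume the lock is held, instead of
        // re-locking (which would be undefined behavior).
        return name_unlocked() + "=" + value_unlocked();
    }

private:
    std::string name_unlocked() const { return name_; }   // no locking here
    std::string value_unlocked() const { return value_; } // no locking here

    mutable std::shared_mutex mutex_;
    std::string name_{"mode"};
    std::string value_{"fast"};
};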
You cannot upgrade a shared lock
This pattern will deadlock:
std::shared_lock read_lock(mutex_);
std::unique_lock write_lock(mutex_); // bad idea
If upgrading is required, the locking strategy usually needs to change.
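The usual workaround is to release the shared lock, acquire an exclusive one, and then re-check the condition, because another writer may have slipped in between the two locks. Here’s a sketch of a get-or-insert operation (the function and variable names are illustrative):

#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

std::shared_mutex map_mutex;
std::unordered_map<std::string, std::string> map_data;

std::string get_or_insert(const std::string& key, const std::string& fallback) {
    {
        std::shared_lock lock(map_mutex); // fast path: shared access
        if (auto it = map_data.find(key); it != map_data.end())
            return it->second;
    } // shared lock released here

    std::unique_lock lock(map_mutex);     // slow path: exclusive access
    // Re-check: another writer may have inserted the key in the meantime.
    if (auto it = map_data.find(key); it != map_data.end())
        return it->second;
    map_data[key] = fallback;
    return fallback;
}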
A reader-writer lock does not always mean more performance
std::shared_mutex has overhead. If:
- writes are frequent
- contention is low
- critical sections are small
then std::mutex may be just as fast or even faster. Measuring is important.
For some detailed measurements, see this recent article: When std::shared_mutex Outperforms std::mutex: A Google Benchmark Study – Tech For Talk.
Newer Concurrency Tools in C++20 and Above
Since C++17, the concurrency library has expanded significantly. We now have:
- std::jthread and cooperative cancellation (C++20)
- semaphores, latches, and barriers (C++20)
- improved atomic operations (C++20)
- safe memory reclamation mechanisms (RCU, hazard pointers in C++26)
These tools focus mostly on thread lifetime, coordination, cancellation or lock-free programming.
std::shared_mutex fills a different role.
It is still a mutual-exclusion primitive, explicitly designed to protect a shared state with many readers and few writers. It does not compete with atomics, condition variables, or RCU.
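For example, if the shared state is just a single integer, as in our toy counter, std::atomic is most likely the simpler and faster choice; std::shared_mutex starts to pay off when the protected read section is non-trivial. A quick sketch:

#include <atomic>

// For a plain int, a lock-free atomic usually beats any mutex.
class AtomicCounter {
public:
    int get() const { return value_.load(); }
    void increment() { value_.fetch_add(1); }

private:
    std::atomic<int> value_{0};
};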
Summary
In this article, we explored std::shared_mutex, a synchronization primitive designed for protecting read-mostly shared data. Unlike std::mutex, it allows multiple threads to read the same resource concurrently, while still ensuring exclusive access for writers.
Starting from a simple counter example, we showed how a plain mutex serializes all access and how switching to std::shared_mutex enables concurrent reads with only small changes to the code. A simple benchmark experiment demonstrated that, under a read-heavy workload, this can significantly reduce contention and improve throughput.
Even with newer concurrency features added in C++20 and later, std::shared_mutex remains a useful and practical tool when reads dominate writes and simplicity is preferred over more complex synchronization techniques.
References
- C++ Concurrency in Action, 2nd Edition, by Anthony Williams
- Programming: Principles and Practice Using C++, 3rd Edition, by Bjarne Stroustrup
- Concurrency with Modern C++: What every professional C++ programmer should know about concurrency, by Rainer Grimm
- <shared_mutex> | Microsoft Learn
- std::shared_mutex - cppreference.com
Back to you
- Have you tried std::shared_mutex? In what situations?
- Do you use concurrency tools from C++20 and above?
Share your comments below.