What is SSO  

Just briefly, SSO stands for Short String Optimization (also known as Small String Optimization). It’s usually implemented as a small buffer (an array or something similar) stored inside the string object itself. When the string is short, this buffer is used instead of a separate dynamic memory allocation.

See a simplified diagram below:

[Figure: SSO idea]

The diagram illustrates two strings and where they “land” in the string object. If the string is long (longer than N characters), it needs a costly dynamic memory allocation, and the address of that new buffer is stored in ptr. On the other hand, if the string is short, we can put it inside the object, in buf[N]. Often, buf and ptr are implemented as a union to save space, as we use one or the other, but never both simultaneously.
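To make the layout more concrete, here’s a minimal sketch of the idea. It’s only an illustration: SsoString, kInline, and isShort() are hypothetical names, the 15-character threshold is an assumption, and real standard library implementations are considerably more elaborate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy string type (hypothetical, for illustration only).
class SsoString {
public:
    // Assumed threshold: up to 15 characters fit inside the object.
    static constexpr std::size_t kInline = 15;

    explicit SsoString(const char* text) : size_(std::strlen(text)) {
        if (isShort()) {
            std::memcpy(buf_, text, size_ + 1);   // short: copy into the object itself
        } else {
            ptr_ = new char[size_ + 1];           // long: separate heap allocation
            std::memcpy(ptr_, text, size_ + 1);
        }
    }

    ~SsoString() {
        if (!isShort()) delete[] ptr_;
    }

    // Copying is disabled to keep the sketch short.
    SsoString(const SsoString&) = delete;
    SsoString& operator=(const SsoString&) = delete;

    bool isShort() const { return size_ <= kInline; }
    std::size_t size() const { return size_; }
    const char* c_str() const { return isShort() ? buf_ : ptr_; }

private:
    std::size_t size_;
    union {                       // buf_ and ptr_ share storage; only one is in use
        char buf_[kInline + 1];   // inline buffer, including the null terminator
        char* ptr_;               // address of the heap buffer for long strings
    };
};

// Usage example.
inline void ssoStringDemo() {
    SsoString shortOne{"123456789012345"};    // 15 characters, stored inline
    SsoString longOne{"1234567890123456"};    // 16 characters, heap allocated
    assert(shortOne.isShort());
    assert(!longOne.isShort());
    assert(std::strcmp(longOne.c_str(), "1234567890123456") == 0);
}
```

The toy version keeps a separate size_ field to decide which union member is in use; production implementations tend to pack that information more tightly.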

Let’s start with a basic test and check the size of the std::string object itself using sizeof:

#include <string>

int main() {
    return sizeof(std::string);
}

Run at Compiler Explorer

On 64-bit targets, GCC (libstdc++) and MSVC report 32, while the libc++ implementation used with Clang reports 24!

And now, it’s time to check the length of that short string; how can we check it? We have several options:

  • at runtime
  • constexpr since C++20
  • constinit since C++20
  • just checking std::string{}.capacity()
  • and we can always look into real implementation and check the code :)

Let’s start with the simplest option:

Checking length by using capacity()  

As pointed out in the comments on Reddit (thanks VinnieFalco), you can check the size of the SSO buffer via capacity() of an empty string:

#include <string>

int main() {
    constexpr auto ssoLen = std::string{}.capacity();
    static_assert(ssoLen >= 15);
    return static_cast<int>(ssoLen);
}

Run @Compiler Explorer

  • GCC and MSVC show Program returned: 15
  • Clang (with libc++) prints Program returned: 22
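The numbers above are for char. The same query works for other character types too, and since the inline buffer is sized in bytes, wider character types usually leave room for fewer characters. Here’s a small sketch; inlineCapacity() is a hypothetical helper name, not a standard facility.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: capacity of an empty string for a given character type.
template <class CharT>
std::size_t inlineCapacity() {
    return std::basic_string<CharT>{}.capacity();
}

// Usage example: the helper agrees with the direct query for each character type.
inline void inlineCapacityDemo() {
    assert(inlineCapacity<char>() == std::string{}.capacity());
    assert(inlineCapacity<wchar_t>() == std::wstring{}.capacity());
    assert(inlineCapacity<char32_t>() == std::u32string{}.capacity());
}
```

Printing inlineCapacity<wchar_t>() or inlineCapacity<char32_t>() on your toolchain shows how much room is left for wider characters; the exact values are implementation-specific.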

Let’s have a look at some other experiments.

Checking length at runtime  

To check the length of the small buffer, we can replace the global operator new and simply watch whether it’s called when a string object is created:

#include <cstdlib>
#include <iostream>
#include <new>
#include <string>

// Replacement for the global operator new that logs every allocation
void* operator new(std::size_t size) {
    auto ptr = std::malloc(size);
    if (!ptr)
        throw std::bad_alloc{};
    std::cout << "new: " << size << ", ptr: " << ptr << '\n';
    return ptr;
}

// operator delete...

int main() {
    std::string x { "123456789012345"}; // 15 characters + null
    std::cout << x << '\n';
}

Here’s the code @Compiler Explorer

When you run the application, you’ll see that only the string is printed to the output.

But if you change the string to:

  std::string x { "1234567890123456"}; // 16 characters + null

GCC reports:

new: 17, ptr: 0x8b82b0
1234567890123456

Similarly, MSVC reports (from a local release build, as the example doesn’t work under Compiler Explorer):

new: 32, ptr: 000001CD37720B00
1234567890123456

Clang is still “silent,” but let’s change the string to:

std::string x { "12345678901234567890123"}; // 23 characters + null

Now, the libc++ implementation requests some dynamic memory. (Here’s a good overview of how it’s achieved: libc++’s implementation of std::string | Joel Laity)
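If replacing the global operator new feels too intrusive, a similar experiment can be done with a custom allocator that counts requests. This is only a sketch under some assumptions: CountingAllocator, allocationCount, CountedString, and allocationsFor() are hypothetical names, and debug builds of some standard libraries might perform extra bookkeeping allocations.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical global counter, incremented on every allocation request.
std::size_t allocationCount = 0;

// Minimal allocator that counts how many times memory is requested.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++allocationCount;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

using CountedString = std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// Returns how many allocations were needed to construct a string from text.
std::size_t allocationsFor(const char* text) {
    const std::size_t before = allocationCount;
    CountedString s(text);
    return allocationCount - before;
}

// Usage example (release builds assumed).
inline void countingDemo() {
    assert(allocationsFor("123456789012345") == 0);   // 15 characters stay inline
    assert(allocationsFor("a string that is far too long for any inline buffer") >= 1);
}
```

The counter lives outside the allocator template so that rebound copies of the allocator all update the same value.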

In summary

  • GCC and MSVC can hold 15 characters (assuming char type, not wchar_t),
  • The Clang implementation (-stdlib=libc++) can store 22 characters! It’s very impressive, as the size of the whole string is only 24 bytes!
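Another way to find these limits at runtime is to check whether the character data lives inside the string object itself. The sketch below is an assumption-laden illustration rather than a portable guarantee: usesInlineBuffer() and probeInlineLength() are hypothetical helpers that rely on comparing addresses as integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: true when the characters are stored inside the string object.
bool usesInlineBuffer(const std::string& s) {
    const auto begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto end = begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= begin && data < end;
}

// Grows a string one character at a time and returns the longest length
// for which the data still lived inside the object.
std::size_t probeInlineLength() {
    std::string s;
    std::size_t longest = 0;
    while (usesInlineBuffer(s)) {
        longest = s.size();
        s.push_back('x');
    }
    return longest;
}

// Usage example.
inline void probeDemo() {
    assert(!usesInlineBuffer(std::string(100, 'x')));           // clearly too long
    assert(probeInlineLength() == std::string{}.capacity());    // matches the capacity() check
}
```

On an implementation without any inline buffer, the loop never runs and the probe simply returns 0.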

That was a simple and “classic” experiment… but in C++20, we can also check it at compile time!

constexpr strings  

Let’s start with constexpr. In C++20, strings and also vectors are constexpr ready.

What’s more, we even have constexpr dynamic memory allocations in C++20.

The dynamic allocation at compile time can occur only in the context of a function execution, and the allocated memory buffer cannot “move” to the runtime. In other words, the allocation must be “transient”: whatever is allocated during constant evaluation has to be released before that evaluation ends. I wrote about it in a separate blog post: constexpr Dynamic Memory Allocation, C++20 - C++ Stories

In short, we can try the following code:

#include <string>
#include <iostream>

constexpr std::string str15 {"123456789012345"};
//constexpr std::string str16 {"1234567890123456"}; // doesn't compile

int main() {
    std::cout << str15 << '\n';
}

Run at Compiler Explorer

The above code creates a string using constexpr with 15 characters, and since it fits into an SSO buffer, it doesn’t violate any constexpr requirements. On the other hand, str16 would need a dynamic memory allocation, and thus the compiler reports:

/opt/compiler-explorer/gcc-trunk-20221121/include/c++/13.0.0/bits/allocator.h:195:52: error: 'std::__cxx11::basic_string<char>(((const char*)"1234567890123456"), std::allocator<char>())' is not a constant expression because it refers to a result of 'operator new'
  195 |             return static_cast<_Tp*>(::operator new(__n));
      |                                      ~~~~~~~~~~~~~~^~~~~

As of November 2022, this example doesn’t seem to compile with libc++, so that implementation might still have some C++20 issues.

But that’s not all C++20 has to offer:

constant initialization  

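In C++20, we also have a new keyword, constinit. It enforces constant initialization of variables with static or thread storage duration, and the compiler reports an error if that’s not possible. In short, our object will be initialized at compile time, but, unlike with constexpr, we can later change it like a regular global variable.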

We can rewrite our previous example to:

#include <string>
#include <iostream>

constinit std::string global {"123456789012345"};

int main() {
    std::cout << global << '\n';
    // but we can still change it later...
    global = "abc";
    std::cout << global;
}

See @Compiler Explorer

If you extend the string and add one more letter:

constinit std::string global {"1234567890123456"};

You’ll get the following error:

error: 'constinit' variable 'global' does not have a constant initializer

Summary  

It was a fun experiment! In C++20, you can rely on constant initialization and constexpr strings to check SSO length.

I’m not advocating using global objects, but if you need them, then constexpr or constinit might be a good option. As you can see, short strings can be safely initialized at compile time.

As pointed out by 2kaud in the comments, to store constexpr string literals you can also leverage std::string_view, which can refer to a string literal of any length:

constexpr std::string_view resName { "a very important resource long name..." };

As a side note:

The other name for this kind of optimization is SBO - Small Buffer Optimization. This can be applied not only to strings but, for example, to objects like std::any or even containers (std::vector by design doesn’t offer this optimization, but we can imagine a similar non-standard container with a small buffer).

References