Date: Tue, 17 Jun 2025 00:01:45 +0100
I'm writing a program at the moment using the wxWidgets library for
showing windows on screen. Some of the constant strings in the program
are very long, and they get copied in a few places because some of the
wxWidgets library functions take a "wxString const &" as an argument.
wxString is similar to std::string in that it provides storage for the
string and allocates memory dynamically if necessary, and so all my
constexpr strings get copied when I create the wxString object.
For the purposes of this email I'm going to switch over to the C++
standard library, and so I'll talk about the std::string class. Let's
say there's a function in a 3rd party library as follows:
    std::size_t CountWords(std::string const &s)
    {
        std::size_t count = 1u;
        for ( char const c : s )
        {
            if ( c == ' ' ) ++count;  // one more word per space
        }
        return count;
    }
Now we could argue that this function should really take a 'char*' or
a 'string_view', but let's assume that we can't change a pre-compiled
proprietary library. It takes a 'string' and we just have to deal with
it.
Now let's say we have a semi-long constexpr string as follows:
    constexpr char g_poem[] =
        "Had I the heavens’ embroidered cloths,\n"
        "Enwrought with golden and silver light,\n"
        "The blue and the dim and the dark cloths\n"
        "Of night and light and the half-light,\n"
        "I would spread the cloths under your feet:\n"
        "But I, being poor, have only my dreams;\n"
        "I have spread my dreams under your feet;\n"
        "Tread softly because you tread on my dreams.";
This string is not nearly as long as some of the strings I have
hardcoded into my current program (e.g. the full hex values of a PDF
file for a C++ proposal paper). But let's say that we really don't
want to copy this string. Somehow, some way, though, we need to pass
this as a 'std::string const &' without copying the string.
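To make the cost concrete, here is a minimal sketch in standard C++ (nothing library-specific) suggesting that the implicit conversion from a char array to 'std::string const &' hands the callee a separate copy of the characters. The names kText, SeesOriginalStorage and CopyHappens are hypothetical, and kText merely stands in for g_poem.

```cpp
#include <cassert>
#include <string>

// Stand-in for the long constant string.
constexpr char kText[] = "Tread softly because you tread on my dreams.";

// Hypothetical stand-in for a third-party function: it reports whether the
// characters it was handed live at the caller's original address.
bool SeesOriginalStorage(std::string const &s, char const *const original)
{
    assert(original != nullptr);
    return s.data() == original;
}

// Returns true when the callee received a copy rather than the array itself.
bool CopyHappens()
{
    // The first argument is converted to a temporary std::string, which
    // owns its own storage for the characters.
    return !SeesOriginalStorage(kText, kText);
}
```

A std::string always owns its characters, so CopyHappens() is expected to return true on any conforming implementation.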
So we start out with a structure that can pretend it's a 'std::string':
    template<typename T> requires std::is_same_v< T, std::remove_cvref_t<T> >
    struct Pretender {
        union {
            alignas(T) char unsigned m8[sizeof(T)];
            alignas(T) std::uint16_t m16[sizeof(T) / sizeof(std::uint16_t)];
            alignas(T) std::uint32_t m32[sizeof(T) / sizeof(std::uint32_t)];
            alignas(T) std::uint64_t m64[sizeof(T) / sizeof(std::uint64_t)];
            alignas(T) void const *mp[sizeof(T) / sizeof( void* )];
        };
        Pretender(void) : m8() {}  // start off all bits zero
        operator T const&(void) const noexcept
        {
            return *static_cast<T const*>(static_cast<void const*>(this));
        }
    };
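The conversion operator only makes sense if the structure has exactly the size and alignment of the type it imitates. A minimal sketch of that check follows, assuming a 64-bit target. PretenderSketch is a hypothetical, C++17-compatible stand-in for Pretender (the requires-clause becomes a static_assert and only three union members are kept). FitsExactly and StartsAllZero are likewise illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for Pretender, reduced to its storage.
template<typename T>
struct PretenderSketch {
    static_assert(std::is_same_v<T, std::remove_cv_t<std::remove_reference_t<T>>>,
                  "T must be an unqualified object type");
    union {
        alignas(T) unsigned char m8[sizeof(T)];
        alignas(T) std::uint64_t m64[sizeof(T) / sizeof(std::uint64_t)];
        alignas(T) void const *mp[sizeof(T) / sizeof(void *)];
    };
    PretenderSketch() : m8() {}  // value-initialises m8, so every byte is zero
};

// True when the stand-in occupies exactly the footprint of T.
template<typename T>
constexpr bool FitsExactly()
{
    return sizeof(PretenderSketch<T>) == sizeof(T)
        && alignof(PretenderSketch<T>) == alignof(T);
}

static_assert(FitsExactly<std::string>());
static_assert(FitsExactly<std::vector<double>>());

// Run-time check that a freshly constructed object really is all zero bytes.
bool StartsAllZero()
{
    PretenderSketch<std::string> const p;
    assert(sizeof p.m8 == sizeof(std::string));
    for (unsigned char const b : p.m8)
        if (b != 0u) return false;
    return true;
}
```

If either static_assert fired on some platform, the cast in the conversion operator would point at storage of the wrong shape.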
And from there we can write a function as follows to pretend that a
span is a container:
    template<typename T> requires std::is_same_v< T, std::remove_cvref_t<T> >
    Pretender<T> Pretend(typename T::value_type const *const p,
                         std::size_t const len);
So then in the implementation of this function, we put specialist code
for 'std::string' as follows for libstdc++:
    template<typename T> requires std::is_same_v< T, std::remove_cvref_t<T> >
    Pretender<T> Pretend(typename T::value_type const *const p,
                         std::size_t const len)
    {
        Pretender<T> retval;  // every byte starts as zero
        if constexpr ( std::is_same_v<T, std::string> )
        {
            // libstdc++ (64-bit, C++11 ABI): data pointer, size, capacity
            static_assert( sizeof(std::string) == 32u );
            retval.mp [0] = p;    // pointer to the characters
            retval.m64[1] = len;  // size
            retval.m64[2] = len;  // capacity
        }
        return retval;
    }
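The three stores above rely on a particular object layout. A minimal sketch for checking that layout against a real string is below, assuming a 64-bit target. StringLayoutMatchesAssumption is a hypothetical helper that only reads the object representation of a genuine std::string and never fabricates one. It is expected to return true with libstdc++'s C++11 ABI and may well return false elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Returns true if the first three 64-bit words of a heap-backed std::string
// are {data pointer, size, capacity}, the layout Pretend relies on.
bool StringLayoutMatchesAssumption()
{
    static_assert(sizeof(std::uintptr_t) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit target");
    if (sizeof(std::string) != 32u) return false;  // a different layout

    std::string const s(100u, 'x');  // too long for any small-string buffer
    assert(s.size() == 100u);

    std::uint64_t words[4] = {};
    // Inspect the bytes of the real object; nothing is written back.
    std::memcpy(words, static_cast<void const *>(&s), sizeof words);

    return words[0] == reinterpret_cast<std::uintptr_t>(s.data())
        && words[1] == s.size()
        && words[2] == s.capacity();
}
```

Calling this once at start-up would be one way to refuse to run if a library update ever changed the layout.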
So now let's try it on Godbolt to see if it works:
https://godbolt.org/z/djWf3q3Gc
This strategy also works for other contiguous containers. Let's say we
have a 3rd party library with an exported function such as:
void Func( std::vector<double> const & );
But all we have is a std::span<double>. Well, we can pretend that the
span is a vector:
https://godbolt.org/z/59WfGz3f4
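The code behind the link is not reproduced here, so the following is only a sketch of the layout a vector specialisation would presumably depend on. The assumption, which is mine and not taken from the link, is that std::vector<double> is stored as three pointers: begin, end, and end of storage. VectorLayoutMatchesAssumption is a hypothetical helper that checks this against a real vector.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Returns true if a std::vector<double> is laid out as three pointers:
// {begin, end, end of allocated storage}. This layout is an assumption.
bool VectorLayoutMatchesAssumption()
{
    if (sizeof(std::vector<double>) != 3u * sizeof(double *)) return false;

    std::vector<double> const v{1.0, 2.0, 3.0};
    assert(v.size() == 3u);

    double const *words[3] = {};
    // Inspect the bytes of the real object; nothing is written back.
    std::memcpy(words, static_cast<void const *>(&v), sizeof words);

    return words[0] == v.data()
        && words[1] == v.data() + v.size()
        && words[2] == v.data() + v.capacity();
}
```

Where this returns true, a pretend vector over a span would need those same three pointer values filled in from the span's data and size.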
The key here is that the make-believe container must be 'const' --
meaning that you can't edit its elements, can't resize it, and very
importantly, you can't move from it.
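The const-only rule can be partly enforced by the type system. Below is a minimal compile-time sketch, assuming a stand-in with the same conversion operator as Pretender. ConstOnlySketch and OnlyConstAccess are hypothetical names. The conversion operator is never called here; only its signature is examined.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in: raw storage plus the const-only conversion.
template<typename T>
struct ConstOnlySketch {
    alignas(T) unsigned char bytes[sizeof(T)] = {};
    operator T const &() const noexcept
    {
        return *static_cast<T const *>(static_cast<void const *>(this));
    }
};

// True when the stand-in can be passed as 'T const &' but not as 'T &',
// and when destroying it never runs T's destructor.
template<typename T>
constexpr bool OnlyConstAccess()
{
    using P = ConstOnlySketch<T>;
    return  std::is_convertible_v<P, T const &>
        && !std::is_convertible_v<P, T &>
        &&  std::is_trivially_destructible_v<P>;
}

static_assert(OnlyConstAccess<std::string>());
```

The trivial destructor matters as much as the const reference: no code ever tries to free the borrowed characters. A const_cast followed by a move would hand that pointer to a real container, which would later try to deallocate it.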
Is there any possibility we could have something like this in the
Standard? It can be implemented without an ABI break. I can implement
it in a day or two for all the major C++ standard libraries:
Dinkumware, Microsoft, libstdc++, libc++. It's not much work.
In my own program I'll be using this strategy to pretend that a
constexpr char array is a wxString object.
And for those who want to scream at me that my implementations are
undefined behaviour: well, try to imagine that I compiled it to
assembler and then put the result as inline assembler in a C++ file.
No more undefined behaviour. This can be implemented without an ABI break.
Received on 2025-06-16 23:02:03