Date: Thu, 07 Nov 2024 20:54:19 +0000
I recently watched a talk about improving the performance of
variants. The code I tend to write doesn't make much use of variants,
optionals or even tuples, so it was an interesting listen. While
listening, it struck me that the idea of a tagged union is pretty
universal in how most developers talk about writing a variant. Now
that I've thought about it, the pattern is so common that I'm
surprised the language itself doesn't have a way to define a tagged
union. It seems like something the compiler could do the heavy lifting
for, and since the pattern is well understood, hopefully that wouldn't
be too taxing. By comparison, using std::variant locks you into an
implementation-defined memory layout and representation, which might
not be what you want, and we're throwing a lot of language machinery
at it to get it to work as is.
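For reference, here is roughly what the hand-written pattern looks like today when you want to control the layout yourself. This is only a minimal sketch: the names (value_kind, manual_tagged_union, make_int, as_number) are made up for illustration, and the static_assert assumes the usual 8-byte double.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this is just the usual hand-rolled pattern.
enum class value_kind : std::uint32_t { as_double, as_float, as_int, as_bool };

struct manual_tagged_union {
    union {
        double d;
        float f;
        int i;
        bool b;
    } storage;        // payload, sized by its largest member
    value_kind kind;  // the tag, stored right after the payload
};

// The layout is ours to choose, so it can be checked at compile time.
// Assumes sizeof(double) == 8, as on common platforms.
static_assert(offsetof(manual_tagged_union, kind) == sizeof(double));

inline manual_tagged_union make_int(int v) {
    manual_tagged_union u{};
    u.storage.i = v;              // make 'i' the active member
    u.kind = value_kind::as_int;  // and record that in the tag
    return u;
}

// Dispatch on the tag by hand; nothing stops the tag and the active
// member from drifting apart, which is the part a compiler could own.
inline double as_number(const manual_tagged_union& u) {
    switch (u.kind) {
        case value_kind::as_double: return u.storage.d;
        case value_kind::as_float:  return u.storage.f;
        case value_kind::as_int:    return u.storage.i;
        case value_kind::as_bool:   return u.storage.b ? 1.0 : 0.0;
    }
    return 0.0;
}
```

With std::variant you get the bookkeeping for free, but not the ability to say where the tag lives or what values it takes.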
Imagine, for example, that you could quickly define a tagged union by
specifying, as we do now, a list of types to fit in the union, plus
perhaps a common member acting as the index. That would cover
aggregates pretty clearly. The only remaining oddity is how to deal
with tagged unions of native types, if at all. Maybe the language
feature is limited to aggregates so that the memory layout is very
explicit. On the other hand, copy-pasting a bit of header structure
into every type may defeat the point, so here's what I'm thinking as a
rough example.
enum example_values {
    double_type = 0x1,
    float_type = 0x2,
    int_type = 0x4,
    bool_type = 0x8
};
// Pretend we have a keyword for this, although I think we can
// accomplish this very easily in other ways.
tagged_union common_types {
    double d; // Since normal unions don't lay out members like structs,
              // it's not so obvious what to do here with native types.
    float f;  // My potential solution is to allow specifying an offset
              // and expected type for the index.
    int i;
    bool b;
    // TODO: specify the member / offset / expected type of the index.
    // Candidate spellings (alternatives, not all at once):
    // 1. Specifier style, think alignas(). Odd looking in my opinion.
    tagged_union_index(8 /* the offset */) size_t index;
    // 2. Template style, giving the offset and type exactly. The
    //    compiler could scan for a particular standardized class type
    //    like this one.
    tagged_union_index<8, size_t> index;
    // 3. Passing an enum could remap the expected indices the compiler
    //    works with for the internal representation.
    tagged_union_index<8, example_values> index;
    // The expectation here is that declaration order matters and maps
    // 1 -> 1 from enum members to tagged union members. The enum above
    // is an example of mapping the states to powers of 2.
    // There can be no more than one index. For compatibility with
    // native types, we can imagine that if the specified member name
    // isn't found and would be out of bounds of the type, that type is
    // treated as an aggregate with additional padding to include the
    // member name.
    // If an enum or enum class isn't specified, the indices behave as
    // in std::variant, ranging over [0, <number_of_types>).
    // Since part of the point is to allow the full expressivity of a
    // tagged union for whoever is defining it, it may be useful to use
    // something like a constexpr std::array where there are many
    // different states and the patterns are obvious.
};
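To make the enum remapping a bit more concrete, here is a rough emulation on top of today's std::variant, using the constexpr std::array idea as a lookup table. It is only a sketch: tag_table and tag_of are hypothetical names, and it says nothing about how a real language feature would store the tag.

```cpp
#include <array>
#include <cstddef>
#include <variant>

enum example_values : unsigned {
    double_type = 0x1,
    float_type  = 0x2,
    int_type    = 0x4,
    bool_type   = 0x8
};

using common_types = std::variant<double, float, int, bool>;

// Declaration order is the mapping: alternative N <-> tag_table[N].
inline constexpr std::array<example_values, std::variant_size_v<common_types>>
    tag_table{double_type, float_type, int_type, bool_type};

// Hypothetical helper: translate the variant's 0-based index into the
// user-chosen enum value. Assumes v is not valueless_by_exception().
constexpr example_values tag_of(const common_types& v) {
    return tag_table[v.index()];
}

static_assert(tag_of(common_types{1.5f}) == float_type);
static_assert(tag_of(common_types{true}) == bool_type);
```

The difference from what I'm proposing is that here the remapped value is computed on every query, whereas with the proposed feature the stored index itself would already be the enum value.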
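tagged_union_index could be as simple as:
// Needs <cstddef> and <concepts>. An enum index type, as in the
// example above, would need a looser constraint than this one.
template<std::size_t Offset, std::unsigned_integral T>
struct tagged_union_index {
    // Offset must be non-zero and a multiple of alignof(T) so that
    // value lands exactly at Offset.
    alignas(T) char padding[Offset];
    T value;
};
// TODO: standard library or compiler support for dealing with a tagged
// union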
The compiler could error out if the structs don't contain the expected
type at the desired offset; that check may ultimately be what the
tagged union is after. Giving a named member would try to verify the
same thing.
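The offset check described above can be approximated by hand today with offsetof and static_assert. This is a minimal sketch of what I would expect the compiler to do for us: circle, label, header and read_index are hypothetical names, and a static_assert stands in for the concept so it builds as C++17.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

template <std::size_t Offset, typename T>
struct tagged_union_index {
    static_assert(std::is_unsigned_v<T>, "index type must be unsigned");
    static_assert(Offset > 0 && Offset % alignof(T) == 0,
                  "Offset must be a non-zero multiple of alignof(T)");
    alignas(T) char padding[Offset];
    T value;  // lands exactly at byte Offset
};

// Two hypothetical aggregates that both carry the index at byte offset 8.
struct circle { double radius; std::size_t index; };
struct label  { char text[8];  std::size_t index; };

using header = tagged_union_index<8, std::size_t>;

// The checks a tagged_union declaration could run for each alternative.
static_assert(std::is_standard_layout_v<circle>);
static_assert(std::is_standard_layout_v<label>);
static_assert(offsetof(circle, index) == offsetof(header, value));
static_assert(offsetof(label, index) == offsetof(header, value));
static_assert(std::is_same_v<decltype(circle::index), decltype(header::value)>);
static_assert(std::is_same_v<decltype(label::index), decltype(header::value)>);

// Read the index from any alternative without knowing which one it is.
// memcpy is used to stay clear of aliasing problems.
template <typename Alt>
std::size_t read_index(const Alt& alt) {
    static_assert(std::is_trivially_copyable_v<Alt>);
    static_assert(sizeof(Alt) >= offsetof(header, value) + sizeof(std::size_t));
    std::size_t tag;
    std::memcpy(&tag,
                reinterpret_cast<const char*>(&alt) + offsetof(header, value),
                sizeof tag);
    return tag;
}
```

If one of the structs moved its index member or changed its type, the static_asserts would fail, which is the kind of error I'd want from the language feature itself.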
I'm curious to hear what others think. Thoughts?
-- -Alex Anderson
Received on 2024-11-07 20:54:24