On Tue, Jun 21, 2022 at 9:49 PM Tom Honermann <tom@honermann.net> wrote:

On 6/18/22 2:32 PM, Corentin Jabot wrote:

On Sat, Jun 18, 2022, 19:39 Tom Honermann via SG16 <sg16@lists.isocpp.org> wrote:

A draft of proposed SG16 questions for the 2023 C++ Developer Survey is now available here. Anyone with the link should be able to view and comment on the draft. Please feel free to add suggestions, corrections, and other comments.

The list of questions (19 currently) is likely too long and will need to be trimmed. For reference, the 2022 C++ Developer Survey (described as "Lite") had 19 questions.

Thanks Tom,

Yes, the list is pretty long, and remember the survey is biased (a few thousand people among those who follow standardisation closely). The longer the survey, the lower the participation. I can easily imagine each study group could come up with a long list of questions too, many of which would not be relevant to all participants.

I agree and I've been a little worried about other SGs jumping on the bandwagon here :)

I guess the essence of what we are trying to get at is whether people use, or would like to use, C++ for text processing. Asking that directly is probably sufficient. Given a fairly low participation rate, letting people write a detailed answer to something like "what would you like to see improved in regard to text processing and localization?" would give us good replies that we could summarize fairly easily.

The last question in the proposed list is intended for that purpose.

I have strong objections to the formulation of question 4, as it isn't possible to use emojis in a conforming implementation.

Historically it has been possible to use some emoji, but yes, we fixed that.

Question 3 is also weird - why these specific languages? It excludes, among others, languages using Cyrillic, Arabic, and Brahmic scripts, so probably around 2 billion people in total and a fair number of C++ developers - although the survey results are likely to be biased towards Europeans and North Americans to begin with.

That is an artifact of me being too quick to get draft questions prepared and being too uninformed about languages used around the world. The Unicode supported scripts list enumerates 159 scripts. I don't have a good sense of which ones should be on this list.

Peter Brett requested this question. Peter, perhaps you have some insight into which languages you feel should be explicitly listed?

In addition to the existing list: Hindi, Bengali, Arabic, Russian. It's far from exhaustive, but it covers a large chunk of the global population without getting technical about which script is derived from which.

More importantly, what is the desired outcome of question 4? C++ supports arbitrary characters in comments already, and hopefully no one is considering restrictions.

In some way question 4 is also redundant with question 1.

I think the main desire is just to get some data regarding whether programmers actually use non-basic-characters in identifiers. If many programmers answer yes, that might suggest we should do more analysis to see if the identifier restrictions put into C++23 via P1949 will require some migration assistance. Likewise, if many programmers answer I-didn't-know-that-was-possible, that may suggest a lack of awareness worth trying to address in some way. The survey itself could serve as a way to increase awareness.
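To illustrate what the question is probing, here is a minimal sketch of a non-basic character in an identifier. The names `π` and `circle_area` are made up for this example, and it assumes a UTF-8 source file and a compiler that accepts UTF-8 identifiers. U+03C0 has the XID_Start property, so it should remain a valid identifier under the P1949 rules, whereas an emoji would not.

```cpp
#include <cassert>

// Hypothetical example: U+03C0 (Greek small letter pi) used as an identifier.
// Assumes the source file is UTF-8 and the compiler accepts it as such.
constexpr double π = 3.141592653589793;

// Area of a circle; the function and parameter names stay in the basic
// character set, only the constant uses a non-basic character.
double circle_area(double radius) {
    assert(radius >= 0.0);  // a negative radius is treated as a caller error
    return π * radius * radius;
}
```

Code like this is what a "yes" answer would represent; code using characters outside XID_Start/XID_Continue is what might need migration assistance.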

Fair enough

If question 1 is going to list EBCDIC, surely it should list Shift-JIS and GB18030.

Yes, thank you, I added those.

What do we want to learn from questions 9 and 14?

Question 9 goes towards motivation for putting normalization form into the type system. E.g., should std::text be parameterized by normalization form.
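For concreteness, a rough sketch of what "normalization form in the type system" could look like. Everything here is hypothetical: there is no `std::text` in the standard, and the names `normalization_form` and `text` are invented for illustration. The sketch only tags the type; it does not validate or normalize anything.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical names throughout; this is not an existing or proposed API.
enum class normalization_form { unspecified, nfc, nfd };

template <normalization_form Form>
class text {
public:
    // The caller promises the bytes are UTF-8 already in the stated form;
    // this sketch performs no validation and no normalization.
    explicit text(std::string utf8) : bytes_(std::move(utf8)) {}

    const std::string& bytes() const { return bytes_; }
    static constexpr normalization_form form() { return Form; }

private:
    std::string bytes_;
};

// Different forms yield different types.
static_assert(!std::is_same<text<normalization_form::nfc>,
                            text<normalization_form::nfd>>::value,
              "normalization form is part of the type");

// Byte-wise equality is only offered between texts of the same form, so
// comparing an NFC text with an NFD text does not compile.
template <normalization_form Form>
bool operator==(const text<Form>& a, const text<Form>& b) {
    return a.bytes() == b.bytes();
}
```

The trade-off the question tries to measure is whether this compile-time safety matters enough to users to justify the extra template parameter.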

Question 14 was requested by someone else; I don't recall who. I think the intent is to help gauge whether we can stop treating these types as character types and instead dedicate them for use as small integers à la int8_t and uint8_t. The answer is likely no due to unsigned char being used for UTF-8, but having data would be helpful.
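A small sketch of why the character-type treatment is visible to users today. It assumes std::uint8_t is an alias for unsigned char (not guaranteed, but the case on mainstream implementations) and an ASCII-compatible execution character set; the function names are made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Streams the value as-is. Where std::uint8_t is an alias for unsigned char
// (an assumption), operator<< inserts a character, not a number.
std::string stream_as_is(std::uint8_t value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Unary plus promotes the value to int first, so the number is inserted.
std::string stream_promoted(std::uint8_t value) {
    std::ostringstream out;
    out << +value;
    return out.str();
}
```

Under those assumptions, `stream_as_is(65)` yields the one-character string "A" while `stream_promoted(65)` yields "65".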

What is the motivation behind asking about collation independently of locale?

It is an opportunity to ask specifically about use of strcoll and std::collate. That motivation may be too weak to justify the question.
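For readers less familiar with the two facilities, a minimal sketch of each. The wrapper names `collate_compare` and `strcoll_compare` are invented for this example; the calls underneath are the standard `std::collate<char>::compare` and `std::strcoll`.

```cpp
#include <cassert>
#include <cstring>
#include <locale>
#include <string>

// Compares two strings with the std::collate<char> facet of the given
// locale (the classic "C" locale by default). The result is negative,
// zero, or positive.
int collate_compare(const std::string& a, const std::string& b,
                    const std::locale& loc = std::locale::classic()) {
    const auto& facet = std::use_facet<std::collate<char>>(loc);
    return facet.compare(a.data(), a.data() + a.size(),
                         b.data(), b.data() + b.size());
}

// Compares two strings with strcoll, which consults the LC_COLLATE category
// of the current global C locale. Only the sign of the result is meaningful.
int strcoll_compare(const std::string& a, const std::string& b) {
    return std::strcoll(a.c_str(), b.c_str());
}
```

Both are tied to a locale object or to the global locale, which is why a question about collation is hard to separate from a question about locale use.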

Why not merge 15 and 17?

That might be possible. Question 15 probes what purposes people use the standard locale facilities for. Question 17 probes what facilities people use to actually localize text.

Yup, I think that's not a distinction worth making. If people use both std::locale and icu, they can check 2 boxes.

Tom.

The set of questions was culled from:

- Prior discussion on the SG16 mailing list.
- Discussion during the 2022-06-08 SG16 telecon.

Tom.

--
SG16 mailing list
SG16@lists.isocpp.org
https://lists.isocpp.org/mailman/listinfo.cgi/sg16