[SG16-Unicode] Draft Guidelines for papers to be referred to SG16 - for discussion at telecon

From: Steve Downey <sdowney_at_[hidden]>
Date: Tue, 8 Jan 2019 13:29:57 -0500
- Document number: DnnnnR0
- Date: 2019-01-08
- Author: Steve Downey <sdowney2_at_[hidden]>
- Audience: SG16, WG21

<div class="ABSTRACT">
Abstract: Guidelines for when a WG21 proposal should be reviewed by SG16,
the text and Unicode study group.
</div>


# Introduction

This paper provides some guidelines for when WG21 papers should be
forwarded to study group 16 for review. The focus of study group 16 is text
processing, with a specific focus on Unicode. Study group 16 will also
review papers for issues with text encoding, text formatting, and IO.

# Unicode Facilities

Any proposal that implements a general purpose Unicode text type, a view on
Unicode text, or implements any of the Unicode standard facilities or
algorithms should of course be forwarded to SG16. SG16 is currently
reviewing proposals for std::text and std::text\_view, so anything with
those names should also be sent to the group.

Any proposal that mentions Unicode may be sent for review, if just to get
clarification of what is meant in context. We currently live in a
multi-character set and encoding world, and in general it is difficult to
require or specify that general text follows a particular encoding. If
existing external standards, such as XML, require Unicode or a particular
encoding, following those standards doesn't need particular review from
SG16.

Using existing language and library facilities, such as std::string and
std::string\_view, does not require review. An exception would be using
char16\_t, char32\_t, or char8\_t, but only because those imply, or
should imply, Unicode text.
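
As a rough illustration of the distinction, consider the two hypothetical
signatures below. The function names are invented for this sketch and do not
come from any proposal. The first says nothing about encoding; the second, by
using char32\_t, suggests that its input is Unicode text. char8\_t is left out
only so that the sketch compiles as C++17.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical function: the bytes are in whatever encoding the caller
// happens to use, so the signature makes no claim about Unicode.
std::size_t count_bytes(std::string_view s) {
    return s.size();
}

// Hypothetical function: char32_t implies (or should imply) UTF-32, so each
// element is expected to be one Unicode code point.
std::size_t count_code_points(std::u32string_view s) {
    return s.size();
}
```

Under these guidelines, a proposal containing only the first kind of
interface would not need review, while the second kind would be of interest
to SG16.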

# Text Encoding

Any proposal that transcodes text from host, source, execution, or other
text encoding, to any of the Unicode text encodings, such as UTF-8, should
be sent to Study Group 16. Any proposal that states that text is encoded in
a particular specified encoding, such as UTF-16 or CP-1252, should be sent
to Study Group 16, where the group can make recommendations about avoiding
such requirements, and about the unfortunate reality of supported systems
where this cannot be done.
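
To make the first case concrete, here is a minimal sketch of the kind of
transcoding facility that should be referred to the group. It assumes the
input is ISO-8859-1 (Latin-1), chosen only because each byte maps directly to
the Unicode code point of the same value; CP-1252 differs in the range
0x80-0x9F and would need a lookup table. The function name is hypothetical.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: transcode ISO-8859-1 (assumed input encoding) to UTF-8.
std::string latin1_to_utf8(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            // ASCII range: one UTF-8 code unit, same value.
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF: two UTF-8 code units, 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

Even a helper this small fixes the input encoding in its contract, which is
the sort of decision the group would want to discuss.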

Any proposals for controlling or changing source or execution encoding
should be sent to Study Group 16.

Proposals merely asserting that text is in the execution encoding or
translated from the source encoding as currently specified do not need
review.

Study Group 16 would like to be made aware of proposals using Unicode
encoded literals, but in general would not need to review them.
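
For reference, a small sketch of what Unicode encoded literals look like in
practice. The variable names are illustrative only. The same code point,
U+1F600, occupies two code units in a UTF-16 literal and one in a UTF-32
literal. u8 literals are omitted because their element type changes with
char8\_t.

```cpp
#include <string_view>

// The u prefix yields a UTF-16 literal; U+1F600 needs a surrogate pair.
constexpr std::u16string_view utf16 = u"\U0001F600";

// The U prefix yields a UTF-32 literal; one code unit per code point.
constexpr std::u32string_view utf32 = U"\U0001F600";

static_assert(utf16.size() == 2);
static_assert(utf32.size() == 1);
```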

# Formatting

Study Group 16 has already been involved in reviewing std::fmt, and will
continue as Unicode facilities are added.

# IO

New text input and output proposals should be referred to Study Group 16 to
the extent that they expect to deal with text encoding, or want to require
a particular encoding. Recent examples include command line arguments,
environment, and debugging data.

Using existing input/output facilities, such as iostreams or C-style IO,
does not need review.

Received on 2019-01-08 19:30:11