Note that the existence of a mapping is different from the validity of a UCN.

It is an implementation strategy to map characters without representation to nothing.

Other valid strategies would be to use the PUA to represent these characters.

To give you an idea of where I want to be, here is a very early draft of what I think phases 1 and 2 should do, pending a couple of design changes that EWG would have to look at:

1. If the physical source character set is the Unicode character set, each code point in the source
file is converted to the internal representation of that same code point. Codepoints that
are surrogate codepoints or invalid codepoints are ill-formed.
Otherwise, each abstract character in the source file is mapped in an implementation-defined
manner to a sequence of Unicode codepoints representing the same abstract character
(introducing new-line characters for end-of-line indicators if necessary).
An implementation may use any internal encoding able to uniquely represent any Unicode
codepoint. If an abstract character in the source file is not representable in the
Unicode character set, the program is ill-formed.
An implementation supports source files representing a sequence of UTF-8 code units.
Any additional physical source file character sets accepted are implementation-defined.
How the character set of a source file is determined is implementation-defined.

2. Each implementation-defined line termination sequence of characters is replaced by a
LINE FEED character (U+000A). Each instance of a BACKSLASH (\) immediately
followed by a LINE FEED or at the end of a file is deleted, splicing physical source
lines to form logical source lines. Only the last backslash on any physical source line shall
be eligible for being part of such a splice. Except for splices reverted in a raw string literal,
if a splice results in a codepoint sequence that matches the syntax of a universal-character-name,
the behavior is implementation-defined. A source file that is not empty and that does not end
in a LINE FEED, or that ends in a LINE FEED immediately preceded by a BACKSLASH before any such
splicing takes place, shall be processed as if an additional LINE FEED were appended to the file.
Sequences of whitespace codepoints at the end of each line are removed.
Each universal-character-name is replaced by the Unicode codepoint it designates.
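
To make the draft a bit more concrete, here is a rough C++ sketch of how I read the two phases. It is not proposed wording and not how any existing compiler does it. All function names are hypothetical. Assumptions that go beyond the draft: the input is UTF-8, the implementation-defined line terminators are CR LF, CR and LF, "whitespace" means SPACE and TAB, a BACKSLASH is deleted together with the LINE FEED that follows it, and the extra LINE FEED is appended before splicing. Universal-character-name replacement and raw string literals are left out.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Phase 1, UTF-8 input only: decode to code points.
// std::nullopt stands for "the program is ill-formed".
std::optional<std::u32string> decode_utf8_source(std::string_view bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const auto b0 = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        char32_t min = 0;     // smallest code point allowed for this length
        std::size_t len = 0;  // number of code units in this sequence
        if (b0 < 0x80)                { cp = b0;        len = 1; min = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; min = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; min = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; min = 0x10000; }
        else return std::nullopt;                         // invalid lead byte
        if (i + len > bytes.size()) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const auto b = static_cast<unsigned char>(bytes[i + k]);
            if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min) return std::nullopt;                      // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // surrogate
        if (cp > 0x10FFFF) return std::nullopt;                 // out of range
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Phase 2, step 1: replace each line termination sequence by LINE FEED.
std::u32string normalize_line_endings(const std::u32string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == U'\r') {
            if (i + 1 < in.size() && in[i + 1] == U'\n') ++i;  // CR LF is one
            out.push_back(U'\n');
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}

// Phase 2, step 2: append the extra LINE FEED if needed, then splice.
std::u32string splice_lines(const std::u32string& in) {
    std::u32string src = in;
    if (!src.empty()) {
        const std::size_t n = src.size();
        const bool ends_lf = src.back() == U'\n';
        const bool ends_bs_lf = ends_lf && n >= 2 && src[n - 2] == U'\\';
        if (!ends_lf || ends_bs_lf) src.push_back(U'\n');
    }
    std::u32string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == U'\\' && i + 1 < src.size() && src[i + 1] == U'\n') {
            ++i;  // drop the BACKSLASH and the LINE FEED
            continue;
        }
        out.push_back(src[i]);
    }
    return out;
}

// Phase 2, step 3: remove SPACE and TAB sequences at the end of each line.
std::u32string strip_trailing_whitespace(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        if (c == U'\n') {
            while (!out.empty() && (out.back() == U' ' || out.back() == U'\t'))
                out.pop_back();
        }
        out.push_back(c);
    }
    return out;
}

// The phase 2 steps in the order the draft lists them.
std::u32string phase2(const std::u32string& in) {
    return strip_trailing_whitespace(splice_lines(normalize_line_endings(in)));
}
```

Because only a BACKSLASH immediately followed by a LINE FEED is considered, the "last backslash on a physical source line" rule falls out without extra bookkeeping. The order of the steps is my reading of the draft. In particular, stripping trailing whitespace after splicing means that a BACKSLASH followed by spaces does not form a splice in this sketch.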