> I prefer option 2, but restoring some of the wording your tweak deleted from Hubert's suggestion:

I removed that sentence because the next paragraph already specifies "If a source file is determined to be a UTF-8 source file, then it shall be a well-formed UTF-8 code unit sequence". I'd rather not repeat things multiple times.

The shorter wording in your original message leaves undefined what a "UTF-8 source file" is. The wording I want to see restored provides that definition: it's a physical source file that consists of a sequence of UTF-8 code units.

I'm not bothered by the repetition between the first and second paragraphs; the first says that a UTF-8 source file is a sequence of UTF-8 code units, and the second adds the further constraint that the sequence must be well-formed. That seems reasonable to me.
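To illustrate why the two statements are not redundant, here is a minimal Python sketch. It is only an illustration, not part of the proposed wording: the helper name `is_well_formed_utf8` is hypothetical, and Python's strict UTF-8 decoder stands in for the well-formedness check. Every byte string is trivially a sequence of 8-bit code units, but only some of them are well-formed.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if `data` is a well-formed UTF-8 code unit sequence.

    Hypothetical helper: relies on Python's strict decoder, which rejects
    truncated sequences, overlong encodings, and encoded surrogates.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# All four values are sequences of 8-bit code units...
valid = "é".encode("utf-8")    # two code units: 0xC3 0xA9
truncated = b"\xc3"            # lead byte with no continuation byte
overlong = b"\xc0\xaf"         # overlong encoding of '/'
surrogate = b"\xed\xa0\x80"    # encoded surrogate code point U+D800

# ...but only the first satisfies the further well-formedness constraint.
for name, seq in [("valid", valid), ("truncated", truncated),
                  ("overlong", overlong), ("surrogate", surrogate)]:
    print(name, is_well_formed_utf8(seq))
```

In these terms, the first paragraph says what kind of thing the file is (a code unit sequence), and the second says which such sequences are acceptable.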

If you wanted to avoid even that level of repetition, you could change paragraph 2 to say something like

If a source file is determined to be a UTF-8 source file, its code unit sequence shall be well-formed and its content is decoded...

I have a slight preference for the longer version, though, since it makes clear that "well-formed" is referring to the UTF-8 requirements.

